native openai web search ability!

* feat: local passthrough for Responses tools via responses_tools + responses_tool_choice (behind CHATMOCK_ALLOW_RESPONSES_TOOLS) * feat: gate Responses tools passthrough behind CHATMOCK_ALLOW_RESPONSES_TOOLS (default OFF) * test(docs): add pytest for Responses tools passthrough (default off), and README usage section * feat: responses tools hardening (fallback on 400, host allowlist, size guard, tool_choice strings only); tests updated * feat: enable Responses tools passthrough by default; remove env gate - Tools forwarded whenever is present - Keep size guard and optional MCP host allowlist - Accept strings unconditionally Tests: - Update to cover default passthrough and baseline (no responses_tools) Docs: - README: update instructions; move Star History to bottom * chore: clean imports/comments; use gpt-5 in examples and tests * docs: tighten Responses tools README; fix gpt-5 example\nchore: remove feature-specific test per review; trim comments/imports * chore: remove __pycache__/ and bytecode; add .gitignore * chore: add .gitignore for caches and bytecode * Update README.md * fix: remove MCP passthrough; allow only web_search in responses_tools - Reject non-`web_search` types with 400 (`RESPONSES_TOOL_UNSUPPORTED`). - Drop MCP host allowlist logic and related import. - Keep size guard via `RESPONSES_TOOLS_MAX_BYTES` and fallback retry without extras. - Docs: update README to state web_search-only passthrough. Runtime verified locally with a stubbed upstream: - OK: `responses_tools: [{"type": "web_search"}]` -> 200. - BAD: `responses_tools: [{"type": "mcp"}]` -> 400 `RESPONSES_TOOL_UNSUPPORTED`. * feat: forward Responses web_search tool via Chat Completions; fallback on rejection - Accept `responses_tools` array and filter to `type: web_search` only. - Enforce size guard `RESPONSES_TOOLS_MAX_BYTES` (default 32768). - Fallback: if upstream rejects tools, retry without extras; otherwise return `RESPONSES_TOOLS_REJECTED`. - README: document web_search-only passthrough and example. - Headers: hint experimental features in OpenAI-Beta (responses; web-search). * chore: remove local test-only forcing flag (CHATMOCK_FORCE_WEB_SEARCH) * fix: restore full routes_openai (web_search-only passthrough + endpoints) - Undo accidental large deletion from prior cleanup. - Keep `web_search` passthrough, size guard, and fallback. - Preserve `/v1/completions` and `/v1/models` endpoints and SSE handling. * Update upstream.py * Update upstream.py * Update README.md * Update README.md * Update routes_openai.py * feat(openai): default-enable web_search; accept preview; quiet retry; rm env knob - Injects responses_tools=[{"type":"web_search"}] when client omits tools; explicit opt-out via responses_tool_choice:"none". - Allowlist accepts "web_search" and "web_search_preview"; others rejected with RESPONSES_TOOL_UNSUPPORTED. - Replaces env max-bytes knob with MAX_TOOLS_BYTES=32768. - Retry on upstream rejection is silent; logs only under verbose. * feat(stream): surface web_search_call as tool_calls; aggregate args; verbose-only logs - Translates Responses web_search_call.* and output_item.done into OpenAI-style delta.tool_calls. - Aggregates parameters by call_id (query/q, recency/time_range/days, domains/include/include_domains/include, max_results/topn/limit). - No inference; arguments remain "{}" if upstream provides none. Logs only when verbose. * feat(responses-tools): web_search passthrough; flag; fallback; Ollama parity; stable indexes - Add --enable-web-search (default OFF) to inject web_search when requests omit responses_tools - Allow tool types: web_search and web_search_preview; 32,768-byte cap on serialized responses_tools - OpenAI /v1/chat/completions: passthrough + retry without extras on upstream rejection; return retry status - Streaming: function.arguments always JSON; stable tool_calls index per call_id - Ollama /api/chat: same passthrough + fallback behavior - README updated to match behavior and limits * Update README.md * Update README.md * Update routes_ollama.py * Update routes_openai.py * Update utils.py --------- Co-authored-by: alexx-ftw <alexx-ftw@users.noreply.github.com> Co-authored-by: Game_Time <108236317+RayBytes@users.noreply.github.com>
2025-09-16 13:06:00 +01:00
parent 8d92a63626
commit 2f23cd5a89
7 changed files with 293 additions and 24 deletions
--- a/chatmock/utils.py
+++ b/chatmock/utils.py
@@ -250,6 +250,9 @@ def sse_translate_chat(
    saw_any_summary = False
    pending_summary_paragraph = False
    upstream_usage = None
+    ws_state: dict[str, Any] = {}
+    ws_index: dict[str, int] = {}
+    ws_next_index: int = 0
    
    def _extract_usage(evt: Dict[str, Any]) -> Dict[str, int] | None:
        try:
@@ -284,6 +287,86 @@ def sse_translate_chat(
            if isinstance(evt.get("response"), dict) and isinstance(evt["response"].get("id"), str):
                response_id = evt["response"].get("id") or response_id

+            if isinstance(kind, str) and ("web_search_call" in kind):
+                try:
+                    call_id = evt.get("item_id") or "ws_call"
+                    if verbose and vlog:
+                        try:
+                            vlog(f"CM_TOOLS {kind} id={call_id} -> tool_calls(web_search)")
+                        except Exception:
+                            pass
+                    item = evt.get('item') if isinstance(evt.get('item'), dict) else {}
+                    params_dict = ws_state.setdefault(call_id, {}) if isinstance(ws_state.get(call_id), dict) else {}
+                    def _merge_from(src):
+                        if not isinstance(src, dict):
+                            return
+                        for whole in ('parameters','args','arguments','input'):
+                            if isinstance(src.get(whole), dict):
+                                params_dict.update(src.get(whole))
+                        if isinstance(src.get('query'), str): params_dict.setdefault('query', src.get('query'))
+                        if isinstance(src.get('q'), str): params_dict.setdefault('query', src.get('q'))
+                        for rk in ('recency','time_range','days'):
+                            if src.get(rk) is not None and rk not in params_dict: params_dict[rk] = src.get(rk)
+                        for dk in ('domains','include_domains','include'):
+                            if isinstance(src.get(dk), list) and 'domains' not in params_dict: params_dict['domains'] = src.get(dk)
+                        for mk in ('max_results','topn','limit'):
+                            if src.get(mk) is not None and 'max_results' not in params_dict: params_dict['max_results'] = src.get(mk)
+                    _merge_from(item)
+                    _merge_from(evt if isinstance(evt, dict) else None)
+                    params = params_dict if params_dict else None
+                    if isinstance(params, dict):
+                        try:
+                            ws_state.setdefault(call_id, {}).update(params)
+                        except Exception:
+                            pass
+                    eff_params = ws_state.get(call_id, params if isinstance(params, (dict, list, str)) else {})
+                    if isinstance(eff_params, (dict, list)):
+                        args_str = json.dumps(eff_params)
+                    elif isinstance(eff_params, str):
+                        args_str = json.dumps({"query": eff_params})
+                    else:
+                        args_str = "{}"
+                    if call_id not in ws_index:
+                        ws_index[call_id] = ws_next_index
+                        ws_next_index += 1
+                    _idx = ws_index.get(call_id, 0)
+                    delta_chunk = {
+                        "id": response_id,
+                        "object": "chat.completion.chunk",
+                        "created": created,
+                        "model": model,
+                        "choices": [
+                            {
+                                "index": 0,
+                                "delta": {
+                                    "tool_calls": [
+                                        {
+                                            "index": _idx,
+                                            "id": call_id,
+                                            "type": "function",
+                                            "function": {"name": "web_search", "arguments": args_str},
+                                        }
+                                    ]
+                                },
+                                "finish_reason": None,
+                            }
+                        ],
+                    }
+                    yield f"data: {json.dumps(delta_chunk)}\n\n".encode("utf-8")
+                    if kind.endswith(".completed") or kind.endswith(".done"):
+                        finish_chunk = {
+                            "id": response_id,
+                            "object": "chat.completion.chunk",
+                            "created": created,
+                            "model": model,
+                            "choices": [
+                                {"index": 0, "delta": {}, "finish_reason": "tool_calls"}
+                            ],
+                        }
+                        yield f"data: {json.dumps(finish_chunk)}\n\n".encode("utf-8")
+                except Exception:
+                    pass
+
            if kind == "response.output_text.delta":
                delta = evt.get("delta") or ""
                if compat == "think-tags" and think_open and not think_closed:
@@ -308,10 +391,34 @@ def sse_translate_chat(
                yield f"data: {json.dumps(chunk)}\n\n".encode("utf-8")
            elif kind == "response.output_item.done":
                item = evt.get("item") or {}
-                if isinstance(item, dict) and item.get("type") == "function_call":
+                if isinstance(item, dict) and (item.get("type") == "function_call" or item.get("type") == "web_search_call"):
                    call_id = item.get("call_id") or item.get("id") or ""
-                    name = item.get("name") or ""
-                    args = item.get("arguments") or ""
+                    name = item.get("name") or ("web_search" if item.get("type") == "web_search_call" else "")
+                    raw_args = item.get("arguments") or item.get("parameters")
+                    if isinstance(raw_args, dict):
+                        try:
+                            ws_state.setdefault(call_id, {}).update(raw_args)
+                        except Exception:
+                            pass
+                    eff_args = ws_state.get(call_id, raw_args if isinstance(raw_args, (dict, list, str)) else {})
+                    try:
+                        if isinstance(eff_args, (dict, list)):
+                            args = json.dumps(eff_args)
+                        elif isinstance(eff_args, str):
+                            args = json.dumps({"query": eff_args})
+                        else:
+                            args = "{}"
+                    except Exception:
+                        args = "{}"
+                    if item.get("type") == "web_search_call" and verbose and vlog:
+                        try:
+                            vlog(f"CM_TOOLS response.output_item.done web_search_call id={call_id} has_args={bool(args)}")
+                        except Exception:
+                            pass
+                    if call_id not in ws_index:
+                        ws_index[call_id] = ws_next_index
+                        ws_next_index += 1
+                    _idx = ws_index.get(call_id, 0)
                    if isinstance(call_id, str) and isinstance(name, str) and isinstance(args, str):
                        delta_chunk = {
                            "id": response_id,
@@ -324,7 +431,7 @@ def sse_translate_chat(
                                    "delta": {
                                        "tool_calls": [
                                            {
-                                                "index": 0,
+                                                "index": _idx,
                                                "id": call_id,
                                                "type": "function",
                                                "function": {"name": name, "arguments": args},
@@ -573,3 +680,4 @@ def sse_translate_text(upstream, model: str, created: int, verbose: bool = False
                break
    finally:
        upstream.close()
+