native openai web search ability!

* feat: local passthrough for Responses tools via responses_tools + responses_tool_choice (behind CHATMOCK_ALLOW_RESPONSES_TOOLS) * feat: gate Responses tools passthrough behind CHATMOCK_ALLOW_RESPONSES_TOOLS (default OFF) * test(docs): add pytest for Responses tools passthrough (default off), and README usage section * feat: responses tools hardening (fallback on 400, host allowlist, size guard, tool_choice strings only); tests updated * feat: enable Responses tools passthrough by default; remove env gate - Tools forwarded whenever is present - Keep size guard and optional MCP host allowlist - Accept strings unconditionally Tests: - Update to cover default passthrough and baseline (no responses_tools) Docs: - README: update instructions; move Star History to bottom * chore: clean imports/comments; use gpt-5 in examples and tests * docs: tighten Responses tools README; fix gpt-5 example\nchore: remove feature-specific test per review; trim comments/imports * chore: remove __pycache__/ and bytecode; add .gitignore * chore: add .gitignore for caches and bytecode * Update README.md * fix: remove MCP passthrough; allow only web_search in responses_tools - Reject non-`web_search` types with 400 (`RESPONSES_TOOL_UNSUPPORTED`). - Drop MCP host allowlist logic and related import. - Keep size guard via `RESPONSES_TOOLS_MAX_BYTES` and fallback retry without extras. - Docs: update README to state web_search-only passthrough. Runtime verified locally with a stubbed upstream: - OK: `responses_tools: [{"type": "web_search"}]` -> 200. - BAD: `responses_tools: [{"type": "mcp"}]` -> 400 `RESPONSES_TOOL_UNSUPPORTED`. * feat: forward Responses web_search tool via Chat Completions; fallback on rejection - Accept `responses_tools` array and filter to `type: web_search` only. - Enforce size guard `RESPONSES_TOOLS_MAX_BYTES` (default 32768). - Fallback: if upstream rejects tools, retry without extras; otherwise return `RESPONSES_TOOLS_REJECTED`. - README: document web_search-only passthrough and example. - Headers: hint experimental features in OpenAI-Beta (responses; web-search). * chore: remove local test-only forcing flag (CHATMOCK_FORCE_WEB_SEARCH) * fix: restore full routes_openai (web_search-only passthrough + endpoints) - Undo accidental large deletion from prior cleanup. - Keep `web_search` passthrough, size guard, and fallback. - Preserve `/v1/completions` and `/v1/models` endpoints and SSE handling. * Update upstream.py * Update upstream.py * Update README.md * Update README.md * Update routes_openai.py * feat(openai): default-enable web_search; accept preview; quiet retry; rm env knob - Injects responses_tools=[{"type":"web_search"}] when client omits tools; explicit opt-out via responses_tool_choice:"none". - Allowlist accepts "web_search" and "web_search_preview"; others rejected with RESPONSES_TOOL_UNSUPPORTED. - Replaces env max-bytes knob with MAX_TOOLS_BYTES=32768. - Retry on upstream rejection is silent; logs only under verbose. * feat(stream): surface web_search_call as tool_calls; aggregate args; verbose-only logs - Translates Responses web_search_call.* and output_item.done into OpenAI-style delta.tool_calls. - Aggregates parameters by call_id (query/q, recency/time_range/days, domains/include/include_domains/include, max_results/topn/limit). - No inference; arguments remain "{}" if upstream provides none. Logs only when verbose. * feat(responses-tools): web_search passthrough; flag; fallback; Ollama parity; stable indexes - Add --enable-web-search (default OFF) to inject web_search when requests omit responses_tools - Allow tool types: web_search and web_search_preview; 32,768-byte cap on serialized responses_tools - OpenAI /v1/chat/completions: passthrough + retry without extras on upstream rejection; return retry status - Streaming: function.arguments always JSON; stable tool_calls index per call_id - Ollama /api/chat: same passthrough + fallback behavior - README updated to match behavior and limits * Update README.md * Update README.md * Update routes_ollama.py * Update routes_openai.py * Update utils.py --------- Co-authored-by: alexx-ftw <alexx-ftw@users.noreply.github.com> Co-authored-by: Game_Time <108236317+RayBytes@users.noreply.github.com>
2025-09-16 13:06:00 +01:00
parent 8d92a63626
commit 2f23cd5a89
7 changed files with 293 additions and 24 deletions
--- a/chatmock/routes_openai.py
+++ b/chatmock/routes_openai.py
@@ -70,6 +70,47 @@ def chat_completions() -> Response:
    tools_responses = convert_tools_chat_to_responses(payload.get("tools"))
    tool_choice = payload.get("tool_choice", "auto")
    parallel_tool_calls = bool(payload.get("parallel_tool_calls", False))
+    responses_tools_payload = payload.get("responses_tools") if isinstance(payload.get("responses_tools"), list) else []
+    extra_tools: List[Dict[str, Any]] = []
+    had_responses_tools = False
+    if isinstance(responses_tools_payload, list):
+        for _t in responses_tools_payload:
+            if not (isinstance(_t, dict) and isinstance(_t.get("type"), str)):
+                continue
+            if _t.get("type") not in ("web_search", "web_search_preview"):
+                return (
+                    jsonify(
+                        {
+                            "error": {
+                                "message": "Only web_search/web_search_preview are supported in responses_tools",
+                                "code": "RESPONSES_TOOL_UNSUPPORTED",
+                            }
+                        }
+                    ),
+                    400,
+                )
+            extra_tools.append(_t)
+
+        if not extra_tools and bool(current_app.config.get("DEFAULT_WEB_SEARCH")):
+            responses_tool_choice = payload.get("responses_tool_choice")
+            if not (isinstance(responses_tool_choice, str) and responses_tool_choice == "none"):
+                extra_tools = [{"type": "web_search"}]
+
+        if extra_tools:
+            import json as _json
+            MAX_TOOLS_BYTES = 32768
+            try:
+                size = len(_json.dumps(extra_tools))
+            except Exception:
+                size = 0
+            if size > MAX_TOOLS_BYTES:
+                return jsonify({"error": {"message": "responses_tools too large", "code": "RESPONSES_TOOLS_TOO_LARGE"}}), 400
+            had_responses_tools = True
+            tools_responses = (tools_responses or []) + extra_tools
+
+    responses_tool_choice = payload.get("responses_tool_choice")
+    if isinstance(responses_tool_choice, str) and responses_tool_choice in ("auto", "none"):
+        tool_choice = responses_tool_choice

    input_items = convert_chat_messages_to_responses_input(messages)
    if not input_items and isinstance(payload.get("prompt"), str) and payload.get("prompt").strip():
@@ -100,12 +141,41 @@ def chat_completions() -> Response:
            err_body = json.loads(raw.decode("utf-8", errors="ignore")) if raw else {"raw": upstream.text}
        except Exception:
            err_body = {"raw": upstream.text}
-        if verbose:
-            print("Upstream error status=", upstream.status_code, " body:", json.dumps(err_body)[:2000])
-        return (
-            jsonify({"error": {"message": (err_body.get("error", {}) or {}).get("message", "Upstream error")}}),
-            upstream.status_code,
-        )
+        if had_responses_tools:
+            if verbose:
+                print("[Passthrough] Upstream rejected tools; retrying without extra tools (args redacted)")
+            base_tools_only = convert_tools_chat_to_responses(payload.get("tools"))
+            safe_choice = payload.get("tool_choice", "auto")
+            upstream2, err2 = start_upstream_request(
+                model,
+                input_items,
+                instructions=BASE_INSTRUCTIONS,
+                tools=base_tools_only,
+                tool_choice=safe_choice,
+                parallel_tool_calls=parallel_tool_calls,
+                reasoning_param=reasoning_param,
+            )
+            if err2 is None and upstream2 is not None and upstream2.status_code < 400:
+                upstream = upstream2
+            else:
+                return (
+                    jsonify(
+                        {
+                            "error": {
+                                "message": (err_body.get("error", {}) or {}).get("message", "Upstream error"),
+                                "code": "RESPONSES_TOOLS_REJECTED",
+                            }
+                        }
+                    ),
+                    (upstream2.status_code if upstream2 is not None else upstream.status_code),
+                )
+        else:
+            if verbose:
+                print("Upstream error status=", upstream.status_code)
+            return (
+                jsonify({"error": {"message": (err_body.get("error", {}) or {}).get("message", "Upstream error")}}),
+                upstream.status_code,
+            )

    if is_stream:
        resp = Response(
@@ -371,3 +441,4 @@ def list_models() -> Response:
    for k, v in build_cors_headers().items():
        resp.headers.setdefault(k, v)
    return resp
+