Native OpenAI web search ability!

* feat: local passthrough for Responses tools via responses_tools + responses_tool_choice (behind CHATMOCK_ALLOW_RESPONSES_TOOLS)

* feat: gate Responses tools passthrough behind CHATMOCK_ALLOW_RESPONSES_TOOLS (default OFF)

* test(docs): add pytest for Responses tools passthrough (default off), and README usage section

* feat: responses tools hardening (fallback on 400, host allowlist, size guard, tool_choice strings only); tests updated

* feat: enable Responses tools passthrough by default; remove env gate

- Tools forwarded whenever  is present
- Keep size guard and optional MCP host allowlist
- Accept  strings unconditionally

Tests:
- Update to cover default passthrough and baseline (no responses_tools)

Docs:
- README: update instructions; move Star History to bottom

* chore: clean imports/comments; use gpt-5 in examples and tests

* docs: tighten Responses tools README; fix gpt-5 example
  chore: remove feature-specific test per review; trim comments/imports

* chore: remove __pycache__/ and bytecode; add .gitignore

* chore: add .gitignore for caches and bytecode

* Update README.md

* fix: remove MCP passthrough; allow only web_search in responses_tools

- Reject non-`web_search` types with 400 (`RESPONSES_TOOL_UNSUPPORTED`).
- Drop MCP host allowlist logic and related import.
- Keep size guard via `RESPONSES_TOOLS_MAX_BYTES` and fallback retry without extras.
- Docs: update README to state web_search-only passthrough.

Runtime verified locally with a stubbed upstream:
- OK: `responses_tools: [{"type": "web_search"}]` -> 200.
- BAD: `responses_tools: [{"type": "mcp"}]` -> 400 `RESPONSES_TOOL_UNSUPPORTED`.
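
The allowlist check described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the helper name `validate_responses_tools` and the shape of the error body are assumptions, and only the `web_search` type and the `RESPONSES_TOOL_UNSUPPORTED` code come from the notes above.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumption: the allowlist at this point in the history is web_search only.
ALLOWED_TYPES = {"web_search"}


def validate_responses_tools(
    tools: Any,
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Return (accepted_tools, error_body); error_body is None on success."""
    if not isinstance(tools, list):
        return [], None
    accepted: List[Dict[str, Any]] = []
    for tool in tools:
        # Malformed entries (not a dict, or no string "type") are skipped.
        if not (isinstance(tool, dict) and isinstance(tool.get("type"), str)):
            continue
        if tool["type"] not in ALLOWED_TYPES:
            # Hypothetical error shape; the caller would return it with HTTP 400.
            return [], {
                "error": {
                    "message": f"Unsupported tool type: {tool['type']}",
                    "code": "RESPONSES_TOOL_UNSUPPORTED",
                }
            }
        accepted.append(tool)
    return accepted, None
```

A caller would forward `accepted_tools` upstream when `error_body` is `None` and otherwise respond with status 400.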

* feat: forward Responses web_search tool via Chat Completions; fallback on rejection

- Accept `responses_tools` array and filter to `type: web_search` only.
- Enforce size guard `RESPONSES_TOOLS_MAX_BYTES` (default 32768).
- Fallback: if upstream rejects tools, retry without extras; otherwise return `RESPONSES_TOOLS_REJECTED`.
- README: document web_search-only passthrough and example.
- Headers: hint experimental features in OpenAI-Beta (responses; web-search).
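
The size guard and the retry-without-extras fallback can be sketched as below. The names `tools_too_large` and `send_with_fallback`, and the `send` callable that stands in for the upstream request and returns an HTTP status, are hypothetical; only the 32768 default and the retry behavior are taken from the notes above.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

MAX_BYTES = 32768  # default of RESPONSES_TOOLS_MAX_BYTES per the notes above

Tools = List[Dict[str, Any]]


def tools_too_large(extra_tools: Tools, limit: int = MAX_BYTES) -> bool:
    """Measure the serialized size of the extra tools against the cap."""
    return len(json.dumps(extra_tools).encode("utf-8")) > limit


def send_with_fallback(
    send: Callable[[Tools], int],  # hypothetical: sends tools, returns HTTP status
    base_tools: Tools,
    extra_tools: Tools,
) -> Tuple[int, bool]:
    """Try base + extra tools; if rejected, retry once with base tools only.

    Returns (final_status, used_fallback).
    """
    status = send(base_tools + extra_tools)
    if status < 400 or not extra_tools:
        return status, False
    return send(base_tools), True
```

If the retry also fails, the caller would report `RESPONSES_TOOLS_REJECTED` to the client.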

* chore: remove local test-only forcing flag (CHATMOCK_FORCE_WEB_SEARCH)

* fix: restore full routes_openai (web_search-only passthrough + endpoints)

- Undo accidental large deletion from prior cleanup.
- Keep `web_search` passthrough, size guard, and fallback.
- Preserve `/v1/completions` and `/v1/models` endpoints and SSE handling.

* Update upstream.py

* Update upstream.py

* Update README.md

* Update README.md

* Update routes_openai.py

* feat(openai): default-enable web_search; accept preview; quiet retry; rm env knob

- Injects responses_tools=[{"type":"web_search"}] when client omits tools; explicit opt-out via responses_tool_choice:"none".
- Allowlist accepts "web_search" and "web_search_preview"; others rejected with RESPONSES_TOOL_UNSUPPORTED.
- Replaces env max-bytes knob with MAX_TOOLS_BYTES=32768.
- Retry on upstream rejection is silent; logs only under verbose.
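
The default injection and the opt-out could be expressed as follows. This is a sketch under the assumption that the request body is available as a plain dict; the helper name `inject_default_web_search` is hypothetical.

```python
from typing import Any, Dict, List


def inject_default_web_search(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the tools to forward, adding web_search when the client sent none."""
    tools = payload.get("responses_tools")
    extra = [t for t in tools if isinstance(t, dict)] if isinstance(tools, list) else []
    # Explicit opt-out: responses_tool_choice set to the string "none".
    opted_out = payload.get("responses_tool_choice") == "none"
    if not extra and not opted_out:
        extra = [{"type": "web_search"}]
    return extra
```

With this shape, a client that wants no web search sends `"responses_tool_choice": "none"` and omits `responses_tools`.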

* feat(stream): surface web_search_call as tool_calls; aggregate args; verbose-only logs

- Translates Responses web_search_call.* and output_item.done into OpenAI-style delta.tool_calls.
- Aggregates parameters by call_id (query/q, recency/time_range/days, domains/include/include_domains/include, max_results/topn/limit).
- No inference; arguments remain "{}" if upstream provides none. Logs only when verbose.
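
One way to aggregate parameters per `call_id` is sketched below. The alias groups follow the list above, but treating the first name of each group as the canonical key is an assumption, as are the names `ALIASES` and `merge_args`.

```python
import json
from typing import Any, Dict

# Assumption: the first name in each group is the canonical key.
ALIASES = {
    "query": ("query", "q"),
    "recency": ("recency", "time_range", "days"),
    "domains": ("domains", "include_domains", "include"),
    "max_results": ("max_results", "topn", "limit"),
}


def merge_args(store: Dict[str, Dict[str, Any]], call_id: str, params: Any) -> str:
    """Merge one event's parameters into the per-call store and return JSON."""
    merged = store.setdefault(call_id, {})
    if isinstance(params, dict):
        for canonical, names in ALIASES.items():
            for name in names:
                if params.get(name) is not None:
                    merged[canonical] = params[name]
                    break
    # Nothing is inferred: with no upstream parameters this stays "{}".
    return json.dumps(merged)
```

Each streamed event for the same `call_id` adds to the same dict, so the emitted arguments grow as more parameters arrive.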

* feat(responses-tools): web_search passthrough; flag; fallback; Ollama parity; stable indexes

- Add --enable-web-search (default OFF) to inject web_search when requests omit responses_tools
- Allow tool types: web_search and web_search_preview; 32,768-byte cap on serialized responses_tools
- OpenAI /v1/chat/completions: passthrough + retry without extras on upstream rejection; return retry status
- Streaming: function.arguments always JSON; stable tool_calls index per call_id
- Ollama /api/chat: same passthrough + fallback behavior
- README updated to match behavior and limits
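
The two streaming guarantees (a stable `tool_calls` index per `call_id`, and `function.arguments` always being JSON) might be kept with a small bookkeeping class. The class and function names here are hypothetical, and the delta shape follows the OpenAI-style `delta.tool_calls` convention rather than the code in this commit.

```python
import json
from typing import Any, Dict, Optional


class ToolCallIndex:
    """Assign each call_id a stable index in first-seen order."""

    def __init__(self) -> None:
        self._by_call_id: Dict[str, int] = {}

    def index_for(self, call_id: str) -> int:
        if call_id not in self._by_call_id:
            self._by_call_id[call_id] = len(self._by_call_id)
        return self._by_call_id[call_id]


def tool_call_delta(
    indexer: ToolCallIndex,
    call_id: str,
    name: str,
    arguments: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a delta whose arguments field is always a JSON string."""
    return {
        "tool_calls": [
            {
                "index": indexer.index_for(call_id),
                "id": call_id,
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(arguments or {})},
            }
        ]
    }
```

Repeated chunks for the same call reuse the same index, so a client can concatenate them safely.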

* Update README.md

* Update README.md

* Update routes_ollama.py

* Update routes_openai.py

* Update utils.py

---------

Co-authored-by: alexx-ftw <alexx-ftw@users.noreply.github.com>
Co-authored-by: Game_Time <108236317+RayBytes@users.noreply.github.com>
alexx-ftw
2025-09-16 13:06:00 +01:00
committed by GitHub
parent 8d92a63626
commit 2f23cd5a89
7 changed files with 293 additions and 24 deletions


@@ -147,12 +147,42 @@ def ollama_chat() -> Response:
     tool_choice = payload.get("tool_choice", "auto")
     parallel_tool_calls = bool(payload.get("parallel_tool_calls", False))
+    # Passthrough Responses API tools (web_search) via ChatMock extension fields
+    extra_tools: List[Dict[str, Any]] = []
+    had_responses_tools = False
+    rt_payload = payload.get("responses_tools") if isinstance(payload.get("responses_tools"), list) else []
+    if isinstance(rt_payload, list):
+        for _t in rt_payload:
+            if not (isinstance(_t, dict) and isinstance(_t.get("type"), str)):
+                continue
+            if _t.get("type") not in ("web_search", "web_search_preview"):
+                return jsonify({"error": "Only web_search/web_search_preview are supported in responses_tools"}), 400
+            extra_tools.append(_t)
+        if not extra_tools and bool(current_app.config.get("DEFAULT_WEB_SEARCH")):
+            rtc = payload.get("responses_tool_choice")
+            if not (isinstance(rtc, str) and rtc == "none"):
+                extra_tools = [{"type": "web_search"}]
+        if extra_tools:
+            import json as _json
+            MAX_TOOLS_BYTES = 32768
+            try:
+                size = len(_json.dumps(extra_tools))
+            except Exception:
+                size = 0
+            if size > MAX_TOOLS_BYTES:
+                return jsonify({"error": "responses_tools too large"}), 400
+            had_responses_tools = True
+            tools_responses = (tools_responses or []) + extra_tools
+    rtc = payload.get("responses_tool_choice")
+    if isinstance(rtc, str) and rtc in ("auto", "none"):
+        tool_choice = rtc
     if not isinstance(model, str) or not isinstance(messages, list) or not messages:
         return jsonify({"error": "Invalid request format"}), 400
     input_items = convert_chat_messages_to_responses_input(messages)
     # Infer effort from model variant (gpt-5-high, etc.) but send base model upstream
     model_reasoning = extract_reasoning_from_model_name(model)
     upstream, error_resp = start_upstream_request(
         normalize_model_name(model),
@@ -171,12 +201,34 @@ def ollama_chat() -> Response:
             err_body = json.loads(upstream.content.decode("utf-8", errors="ignore")) if upstream.content else {"raw": upstream.text}
         except Exception:
             err_body = {"raw": upstream.text}
-        if verbose:
-            print("/api/chat upstream error status=", upstream.status_code, " body:", json.dumps(err_body)[:2000])
-        return (
-            jsonify({"error": (err_body.get("error", {}) or {}).get("message", "Upstream error")}),
-            upstream.status_code,
-        )
+        if had_responses_tools:
+            if verbose:
+                print("[Passthrough] Upstream rejected tools; retrying without extras (args redacted)")
+            base_tools_only = convert_tools_chat_to_responses(normalize_ollama_tools(tools_req))
+            safe_choice = payload.get("tool_choice", "auto")
+            upstream2, err2 = start_upstream_request(
+                normalize_model_name(model),
+                input_items,
+                instructions=BASE_INSTRUCTIONS,
+                tools=base_tools_only,
+                tool_choice=safe_choice,
+                parallel_tool_calls=parallel_tool_calls,
+                reasoning_param=build_reasoning_param(reasoning_effort, reasoning_summary, model_reasoning),
+            )
+            if err2 is None and upstream2 is not None and upstream2.status_code < 400:
+                upstream = upstream2
+            else:
+                return (
+                    jsonify({"error": {"message": (err_body.get("error", {}) or {}).get("message", "Upstream error"), "code": "RESPONSES_TOOLS_REJECTED"}}),
+                    (upstream2.status_code if upstream2 is not None else upstream.status_code),
+                )
+        else:
+            if verbose:
+                print("/api/chat upstream error status=", upstream.status_code, " body:", json.dumps(err_body)[:2000])
+            return (
+                jsonify({"error": (err_body.get("error", {}) or {}).get("message", "Upstream error")}),
+                upstream.status_code,
+            )
     created_at = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
     model_out = model if isinstance(model, str) and model.strip() else normalize_model_name(model)