native openai web search ability!

* feat: local passthrough for Responses tools via responses_tools + responses_tool_choice (behind CHATMOCK_ALLOW_RESPONSES_TOOLS)
* feat: gate Responses tools passthrough behind CHATMOCK_ALLOW_RESPONSES_TOOLS (default OFF)
* test(docs): add pytest for Responses tools passthrough (default off), and README usage section
* feat: responses tools hardening (fallback on 400, host allowlist, size guard, tool_choice strings only); tests updated
* feat: enable Responses tools passthrough by default; remove env gate
  - Tools forwarded whenever is present
  - Keep size guard and optional MCP host allowlist
  - Accept strings unconditionally

  Tests:
  - Update to cover default passthrough and baseline (no responses_tools)

  Docs:
  - README: update instructions; move Star History to bottom
* chore: clean imports/comments; use gpt-5 in examples and tests
* docs: tighten Responses tools README; fix gpt-5 example
  chore: remove feature-specific test per review; trim comments/imports
* chore: remove __pycache__/ and bytecode; add .gitignore
* chore: add .gitignore for caches and bytecode
* Update README.md
* fix: remove MCP passthrough; allow only web_search in responses_tools
  - Reject non-`web_search` types with 400 (`RESPONSES_TOOL_UNSUPPORTED`).
  - Drop MCP host allowlist logic and related import.
  - Keep size guard via `RESPONSES_TOOLS_MAX_BYTES` and fallback retry without extras.
  - Docs: update README to state web_search-only passthrough.

  Runtime verified locally with a stubbed upstream:
  - OK: `responses_tools: [{"type": "web_search"}]` -> 200.
  - BAD: `responses_tools: [{"type": "mcp"}]` -> 400 `RESPONSES_TOOL_UNSUPPORTED`.
* feat: forward Responses web_search tool via Chat Completions; fallback on rejection
  - Accept `responses_tools` array and filter to `type: web_search` only.
  - Enforce size guard `RESPONSES_TOOLS_MAX_BYTES` (default 32768).
  - Fallback: if upstream rejects tools, retry without extras; otherwise return `RESPONSES_TOOLS_REJECTED`.
  - README: document web_search-only passthrough and example.
  - Headers: hint experimental features in OpenAI-Beta (responses; web-search).
* chore: remove local test-only forcing flag (CHATMOCK_FORCE_WEB_SEARCH)
* fix: restore full routes_openai (web_search-only passthrough + endpoints)
  - Undo accidental large deletion from prior cleanup.
  - Keep `web_search` passthrough, size guard, and fallback.
  - Preserve `/v1/completions` and `/v1/models` endpoints and SSE handling.
* Update upstream.py
* Update upstream.py
* Update README.md
* Update README.md
* Update routes_openai.py
* feat(openai): default-enable web_search; accept preview; quiet retry; rm env knob
  - Injects responses_tools=[{"type":"web_search"}] when client omits tools; explicit opt-out via responses_tool_choice:"none".
  - Allowlist accepts "web_search" and "web_search_preview"; others rejected with RESPONSES_TOOL_UNSUPPORTED.
  - Replaces env max-bytes knob with MAX_TOOLS_BYTES=32768.
  - Retry on upstream rejection is silent; logs only under verbose.
* feat(stream): surface web_search_call as tool_calls; aggregate args; verbose-only logs
  - Translates Responses web_search_call.* and output_item.done into OpenAI-style delta.tool_calls.
  - Aggregates parameters by call_id (query/q, recency/time_range/days, domains/include/include_domains/include, max_results/topn/limit).
  - No inference; arguments remain "{}" if upstream provides none. Logs only when verbose.
* feat(responses-tools): web_search passthrough; flag; fallback; Ollama parity; stable indexes
  - Add --enable-web-search (default OFF) to inject web_search when requests omit responses_tools
  - Allow tool types: web_search and web_search_preview; 32,768-byte cap on serialized responses_tools
  - OpenAI /v1/chat/completions: passthrough + retry without extras on upstream rejection; return retry status
  - Streaming: function.arguments always JSON; stable tool_calls index per call_id
  - Ollama /api/chat: same passthrough + fallback behavior
  - README updated to match behavior and limits
* Update README.md
* Update README.md
* Update routes_ollama.py
* Update routes_openai.py
* Update utils.py

---------

Co-authored-by: alexx-ftw <alexx-ftw@users.noreply.github.com>
Co-authored-by: Game_Time <108236317+RayBytes@users.noreply.github.com>
2025-09-16 13:06:00 +01:00
parent 8d92a63626
commit 2f23cd5a89
7 changed files with 293 additions and 24 deletions
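
The commit message above describes an allowlist (`web_search`, `web_search_preview`) and a 32,768-byte cap on the serialized `responses_tools`. A minimal sketch of what such a check could look like follows. It is not the code from this commit: the helper name `validate_responses_tools` and the `RESPONSES_TOOLS_TOO_LARGE` code are hypothetical, and only `RESPONSES_TOOL_UNSUPPORTED` and the 32768 limit come from the message itself.

```python
import json

# Tool types named in the commit message as allowed.
ALLOWED_TOOL_TYPES = {"web_search", "web_search_preview"}
# Byte cap on the serialized tools list, as stated in the commit message.
MAX_TOOLS_BYTES = 32768


def validate_responses_tools(tools):
    """Return (ok, error_code) for a client-supplied responses_tools value.

    Hypothetical helper for illustration; not ChatMock's actual function.
    """
    if not isinstance(tools, list):
        return False, "RESPONSES_TOOL_UNSUPPORTED"
    for tool in tools:
        # Every entry must be an object whose type is on the allowlist.
        if not isinstance(tool, dict) or tool.get("type") not in ALLOWED_TOOL_TYPES:
            return False, "RESPONSES_TOOL_UNSUPPORTED"
    # Size guard on the JSON-serialized payload.
    if len(json.dumps(tools).encode("utf-8")) > MAX_TOOLS_BYTES:
        return False, "RESPONSES_TOOLS_TOO_LARGE"  # hypothetical error code
    return True, None
```

Checking the allowlist before the size guard means an unsupported tool type is always reported as such, regardless of payload size.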
--- a/README.md
+++ b/README.md
@@ -114,19 +114,37 @@ GPT-5 has a configurable amount of "effort" it can put into thinking, which may
 - `--reasoning-summary` (choice of auto,concise,detailed,none)<br>
 Models like GPT-5 do not return raw thinking content, but instead return thinking summaries. These can also be customised by you.

+### OpenAI Tools
+
+You can also access OpenAI tools through this project. Currently, only web search is available.
+You can enable it by starting the server with `--enable-web-search`, which lets the model decide when a request needs a web search. Alternatively, you can enable web search per request with the following parameters:
+
+- `responses_tools`: an array of tool objects; supported entries are `{"type": "web_search"}` and `{"type": "web_search_preview"}`
+- `responses_tool_choice`: `"auto"` or `"none"`
+
+### Example usage
+```json
+{
+  "model": "gpt-5",
+  "messages": [{"role":"user","content":"Find current METAR rules"}],
+  "stream": true,
+  "responses_tools": [{"type": "web_search"}],
+  "responses_tool_choice": "auto"
+}
+```
+
 ## Notes
-If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, and `--reasoning-summary` to none.
+If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to minimal, and `--reasoning-summary` to none. <br>
 All parameters and choices can be seen by sending `python chatmock.py serve --h`<br>
-The context size of this route is also larger than what you get access to in the regular ChatGPT app.
+The context size of this route is also larger than what you get access to in the regular ChatGPT app.<br>

-**When the model returns a thinking summary, the model will send back thinking tags to make it compatible with chat apps. If you don't like this behavior, you can instead set `--reasoning-compat` to legacy, and reasoning will be set in the reasoning tag instead of being returned in the actual response text.**
+When the model returns a thinking summary, it is sent back wrapped in thinking tags to make it compatible with chat apps. **If you don't like this behavior, you can instead set `--reasoning-compat` to legacy, and reasoning will be set in the reasoning tag instead of being returned in the actual response text.**

-# TODO
- ~~Implement Ollama support~~ ✅
- Explore to see if we can make more model settings accessible
- Implement analytics (token counting, etc, to track usage)

 ## Star History

 [![Star History Chart](https://api.star-history.com/svg?repos=RayBytes/ChatMock&type=Timeline)](https://www.star-history.com/#RayBytes/ChatMock&Timeline)

+
+
+
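
The README example in the diff above shows the request body only. As a rough sketch of how a client might build and send it, the snippet below uses the Python standard library. The base URL is an assumption about a local setup, the function names are hypothetical, and it uses `"stream": false` on the assumption that a non-streaming request returns a single JSON body. The `/v1/chat/completions` path and the two `responses_*` parameters come from the commit.

```python
import json
import urllib.request

# Assumption: the server is running locally on this address; adjust as needed.
BASE_URL = "http://127.0.0.1:8000/v1"


def build_chat_request(prompt, web_search=True, model="gpt-5"):
    """Build a chat completions payload (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    if web_search:
        # Opt in per request, mirroring the README example.
        payload["responses_tools"] = [{"type": "web_search"}]
        payload["responses_tool_choice"] = "auto"
    else:
        # Explicit opt-out, as described in the commit message.
        payload["responses_tool_choice"] = "none"
    return payload


def send_chat_request(payload, base_url=BASE_URL):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send_chat_request(build_chat_request("Find current METAR rules"))` would issue the request; passing `web_search=False` sends `responses_tool_choice: "none"` instead.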