* feat: local passthrough for Responses tools via responses_tools + responses_tool_choice (behind CHATMOCK_ALLOW_RESPONSES_TOOLS)
* feat: gate Responses tools passthrough behind CHATMOCK_ALLOW_RESPONSES_TOOLS (default OFF)
* test(docs): add pytest for Responses tools passthrough (default off), and README usage section
* feat: responses tools hardening (fallback on 400, host allowlist, size guard, tool_choice strings only); tests updated
* feat: enable Responses tools passthrough by default; remove env gate
- Tools forwarded whenever `responses_tools` is present
- Keep size guard and optional MCP host allowlist
- Accept `responses_tool_choice` strings unconditionally
Tests:
- Update to cover default passthrough and baseline (no responses_tools)
Docs:
- README: update instructions; move Star History to bottom
* chore: clean imports/comments; use gpt-5 in examples and tests
* docs: tighten Responses tools README; fix gpt-5 example
chore: remove feature-specific test per review; trim comments/imports
* chore: remove __pycache__/ and bytecode; add .gitignore
* chore: add .gitignore for caches and bytecode
* Update README.md
* fix: remove MCP passthrough; allow only web_search in responses_tools
- Reject non-`web_search` types with 400 (`RESPONSES_TOOL_UNSUPPORTED`).
- Drop MCP host allowlist logic and related import.
- Keep size guard via `RESPONSES_TOOLS_MAX_BYTES` and fallback retry without extras.
- Docs: update README to state web_search-only passthrough.
Runtime verified locally with a stubbed upstream:
- OK: `responses_tools: [{"type": "web_search"}]` -> 200.
- BAD: `responses_tools: [{"type": "mcp"}]` -> 400 `RESPONSES_TOOL_UNSUPPORTED`.
* feat: forward Responses web_search tool via Chat Completions; fallback on rejection
- Accept `responses_tools` array and filter to `type: web_search` only.
- Enforce size guard `RESPONSES_TOOLS_MAX_BYTES` (default 32768).
- Fallback: if upstream rejects tools, retry without extras; otherwise return `RESPONSES_TOOLS_REJECTED`.
- README: document web_search-only passthrough and example.
- Headers: hint experimental features in OpenAI-Beta (responses; web-search).
* chore: remove local test-only forcing flag (CHATMOCK_FORCE_WEB_SEARCH)
* fix: restore full routes_openai (web_search-only passthrough + endpoints)
- Undo accidental large deletion from prior cleanup.
- Keep `web_search` passthrough, size guard, and fallback.
- Preserve `/v1/completions` and `/v1/models` endpoints and SSE handling.
* Update upstream.py
* Update upstream.py
* Update README.md
* Update README.md
* Update routes_openai.py
* feat(openai): default-enable web_search; accept preview; quiet retry; rm env knob
- Injects responses_tools=[{"type":"web_search"}] when client omits tools; explicit opt-out via responses_tool_choice:"none".
- Allowlist accepts "web_search" and "web_search_preview"; others rejected with RESPONSES_TOOL_UNSUPPORTED.
- Replaces env max-bytes knob with MAX_TOOLS_BYTES=32768.
- Retry on upstream rejection is silent; logs only under verbose.
* feat(stream): surface web_search_call as tool_calls; aggregate args; verbose-only logs
- Translates Responses web_search_call.* and output_item.done into OpenAI-style delta.tool_calls.
- Aggregates parameters by call_id (query/q, recency/time_range/days, domains/include/include_domains/include, max_results/topn/limit).
- No inference; arguments remain "{}" if upstream provides none. Logs only when verbose.
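A minimal sketch of the aggregation described in the bullets above, not the actual code in this change. The class name, the `add` method, and the choice of the first alias in each group as the canonical key are assumptions made for illustration; only the alias groups, the stable per-`call_id` index, and the `"{}"` default come from the notes.

```python
import json

# Alias groups from the note above. Treating the first name in each group as
# the canonical key is an assumption of this sketch.
ALIASES = {
    "query": ("query", "q"),
    "recency": ("recency", "time_range", "days"),
    "domains": ("domains", "include", "include_domains"),
    "max_results": ("max_results", "topn", "limit"),
}


class ToolCallAggregator:
    """Hypothetical helper: merges web_search parameters per call_id."""

    def __init__(self):
        self._index = {}  # call_id -> stable position in tool_calls
        self._args = {}   # call_id -> parameters merged so far

    def add(self, call_id, params=None):
        """Merge params for call_id and return an OpenAI-style tool_calls entry."""
        if call_id not in self._index:
            self._index[call_id] = len(self._index)
            self._args[call_id] = {}
        merged = self._args[call_id]
        for canonical, names in ALIASES.items():
            for name in names:
                if params and name in params:
                    merged[canonical] = params[name]
                    break
        return {
            "index": self._index[call_id],
            "id": call_id,
            "type": "function",
            # No inference: with no upstream parameters this stays "{}".
            "function": {"name": "web_search", "arguments": json.dumps(merged)},
        }
```

Each `call_id` keeps the index it was first assigned, so later deltas for the same call update the same `tool_calls` slot.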
* feat(responses-tools): web_search passthrough; flag; fallback; Ollama parity; stable indexes
- Add --enable-web-search (default OFF) to inject web_search when requests omit responses_tools
- Allow tool types: web_search and web_search_preview; 32,768-byte cap on serialized responses_tools
- OpenAI /v1/chat/completions: passthrough + retry without extras on upstream rejection; return retry status
- Streaming: function.arguments always JSON; stable tool_calls index per call_id
- Ollama /api/chat: same passthrough + fallback behavior
- README updated to match behavior and limits
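The allowlist and size cap listed above could be checked roughly as follows. This is a sketch under stated assumptions, not the code in `routes_openai.py`: the function name and the two error codes marked hypothetical are invented here, while `RESPONSES_TOOL_UNSUPPORTED`, `MAX_TOOLS_BYTES`, and the 32,768-byte limit come from the notes.

```python
import json

ALLOWED_TOOL_TYPES = {"web_search", "web_search_preview"}
MAX_TOOLS_BYTES = 32768  # cap on the serialized responses_tools payload


def validate_responses_tools(tools):
    """Return (ok, error_code) for a responses_tools value; error_code is None on success."""
    if not isinstance(tools, list):
        return False, "RESPONSES_TOOLS_INVALID"  # hypothetical code
    for tool in tools:
        if not isinstance(tool, dict) or tool.get("type") not in ALLOWED_TOOL_TYPES:
            return False, "RESPONSES_TOOL_UNSUPPORTED"
    # The size guard applies to the serialized form of the whole array.
    size = len(json.dumps(tools).encode("utf-8"))
    if size > MAX_TOOLS_BYTES:
        return False, "RESPONSES_TOOLS_TOO_LARGE"  # hypothetical code
    return True, None
```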
* Update README.md
* Update README.md
* Update routes_ollama.py
* Update routes_openai.py
* Update utils.py
---------
Co-authored-by: alexx-ftw <alexx-ftw@users.noreply.github.com>
Co-authored-by: Game_Time <108236317+RayBytes@users.noreply.github.com>
ChatMock
OpenAI & Ollama compatible API powered by your ChatGPT plan.
Use your ChatGPT Plus/Pro account to call OpenAI models from code or alternate chat UIs.
What It Does
ChatMock runs a local server that exposes an OpenAI/Ollama-compatible API. Requests are fulfilled using your authenticated ChatGPT login through the OAuth client of Codex, OpenAI's coding CLI tool. This lets you use GPT-5 and other models through your OpenAI account without an API key. A paid ChatGPT account is required.
Quickstart
Mac Users
GUI Application
If you're on macOS, you can download the GUI app from the GitHub releases.
Note: Since ChatMock isn't signed with an Apple Developer ID, you may need to run the following command in your terminal to open the app:
xattr -dr com.apple.quarantine /Applications/ChatMock.app
Command Line (Homebrew)
You can also install ChatMock as a command-line tool using Homebrew:
brew tap RayBytes/chatmock
brew install chatmock
Python
If you would rather run this as a Python Flask server, you are welcome to do so.
Clone or download this repository, then cd into the project directory and follow the instructions below.
- Sign in with your ChatGPT account and follow the prompts
python chatmock.py login
You can make sure this worked by running python chatmock.py info
- After the login completes successfully, start the local server
python chatmock.py serve
Then use the address and port as the baseURL wherever you need it (http://127.0.0.1:8000 by default).
Reminder: When setting a baseURL, make sure you include /v1/ at the end of the URL if you're using this as an OpenAI-compatible endpoint (e.g. http://127.0.0.1:8000/v1)
Examples
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="key",  # ignored
)

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "hello world"}],
)
print(resp.choices[0].message.content)
curl
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [{"role":"user","content":"hello world"}]
  }'
What's supported
- Tool calling
- Vision/Image understanding
- Thinking summaries (through thinking tags)
Notes & Limits
- Requires an active, paid ChatGPT account.
- Expect lower rate limits than what you may receive in the ChatGPT app.
- Some context length might be taken up by internal instructions (but they don't seem to degrade the model).
- Use responsibly and at your own risk. This project is not affiliated with OpenAI, and is an educational exercise.
Supported models
- gpt-5
- codex-mini
Customisation / Configuration
Thinking effort
--reasoning-effort (choice of minimal, low, medium, high)
GPT-5 has a configurable amount of "effort" it can put into thinking. Higher effort may make a response take longer to return, but may give a smarter answer overall. Passing this parameter after serve makes the server use that reasoning effort by default, unless an API request overrides it with a different effort. Without this parameter, the default reasoning effort is medium.
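The precedence described above (a per-request effort wins over the server default) can be pictured with a small sketch. The function name is hypothetical, and falling back to the server default when the request supplies an unrecognised value is an assumption, not documented behavior.

```python
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def resolve_reasoning_effort(request_effort=None, server_default="medium"):
    """Pick the effort for one request: the request's value first, then the server default."""
    if request_effort in VALID_EFFORTS:
        return request_effort
    # Assumption: missing or unrecognised values fall back to the default.
    return server_default
```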
Thinking summaries
--reasoning-summary (choice of auto, concise, detailed, none)
Models like GPT-5 do not return raw thinking content, but instead return thinking summaries. These can also be customised by you.
OpenAI Tools
You can also access OpenAI tools through this project. Currently, only web search is available.
You can enable it by starting the server with --enable-web-search, which will allow OpenAI to determine when a request requires a web search, or you can use the following parameters during a request to enable web search:
- responses_tools: supports [{"type":"web_search"}] / { "type": "web_search_preview" }
- responses_tool_choice: "auto" or "none"
Example usage
{
  "model": "gpt-5",
  "messages": [{"role":"user","content":"Find current METAR rules"}],
  "stream": true,
  "responses_tools": [{"type": "web_search"}],
  "responses_tool_choice": "auto"
}
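For reference, here is one way the request body above might be sent from Python using only the standard library. This is a sketch: the helper names are made up, the address is the Quickstart default, and `stream` is left out so the reply arrives as a single JSON object.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"  # default address from the Quickstart


def build_web_search_request(prompt, model="gpt-5", tool_choice="auto"):
    """Build a POST request for /chat/completions with web search enabled."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "responses_tools": [{"type": "web_search"}],
        "responses_tool_choice": tool_choice,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer key"},
        method="POST",
    )


def ask_with_web_search(prompt):
    """Send the request and return the reply text. Needs a running ChatMock server."""
    with urllib.request.urlopen(build_web_search_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the OpenAI Python client, the same two fields can likely be supplied through the `extra_body` argument of `client.chat.completions.create`, since they are not part of the standard request schema.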
Notes
If you wish to have the fastest responses, I'd recommend setting --reasoning-effort to minimal, and --reasoning-summary to none.
All parameters and choices can be seen by running python chatmock.py serve --h
The context size of this route is also larger than what you get access to in the regular ChatGPT app.
When the model returns a thinking summary, the model will send back thinking tags to make it compatible with chat apps. If you don't like this behavior, you can instead set --reasoning-compat to legacy, and reasoning will be set in the reasoning tag instead of being returned in the actual response text.
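If a client needs to separate the thinking summary from the answer text, something like the following sketch may help. It assumes the summary is wrapped in `<think>...</think>`; the exact tag name is not stated here, so adjust the pattern to match what your server actually returns. The function name is hypothetical.

```python
import re

# Assumption: summaries arrive wrapped in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (summaries, answer), where answer has the tagged blocks removed."""
    summaries = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return summaries, answer
```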