When using the gpt-oss-20b-MXFP4-Q4 model with mlx-lm.server, tool calling does not work properly: the inference engine does not stop generation at the <|call|> token to invoke the tool, so the model keeps emitting tokens.
According to the specification, this is an example of a tool call message: <|channel|>analysis<|message|>Need to use function get_current_weather.<|end|><|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
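To make the expected structure concrete, here is a minimal Python sketch that pulls the function name and JSON arguments out of a message in the format shown above. The regular expression and the helper name `parse_first_tool_call` are illustrative assumptions, not part of mlx-lm or of any official parser, and the pattern only covers the simple `commentary to=functions.<name>` layout from the example.

```python
import json
import re

# Assumed pattern for the tool-call layout shown in the example message:
# the recipient follows "to=" in the commentary header, and the JSON
# arguments sit between <|message|> and <|call|>.
TOOL_CALL_RE = re.compile(
    r"<\|channel\|>commentary to=functions\.(?P<name>[\w.-]+)\s*"
    r"(?:<\|constrain\|>json)?<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)


def parse_first_tool_call(text):
    """Return (function_name, arguments_dict) for the first tool call, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    return match.group("name"), json.loads(match.group("args"))


# The example message quoted above, split across lines for readability.
example = (
    "<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location":"San Francisco"}<|call|>'
)

print(parse_first_tool_call(example))
```

Everything after the first `<|call|>` is ignored by this sketch, which is the behaviour one would expect from a server that stops generation at that token.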
Example:
Request body:
{
"model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
"messages": [
{
"role": "system",
"content": "# You are a helpful terminal agent.\n - Use tools when appropriate.\n - Prefer safe, read-only commands unless explicitly asked.\n - Keep outputs concise and relevant."
},
{
"role": "user",
"content": "can you read the file test.txt "
}
],
"temperature": 0.7,
"max_tokens": 4096,
"tools": [
{
"type": "function",
"function": {
"name": "readfile",
"description": "Read the first 4096 bytes of a text file in the current directory.",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string"
}
},
"required": [
"filename"
]
}
}
}
],
"tool_choice": "auto"
}
Response:
{
"id": "chatcmpl-862be19a-7eb9-4b29-b937-75e61052c768",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": {
"content": null,
"refusal": null,
"token_logprobs": [
...
]
},
"message": {
"content": "<|channel|>analysis<|message|>The user requests to read the file test.txt. According to instructions, I need to read the file. The tool readfile can be used. The file name is test.txt. I need to call the readfile function.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary 
to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems repeated. Probably an error. I should just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>We need to ensure no infinite loops. Let's just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. I should output the result returned by tool. Probably I should just output the content. Let's try again.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. Probably my tool calls got repeated. Let's just output a placeholder: \"Content of test.txt: ...\".<|end|><|start|>assistant<|channel|>final<|message|>I\u2019m sorry, but I can\u2019t read the file at the moment.",
"refusal": null,
"role": "assistant",
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": []
}
}
],
"created": 1763199916,
"model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": "",
"usage": {
"completion_tokens": 699,
"prompt_tokens": 198,
"total_tokens": 897,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
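As a possible client-side stopgap until the server stops at `<|call|>`, the raw `content` string could be cut at the first `<|call|>` before any further parsing. The sketch below is an assumption about how a client might do this, not a fix in mlx-lm itself; the helper name `truncate_at_first_call` is hypothetical, and `raw` is a shortened stand-in for the `content` field in the response above.

```python
CALL_TOKEN = "<|call|>"


def truncate_at_first_call(content):
    """Keep text up to and including the first <|call|>; return it unchanged if absent."""
    index = content.find(CALL_TOKEN)
    if index == -1:
        return content
    return content[: index + len(CALL_TOKEN)]


# Shortened stand-in for the "content" field of the response above.
raw = (
    "<|channel|>analysis<|message|>I need to call the readfile function.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.readfile "
    '<|constrain|>json<|message|>{"filename":"test.txt"}<|call|>'
    "commentary<|channel|>commentary to=functions.readfile<|channel|>commentary"
)

trimmed = truncate_at_first_call(raw)
print(trimmed)
```

This only hides the symptom on the client: the server still generates and bills the extra tokens (699 completion tokens in the response above), and `tool_calls` in the server response stays empty.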
Note: this is the prompt fed to the model after applying the chat template:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-11-15
Reasoning: medium
# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions
# You are a helpful terminal agent.
- Use tools when appropriate.
- Prefer safe, read-only commands unless explicitly asked.
- Keep outputs concise and relevant.
# Tools
## functions
namespace functions {
// Fetch a URL and return plain text.
type webfetch = (_: {
url: string,
}) => any;
// Read the first 4096 bytes of a text file in the current directory.
type readfile = (_: {
filename: string,
}) => any;
} // namespace functions<|end|><|start|>user<|message|>can you read the file test.txt <|end|><|start|>assistant