Tool / Function call issue with gpt-oss-20b-MXFP4-Q4

When using the `gpt-oss-20b-MXFP4-Q4` [model](https://huggingface.co/mlx-community/gpt-oss-20b-MXFP4-Q4) with `mlx-lm.server`, **tool calling does not work properly**: the inference engine does not stop generating at the `<|call|>` token, so no tool call is returned and the model keeps emitting tokens.

According to the [specification](https://cookbook.openai.com/articles/openai-harmony), a tool call message looks like this: `<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|><|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>`
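To make the expected structure concrete, here is a minimal sketch of how the function name and JSON arguments could be pulled out of such a message. The helper `parse_first_tool_call` and the regular expression are my own illustration (assumptions, not part of mlx-lm or of any official Harmony parser), and they only cover the simple shape shown in the example above.

```python
import json
import re

# Hypothetical pattern for illustration only: it matches a commentary-channel
# message addressed to functions.<name> and terminated by <|call|>.
TOOL_CALL_RE = re.compile(
    r"<\|channel\|>commentary to=functions\.(?P<name>[\w.-]+)\s*"
    r"(?:<\|constrain\|>json)?<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)


def parse_first_tool_call(text):
    """Return (name, arguments) for the first tool call in `text`, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    # The arguments are expected to be a JSON object, per the constrain marker.
    return match.group("name"), json.loads(match.group("args"))


example = (
    "<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location":"San Francisco"}<|call|>'
)
print(parse_first_tool_call(example))
# → ('get_current_weather', {'location': 'San Francisco'})
```

The point is that everything needed for the tool call is complete once `<|call|>` appears, which is why generation is expected to stop there.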

Example:

Request body:
```json
{
  "model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
  "messages": [
    {
      "role": "system",
      "content": "# You are a helpful terminal agent.\n  - Use tools when appropriate.\n  - Prefer safe, read-only commands unless explicitly asked.\n  - Keep outputs concise and relevant."
    },
    {
      "role": "user",
      "content": "can you read the file test.txt "
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "readfile",
        "description": "Read the first 4096 bytes of a text file in the current directory.",
        "parameters": {
          "type": "object",
          "properties": {
            "filename": {
              "type": "string"
            }
          },
          "required": [
            "filename"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

Response:
```json
{
  "id": "chatcmpl-862be19a-7eb9-4b29-b937-75e61052c768",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "content": null,
        "refusal": null,
        "token_logprobs": [
            ...
        ]
      },
      "message": {
        "content": "<|channel|>analysis<|message|>The user requests to read the file test.txt. According to instructions, I need to read the file. The tool readfile can be used. The file name is test.txt. I need to call the readfile function.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary 
to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems repeated. Probably an error. I should just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>We need to ensure no infinite loops. Let's just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. I should output the result returned by tool. Probably I should just output the content. Let's try again.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. Probably my tool calls got repeated. 
Let's just output a placeholder: \"Content of test.txt: ...\".<|end|><|start|>assistant<|channel|>final<|message|>I\u2019m sorry, but I can\u2019t read the file at the moment.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": []
      }
    }
  ],
  "created": 1763199916,
  "model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "",
  "usage": {
    "completion_tokens": 699,
    "prompt_tokens": 198,
    "total_tokens": 897,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
```
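For comparison, this sketch shows where I would expect the content to end. It is a client-side post-processing illustration only: `truncate_at_call` is a hypothetical helper, the `raw` string is an abridged version of the response above, and cutting the text afterwards does not recover the tokens already wasted, so it is not a substitute for the server stopping at `<|call|>`.

```python
CALL_TOKEN = "<|call|>"


def truncate_at_call(content):
    """Cut `content` right after the first <|call|> token, if there is one."""
    head, sep, _ = content.partition(CALL_TOKEN)
    # When the token is absent, sep is "" and head is the whole string,
    # so the content comes back unchanged.
    return head + sep


# Abridged version of the "content" field from the response above.
raw = (
    "<|channel|>analysis<|message|>I need to call the readfile function.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.readfile "
    '<|constrain|>json<|message|>{"filename":"test.txt"}<|call|>'
    "commentary<|channel|>commentary to=functions.readfile<|channel|>commentary"
)

expected = truncate_at_call(raw)
print(expected.endswith(CALL_TOKEN))  # → True
```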

Note: this is the prompt fed to the model after applying the chat template:
```text
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-11-15

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

# You are a helpful terminal agent.
  - Use tools when appropriate.
  - Prefer safe, read-only commands unless explicitly asked.
  - Keep outputs concise and relevant.

# Tools

## functions

namespace functions {

// Fetch a URL and return plain text.
type webfetch = (_: {
url: string,
}) => any;

// Read the first 4096 bytes of a text file in the current directory.
type readfile = (_: {
filename: string,
}) => any;

} // namespace functions<|end|><|start|>user<|message|>can you read the file test.txt <|end|><|start|>assistant
```
