
Tool / Function call issue with gpt-oss-20b-MXFP4-Q4 #613

@veganmosfet

Description

When serving the gpt-oss-20b-MXFP4-Q4 model with mlx-lm.server, tool calling does not work: generation does not stop at the <|call|> token, so no tool call is returned and the model keeps emitting tokens.
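To make the expected behaviour concrete, here is a minimal sketch of a generation loop that halts on a stop token. It is not mlx-lm's actual code: `generate_until_stop`, `STOP_TOKENS`, and the idea of iterating over already-decoded token strings are assumptions made for illustration, and the inclusion of `<|return|>` alongside `<|call|>` is my reading of the Harmony format rather than something shown in this issue.

```python
# Hypothetical stop-token handling; names here are illustrative, not mlx-lm API.
STOP_TOKENS = {"<|call|>", "<|return|>"}


def generate_until_stop(token_stream, stop_tokens=STOP_TOKENS):
    """Collect tokens from an iterable and stop right after the first stop token.

    Returns (emitted_tokens, stop_token), where stop_token is None if the
    stream ended without producing one.
    """
    emitted = []
    for token in token_stream:
        emitted.append(token)
        if token in stop_tokens:
            # The engine should hand control back to the caller here
            # instead of sampling further tokens.
            return emitted, token
    return emitted, None
```

With a loop like this, the output in the response below would end at the first `<|call|>` instead of running on into repeated `commentary` fragments.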

According to the specification, a tool call message looks like this:

<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|><|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
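For reference, a message in this format can be decoded with a small regular expression. This is only a sketch: `TOOL_CALL_RE` and `parse_first_tool_call` are hypothetical names, and the pattern assumes the recipient follows the `commentary` channel name exactly as in the example above.

```python
import json
import re

# Matches a commentary-channel tool call that ends with <|call|>.
TOOL_CALL_RE = re.compile(
    r"<\|channel\|>commentary to=functions\.(?P<name>[\w.-]+)\s*"
    r"(?:<\|constrain\|>json)?<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)


def parse_first_tool_call(text):
    """Return {'name': ..., 'arguments': ...} for the first tool call, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        # The message body is expected to be a JSON object of arguments.
        "arguments": json.loads(match.group("args")),
    }
```

Applied to the example message, this should yield the function name `get_current_weather` and the arguments `{"location": "San Francisco"}`.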

Example:

Request body:

{
  "model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
  "messages": [
    {
      "role": "system",
      "content": "# You are a helpful terminal agent.\n  - Use tools when appropriate.\n  - Prefer safe, read-only commands unless explicitly asked.\n  - Keep outputs concise and relevant."
    },
    {
      "role": "user",
      "content": "can you read the file test.txt "
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "readfile",
        "description": "Read the first 4096 bytes of a text file in the current directory.",
        "parameters": {
          "type": "object",
          "properties": {
            "filename": {
              "type": "string"
            }
          },
          "required": [
            "filename"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Response:

{
  "id": "chatcmpl-862be19a-7eb9-4b29-b937-75e61052c768",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "content": null,
        "refusal": null,
        "token_logprobs": [
            ...
        ]
      },
      "message": {
        "content": "<|channel|>analysis<|message|>The user requests to read the file test.txt. According to instructions, I need to read the file. The tool readfile can be used. The file name is test.txt. I need to call the readfile function.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary 
to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems repeated. Probably an error. I should just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>We need to ensure no infinite loops. Let's just call once.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. I should output the result returned by tool. Probably I should just output the content. Let's try again.<|end|><|start|>assistant<|channel|>commentary to=functions.readfile <|constrain|>json<|message|>{\"filename\":\"test.txt\"}<|call|>commentary<|channel|>commentary to=functions.readfile<|channel|>commentary<|channel|>commentary to=functions.readfile ...<|end|><|start|>assistant<|channel|>analysis<|message|>It seems stuck. Probably my tool calls got repeated. 
Let's just output a placeholder: \"Content of test.txt: ...\".<|end|><|start|>assistant<|channel|>final<|message|>I\u2019m sorry, but I can\u2019t read the file at the moment.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": []
      }
    }
  ],
  "created": 1763199916,
  "model": "../MODEL/gpt-oss-20b-MXFP4-Q4",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "",
  "usage": {
    "completion_tokens": 699,
    "prompt_tokens": 198,
    "total_tokens": 897,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
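Until the server stops at `<|call|>`, a client could post-process such a response itself. The sketch below is a possible workaround, not a fix for mlx-lm.server: `repair_message` is a hypothetical helper that cuts the content at the first `<|call|>` and moves the call into `tool_calls` in the usual chat-completions shape. It assumes the recipient appears after the last `<|channel|>` marker, as in the response above, and it discards the analysis text.

```python
import json
import uuid

CALL = "<|call|>"
MESSAGE = "<|message|>"
CHANNEL = "<|channel|>"
RECIPIENT = "to=functions."


def repair_message(message):
    """Return a copy of `message` with the first raw tool call moved into `tool_calls`.

    The message is returned unchanged if no complete tool call is found.
    """
    content = message.get("content") or ""
    head, found, _ = content.partition(CALL)  # ignore everything after the first <|call|>
    if not found:
        return message
    header, _, arguments = head.rpartition(MESSAGE)
    header = header.rpartition(CHANNEL)[2]  # header of the last message only
    start = header.find(RECIPIENT)
    if start == -1:
        return message
    words = header[start + len(RECIPIENT):].split()
    if not words:
        return message
    json.loads(arguments)  # raises ValueError if the arguments are not valid JSON
    repaired = dict(message)
    repaired["content"] = None
    repaired["tool_calls"] = [{
        "id": "call_" + uuid.uuid4().hex,  # made-up identifier for this client-side call
        "type": "function",
        "function": {"name": words[0], "arguments": arguments},
    }]
    return repaired
```

For the response above, this should produce a single `readfile` call with arguments `{"filename":"test.txt"}`; the caller would also want to treat the choice as if its `finish_reason` were `tool_calls`.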

Note: this is the prompt fed to the model after applying the chat template:

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-11-15

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

# You are a helpful terminal agent.
  - Use tools when appropriate.
  - Prefer safe, read-only commands unless explicitly asked.
  - Keep outputs concise and relevant.

# Tools

## functions

namespace functions {

// Fetch a URL and return plain text.
type webfetch = (_: {
url: string,
}) => any;

// Read the first 4096 bytes of a text file in the current directory.
type readfile = (_: {
filename: string,
}) => any;

} // namespace functions<|end|><|start|>user<|message|>can you read the file test.txt <|end|><|start|>assistant
