Describe the bug
When using Flash 2.5 on Vertex AI for audio transcription via the `google-genai` package with batching enabled, the model repeatedly outputs the literal token `[unclear]`. This repetition consumes the entire `max_output_tokens` budget before transcription completes, causing the response to be truncated and resulting in invalid or incomplete JSON.
This behavior appears to be a recent regression. The same transcription pipeline was significantly more reliable approximately 1–1.5 months ago, with far fewer `[unclear]` repetitions and consistently complete JSON responses.
Environment
- Platform: Vertex AI
- Model: Flash 2.5
- Library: google-genai
- Task: Audio transcription with batching
- Response MIME type: `application/json`
- Response schema: Enabled
- Thinking mode: Disabled
Steps to reproduce
- Send an audio file via `file_uri` with a transcription prompt
- Enable structured JSON output using `response_schema`
- Set `max_output_tokens` appropriate for the expected transcription length
- Invoke Flash 2.5 on Vertex AI with batching
Expected behavior
- The model should avoid excessive repetition of `[unclear]`
- The model should complete transcription within the token budget
- The model should consistently return a valid JSON response conforming to the schema
Actual behavior
- The model repeatedly emits `[unclear]` segments
- Output tokens are exhausted before transcription completes
- JSON output is truncated or malformed
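For completeness, this is the kind of check we run on each batch response to confirm the failure mode (the sample string below is illustrative; only the counting and parsing logic matters):

```python
import json
import re

def diagnose(response_text: str, token_budget_hit: bool) -> dict:
    """Count [unclear] repetitions and check whether the JSON survived."""
    unclear_count = len(re.findall(r"\[unclear\]", response_text))
    try:
        json.loads(response_text)
        valid_json = True
    except json.JSONDecodeError:
        valid_json = False
    return {
        "unclear_count": unclear_count,
        "valid_json": valid_json,
        # Truncation = budget exhausted AND the JSON never closed.
        "truncated": token_budget_hit and not valid_json,
    }

# A truncated response that degenerated into [unclear] repetitions:
sample = '{"segments": [{"text": "[unclear] [unclear] [unclear]'
print(diagnose(sample, token_budget_hit=True))
```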
Code snippet
```python
parts = [
    {
        "file_data": {
            "file_uri": uri,
            "mime_type": self._get_mime_type(file_path),
        }
    },
    {"text": final_prompt},
]

generation_config = {
    "response_mime_type": "application/json",
    "temperature": transcription_config.TEMPERATURE,
    "max_output_tokens": transcription_config.get_max_output_tokens(model),
}

schema_class = get_transcription_result_class(model, phase)
if schema_class:
    if isinstance(schema_class, dict):
        generation_config["response_schema"] = schema_class
    else:
        # Pydantic model: dump to JSON schema, inline $refs, and drop $defs,
        # since the Vertex AI response_schema field does not accept them.
        schema_dict = schema_class.model_json_schema()
        schema_dict = self._resolve_json_schema_refs(schema_dict)
        schema_dict.pop("$defs", None)
        generation_config["response_schema"] = schema_dict

generation_config["thinking_config"] = {
    "thinking_budget": transcription_config.THINKING_BUDGET
}

instance = {
    "id": str(i - 1),
    "request": {
        "contents": [{"role": "user", "parts": parts}],
        "generation_config": generation_config,
    },
}
```
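Since `_resolve_json_schema_refs` isn't shown above, here is a minimal standalone sketch of what it does in our pipeline (the real helper is a method on the transcription class and may handle more cases; this version assumes local, non-circular `#/$defs/...` references only):

```python
def resolve_json_schema_refs(schema: dict) -> dict:
    """Recursively replace local {"$ref": "#/$defs/X"} entries with the
    referenced definition, leaving no $defs/$ref in the result.
    Assumes references are local and non-circular."""
    defs = schema.get("$defs", {})

    def inline(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                return inline(defs[ref.split("/")[-1]])
            return {k: inline(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [inline(v) for v in node]
        return node

    return inline(schema)
```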
Additional context
- Increasing `max_output_tokens` doesn't reduce the issue.
- The regression has been observed consistently over the last 1–1.5 months.
Questions
- Is this a known regression in Flash 2.5 transcription behavior?
- Are there recommended mitigations to prevent token exhaustion due to repeated `[unclear]` output?
- Is Flash 2.5 currently recommended for transcription workloads on Vertex AI?
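In case it helps others hitting this, here is a post-processing sketch we've been experimenting with: it collapses runs of repeated `[unclear]` tokens and flags the response so the caller can retry. This is only a workaround for downstream handling, not a fix for the token exhaustion itself:

```python
import re

def collapse_unclear(text: str, max_run: int = 1) -> tuple[str, bool]:
    """Collapse runs of consecutive [unclear] tokens down to at most
    `max_run` occurrences; also report whether anything was collapsed
    so the caller can decide to retry the request."""
    pattern = re.compile(r"(?:\[unclear\]\s*){%d,}" % (max_run + 1))
    collapsed = pattern.sub("[unclear] " * max_run, text)
    return collapsed.strip(), collapsed != text
```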