Ollama returns empty-body responses under sustained load in generate_from_raw

## Description

Under sustained load on a 32GB M1 Mac with `granite4:micro`, `generate_from_raw` occasionally returns `ModelOutputThunk(value="")` for one or more of its parallel requests. This is not a caught exception being swallowed: Ollama itself returns an empty response.

**Evidence**: In a 20-run soak test, 18/20 runs had at least one empty result. The `FancyLogger.warning` added in #598 did **not** fire during these runs, confirming the empty string was Ollama's actual response, not a swallowed exception.
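For anyone reproducing the count, a soak tally along these lines might be enough. This is only a sketch: `Thunk` and `fake_generate_from_raw` are hypothetical stand-ins (not the real `ModelOutputThunk` or backend call), and the injected empties are simulated so the loop can run without Ollama.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Thunk:
    """Hypothetical stand-in for ModelOutputThunk; only `.value` is modelled."""
    value: str


async def fake_generate_from_raw(prompts, empty_at=frozenset()):
    """Hypothetical stand-in for the backend call: one Thunk per prompt."""
    await asyncio.sleep(0)  # yield to the event loop, as a real call would
    return [
        Thunk("" if i in empty_at else f"reply to {p}")
        for i, p in enumerate(prompts)
    ]


async def soak(run_once, runs):
    """Return the indices of runs that had at least one empty result."""
    failing = []
    for i in range(runs):
        results = await run_once(i)
        if any(not r.value for r in results):
            failing.append(i)
    return failing


async def _demo():
    prompts = ["a", "b", "c", "d"]

    async def run_once(i):
        # Simulated failure pattern: every odd run gets one empty response.
        empty_at = {2} if i % 2 else set()
        return await fake_generate_from_raw(prompts, empty_at)

    return await soak(run_once, runs=6)


failing_runs = asyncio.run(_demo())
print(f"{len(failing_runs)}/6 runs had at least one empty result")
```

Swapping `run_once` for a coroutine that calls the real backend would give the same "N/20 runs" figure quoted above.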

## Current behaviour

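`generate_from_raw` uses `asyncio.gather(return_exceptions=True)`, which captures exceptions as results instead of raising them. A successful but empty response is not an exception at all, so it passes through silently and triggers no warning. Tests now use `assert all(r.value for r in results)` to surface this clearly (added in #598).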
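The distinction between a captured exception and a genuinely empty result can be illustrated with plain `asyncio`. The coroutines below are hypothetical placeholders for the parallel requests, not the actual backend code; the sketch only shows that an empty string comes back from `gather` as an ordinary result.

```python
import asyncio


async def ok():
    return "hello"


async def empty():
    # A request that "succeeds" but carries no text.
    return ""


async def boom():
    raise RuntimeError("connection reset")


def classify(results):
    """Bucket gather() results into ok / empty / error."""
    buckets = {"ok": 0, "empty": 0, "error": 0}
    for r in results:
        if isinstance(r, BaseException):
            buckets["error"] += 1  # captured by return_exceptions=True
        elif not r:
            buckets["empty"] += 1  # no exception, just an empty value
        else:
            buckets["ok"] += 1
    return buckets


async def main():
    results = await asyncio.gather(
        ok(), empty(), boom(), return_exceptions=True
    )
    return classify(results)


counts = asyncio.run(main())
print(counts)
```

Only the `error` bucket would ever reach an exception-handling log line; the `empty` bucket is what the stronger assertion catches.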

## Suspected causes

1. **Machine exhaustion**: the 32GB M1 was running other workloads, with Ollama NUM_PARALLEL ≥ 4, a 4.6GB model, and context auto-capped at 32K. This may not reproduce on an idle/cold machine.
2. **Ollama bug**: the server returns an empty body for some requests at high concurrency, without a real OOM or timeout behind it.

## Next steps

- [ ] Reproduce on idle/cold machine to isolate machine-exhaustion vs Ollama bug
- [ ] Check if `CONTEXT_WINDOW: 2048` (added in #598) reduces or eliminates the issue
- [ ] Consider exposing a configurable `timeout` on `OllamaModelBackend.__init__` so tests can set a ceiling
- [ ] If confirmed Ollama bug, open upstream issue with repro
- [ ] If machine exhaustion, document in test infrastructure notes
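Until a `timeout` option exists on the backend, a test-side ceiling could be approximated with `asyncio.wait_for`. This is a sketch under assumptions: `with_ceiling` and `slow_backend` are hypothetical names, and nothing here reflects the actual `OllamaModelBackend` signature.

```python
import asyncio


async def with_ceiling(coro, seconds):
    """Await `coro`; return (result, None) or (None, "timeout")."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds), None
    except asyncio.TimeoutError:
        return None, "timeout"


async def slow_backend(delay, text):
    """Hypothetical stand-in for a backend request taking `delay` seconds."""
    await asyncio.sleep(delay)
    return text


async def main():
    fast = await with_ceiling(slow_backend(0.01, "done"), seconds=1.0)
    slow = await with_ceiling(slow_backend(1.0, "late"), seconds=0.05)
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

Reporting a timeout as its own outcome keeps it distinguishable from an empty response, which matters when separating the two suspected causes.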

## Related

- #432 — exception propagation in `generate_from_raw` (remove `return_exceptions=True`)
- #598 — PR that added diagnostic logging and stronger assertions

/label bug, ollama
