feat(core): Phase 2 — MOT-owned streaming chunking via stream_parsed_repr (placeholder) #1013

## Status

**Placeholder issue — needs elaboration through discussion in the comments below.**

Raised so that PR #942 has somewhere concrete to link its thread on the broader MOT-owned chunking direction, agent-friendly authoring patterns, and related Phase 2 work. The design summary below captures what is known from epic #891 and the PR #942 discussion. Specific implementation decisions are intentionally deferred to comments.

## Context

`mellea/stdlib/streaming.py` (landing in PR #942, closing #901) provides Phase 1 streaming validation: a call-site `ChunkingStrategy` with three built-in chunkers (`SentenceChunker`, `WordChunker`, `ParagraphChunker`) and an orchestrator (`stream_with_chunking`). This is a deliberately scoped, pragmatic choice. Epic #891 names the longer-term direction:

> The right long-term owner of chunking is the MOT itself, since it already owns `parsed_repr` and has the semantic knowledge to produce meaningful chunks for its specific type. A follow-on issue will cover adding `stream_parsed_repr` to MOT.

This is that follow-on issue.

## Consolidated design summary

### Motivation

Phase 1 collapses two semantic concerns onto the call site: how to chunk the stream (a property of the output type) and how to validate chunks (a property of the requirement). These are independent and belong in different owners.

- **Type semantics.** What counts as a complete chunk of this kind of output? JSON value, prose sentence, code statement, audio segment, image region. Invariant across requirements.
- **Constraint semantics.** What makes a particular output acceptable? Max three sentences, matches schema X, no hallucinated entities. Invariant across outputs of the same type.

Under Phase 1, both are author-written at the call site. Under Phase 2, the type semantics move onto the MOT (via `stream_parsed_repr`), leaving the requirement author with only the `stream_validate` override to write.

### Motivating output types (all in scope)

Phase 1 chunkers cover prose only (sentence/word/paragraph, all operating on `accumulated_text: str`). Phase 2 must support at least these output types, each with genuinely different chunk semantics:

- **Prose** — sentence, word, paragraph boundaries. Already covered by Phase 1 chunkers; Phase 2 should subsume them.
- **Structured text** — JSON values, YAML documents, code statements/blocks. Chunk boundary is "one complete parseable unit."
- **Multi-modal streams** — audio (silence-delimited segments, fixed windows, VAD-detected utterances), image (region or tile boundaries), potentially video. Chunk boundary is inherently non-string.

Multi-modal is first-class motivation for this work, not deferred scope. Epic #891 explicitly names the audio case ("Audio that goes wrong in the first few seconds can't be caught until the full clip is done"), and the `ChunkingStrategy.split(accumulated_text: str) -> list[str]` signature in Phase 1 forecloses on multi-modal by design — that foreclosure is what this issue exists to address.
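To make the foreclosure concrete, here is a minimal sketch of a chunker written against the Phase 1 `split(accumulated_text: str) -> list[str]` signature quoted above. `ChunkingStrategyLike` and `ToySentenceChunker` are hypothetical names for illustration, not the classes in PR #942, and the rule that a sentence is complete only once whitespace follows its terminator is an assumption of this sketch.

```python
import re
from typing import Protocol


class ChunkingStrategyLike(Protocol):
    """Hypothetical mirror of the Phase 1 signature; not the real mellea class."""

    def split(self, accumulated_text: str) -> list[str]: ...


class ToySentenceChunker:
    """Toy chunker: returns only the sentences that are known to be complete."""

    # Split on whitespace that directly follows a sentence terminator.
    _boundary = re.compile(r"(?<=[.!?])\s+")

    def split(self, accumulated_text: str) -> list[str]:
        parts = self._boundary.split(accumulated_text)
        # Assumption: the final piece may still be growing, so it is withheld.
        return [part for part in parts[:-1] if part]


if __name__ == "__main__":
    chunker: ChunkingStrategyLike = ToySentenceChunker()
    print(chunker.split("One. Two! Thr"))  # ['One.', 'Two!']
```

Both the input and the output are `str`-typed, so an audio frame or image tile has no way into this interface without first being serialised to text.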

### Proposed direction

Add `stream_parsed_repr` as an async method/generator on `ModelOutputThunk` — emitting typed, complete chunks as the stream progresses, where "complete" is defined by the MOT's own parsed-repr type. Each MOT subclass (prose, JSON, audio, image, code) provides its own implementation.
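As a starting point for discussion, the shape might look roughly like the sketch below. It assumes `stream_parsed_repr` is an async generator and that the raw stream arrives as string fragments. `ToyThunk`, `ToyJsonLinesThunk`, `fake_backend`, and `collect_json_chunks` are hypothetical stand-ins, not the real `ModelOutputThunk` API, and newline-delimited JSON is chosen only because its chunk boundary is easy to state.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Generic, Iterable, TypeVar

S = TypeVar("S")


class ToyThunk(Generic[S]):
    """Hypothetical stand-in for ModelOutputThunk; not the real mellea class."""

    def __init__(self, raw: AsyncIterator[str]) -> None:
        self._raw = raw

    def stream_parsed_repr(self) -> AsyncIterator[S]:
        raise NotImplementedError


class ToyJsonLinesThunk(ToyThunk[Any]):
    """Emits one parsed JSON value per complete line of the raw stream."""

    async def stream_parsed_repr(self) -> AsyncIterator[Any]:
        buffer = ""
        async for fragment in self._raw:
            buffer += fragment
            # A chunk is complete once its terminating newline has arrived.
            while "\n" in buffer:
                line, buffer = buffer.split("\n", 1)
                if line.strip():
                    yield json.loads(line)
        # End of stream: whatever remains is treated as the final value.
        if buffer.strip():
            yield json.loads(buffer)


async def fake_backend(fragments: Iterable[str]) -> AsyncIterator[str]:
    """Simulates a token stream that splits values at arbitrary points."""
    for fragment in fragments:
        await asyncio.sleep(0)
        yield fragment


async def collect_json_chunks(fragments: Iterable[str]) -> list[Any]:
    thunk = ToyJsonLinesThunk(fake_backend(fragments))
    return [chunk async for chunk in thunk.stream_parsed_repr()]


if __name__ == "__main__":
    print(asyncio.run(collect_json_chunks(['{"a": ', '1}\n{"b"', ': [2, 3]}\n'])))
```

The consumer never sees a half-parsed value; the buffering and boundary detection live entirely inside the type-specific subclass.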

Consequences for Phase 1 APIs:

- `stream_with_chunking()` gains an alternative mode that consumes `mot.stream_parsed_repr()` instead of applying an external `ChunkingStrategy`. Call-site interface stays the same.
- External `ChunkingStrategy` implementations can be deprecated once sufficient MOT types exist.
- Requirements that currently need internal state to track accumulated output can instead read from context, since the MOT will carry the partial parsed state.

Everything else Phase 1 delivers — `stream_validate`, `PartialValidationResult`, the event types (#902), the orchestration logic — is unaffected.
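One way to picture the dual-mode orchestrator is sketched below. All names are hypothetical (`stream_chunks`, `ToyMot`, `WordSplit`, `collect`); the real `stream_with_chunking` signature is not reproduced here, and selecting the mode via an optional `chunker` argument is only one possible design.

```python
import asyncio
from typing import AsyncIterator, Optional, Protocol


class Chunker(Protocol):
    def split(self, accumulated_text: str) -> list[str]: ...


class WordSplit:
    """Toy external chunker: a word is complete once a space follows it."""

    def split(self, accumulated_text: str) -> list[str]:
        return [w for w in accumulated_text.split(" ")[:-1] if w]


class ToyMot:
    """Hypothetical MOT stand-in exposing a raw stream and MOT-owned chunks."""

    def __init__(self, fragments: list[str]) -> None:
        self._fragments = fragments

    async def astream(self) -> AsyncIterator[str]:
        for fragment in self._fragments:
            await asyncio.sleep(0)
            yield fragment

    async def stream_parsed_repr(self) -> AsyncIterator[str]:
        # Type-owned boundary for this toy: comma-separated items.
        buffer = ""
        async for fragment in self.astream():
            buffer += fragment
            *complete, buffer = buffer.split(",")
            for item in complete:
                yield item.strip()
        if buffer.strip():
            yield buffer.strip()


async def stream_chunks(mot: ToyMot, chunker: Optional[Chunker] = None) -> AsyncIterator[str]:
    if chunker is None:
        # Phase 2 mode: the MOT decides what a complete chunk is.
        async for chunk in mot.stream_parsed_repr():
            yield chunk
        return
    # Phase 1 mode: re-split the accumulated text and emit only new chunks.
    accumulated, emitted = "", 0
    async for fragment in mot.astream():
        accumulated += fragment
        chunks = chunker.split(accumulated)
        for chunk in chunks[emitted:]:
            yield chunk
        emitted = len(chunks)
    # A real orchestrator would also decide how to flush the trailing remainder.


async def collect(chunks: AsyncIterator[str]) -> list[str]:
    return [chunk async for chunk in chunks]
```

Downstream consumers iterate the same way in either mode, which is the property the "call-site interface stays the same" bullet relies on.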

### Open design questions (for comments)

1. **Signature and generator shape.** Is `stream_parsed_repr` an async generator on the MOT? Does it share a queue with the raw `astream()`, or run in parallel? For multi-modal, does the signature accept bytes / frames / tensors rather than `str`?
2. **Chunking boundary authority.** Which component decides what a "complete chunk" is — the MOT's parser, a pluggable chunk-boundary predicate, or both? Answer likely differs between text and multi-modal.
3. **Backpressure.** If parsed chunks are slower to produce than raw tokens/frames, where does the buffering live?
4. **Backwards compatibility.** How to migrate from external `ChunkingStrategy` to MOT-native chunking without breaking Phase 1 call sites? Are both modes supported in parallel during a transition period?
5. **Typed output.** Does `stream_parsed_repr` yield values of type `S` (the MOT's `parsed_repr` type parameter), or a richer container that carries partial-parse state? Multi-modal MOTs may need the latter.
6. **Error handling.** What happens if the MOT's parser fails on a partial stream — surface immediately, wait for more data, or fall back to raw chunking?
7. **Testability.** Each MOT type's `stream_parsed_repr` needs verification against its non-streaming `parsed_repr`. What shape should the shared test harness take? Multi-modal test fixtures are a separate problem.
8. **Agent authoring.** If we want new MOT types to be agent-writable (see PR #942 discussion), what contracts are needed on the MOT base class? Clear APIs and guidelines are the substrate; a skill can sit on top where the framework supports one.
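For question 7, one possible harness shape is a fragmentation-invariance check: however the raw stream is cut, the streamed chunks should match what the non-streaming parser produces for the full text. The sketch below uses a toy line parser; `stream_lines`, `parse_lines`, `two_way_splits`, and `agrees_for_all_splits` are hypothetical names, and a real harness would plug in each MOT type's own streaming and non-streaming parsers.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def stream_lines(fragments: Iterable[str]) -> AsyncIterator[str]:
    """Toy streaming parser: yields each line once it is complete."""
    buffer = ""
    for fragment in fragments:
        await asyncio.sleep(0)
        buffer += fragment
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line
    if buffer:
        yield buffer


def parse_lines(text: str) -> list[str]:
    """Toy non-streaming counterpart (inputs here use only '\\n' line breaks)."""
    return text.splitlines()


def two_way_splits(text: str) -> list[list[str]]:
    """Every way of cutting the text into two fragments, including empty ones."""
    return [[text[:i], text[i:]] for i in range(len(text) + 1)]


async def agrees_for_all_splits(text: str) -> bool:
    """True if streaming matches the non-streaming parse for every cut point."""
    expected = parse_lines(text)
    for fragments in two_way_splits(text):
        streamed = [line async for line in stream_lines(fragments)]
        if streamed != expected:
            return False
    return True


if __name__ == "__main__":
    print(asyncio.run(agrees_for_all_splits("a\nbb\n\nc")))  # True
```

Because the oracle is the existing non-streaming parser, the check is deterministic, which is also what makes it usable as a contract for agent-authored MOT types.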

### Broader scope (to discuss)

The PR #942 thread raised a parallel observation. Both `stream_validate` authoring and (future) `stream_parsed_repr` authoring have deterministic checks against a non-streaming counterpart (`validate()` and `parsed_repr` respectively), which makes them plausible candidates for agent-friendly extension patterns — potentially skills in frameworks that support them. Worth discussing whether this issue should cover just `stream_parsed_repr` or also the broader agent-authoring story, or whether the latter warrants its own separate issue.

## Dependencies

- Phase 1 streaming validation must be complete: #901 (PR #942, in review), #902 (deferred event types and orchestrator-level OTEL).
- May interact with #909 (MOT structural cleanup) — worth coordinating to avoid `base.py` collision.

## References

- Parent epic: #891
- Phase 1 PR: #942
- Discussion origin: PR #942 review comment thread on `mellea/stdlib/__init__.py` and `mellea/stdlib/chunking.py`

