
feat(core): Phase 2 — MOT-owned streaming chunking via stream_parsed_repr (placeholder) #1013

@planetf1


Status

Placeholder issue — needs elaboration through discussion in the comments below.

Raised so that PR #942 has somewhere concrete to link its thread on the broader MOT-owned chunking direction, agent-friendly authoring patterns, and related Phase 2 work. The design summary below captures what is known from epic #891 and the PR #942 discussion. Specific implementation decisions are intentionally deferred to comments.

Context

mellea/stdlib/streaming.py (landing in PR #942, closing #901) provides Phase 1 streaming validation: a call-site ChunkingStrategy with three built-in chunkers (SentenceChunker, WordChunker, ParagraphChunker) and an orchestrator (stream_with_chunking). This is a deliberately scoped, pragmatic choice. Epic #891 names the longer-term direction:
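For readers who have not followed PR #942, the Phase 1 contract can be sketched as a strategy that re-splits the accumulated text and reports only the complete chunks. The sketch below assumes nothing beyond the `split(accumulated_text: str) -> list[str]` signature quoted later in this issue; `ChunkingStrategy` here is a simplified local stand-in, and `SimpleSentenceChunker` with its regex boundary rule is hypothetical, not mellea's SentenceChunker.

```python
import re
from abc import ABC, abstractmethod


class ChunkingStrategy(ABC):
    """Simplified stand-in for the Phase 1 interface (not mellea's actual class)."""

    @abstractmethod
    def split(self, accumulated_text: str) -> list[str]:
        """Return the complete chunks found so far in the accumulated text."""


class SimpleSentenceChunker(ChunkingStrategy):
    # Hypothetical rule: a sentence ends at ., ! or ? followed by whitespace.
    _boundary = re.compile(r"(?<=[.!?])\s+")

    def split(self, accumulated_text: str) -> list[str]:
        parts = self._boundary.split(accumulated_text)
        # The last part has no trailing boundary yet, so it may still be streaming.
        return [p for p in parts[:-1] if p]


chunker = SimpleSentenceChunker()
print(chunker.split("First one. Second one! Third is still stre"))
```

Note that the strategy only ever sees `str`, which is the limitation the rest of this issue is about.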

The right long-term owner of chunking is the MOT itself, since it already owns parsed_repr and has the semantic knowledge to produce meaningful chunks for its specific type. A follow-on issue will cover adding stream_parsed_repr to MOT.

This is that follow-on issue.

Consolidated design summary

Motivation

Phase 1 collapses two semantic concerns onto the call site: how to chunk the stream (a property of the output type) and how to validate chunks (a property of the requirement). These are independent and belong with different owners.

  • Type semantics. What counts as a complete chunk of this kind of output? JSON value, prose sentence, code statement, audio segment, image region. Invariant across requirements.
  • Constraint semantics. What makes a particular output acceptable? Max three sentences, matches schema X, no hallucinated entities. Invariant across outputs of the same type.

Under Phase 1, both are author-written at the call site. Under Phase 2, the type semantics move onto the MOT (via stream_parsed_repr), leaving the requirement author with only the stream_validate override to write.
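To make the split concrete, this is roughly what remains for a requirement author once chunk boundaries come from the output type: pure constraint logic over chunks that are already delimited. `PartialValidationResult` below is a local stand-in with assumed fields, and the `stream_validate(chunks_so_far)` signature and `MaxSentencesRequirement` class are hypothetical, not the Phase 1 API from PR #942.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class PartialValidationResult:
    """Hypothetical stand-in; the real Phase 1 type may have different fields."""
    status: Literal["fail", "unknown"]  # "unknown" = no verdict yet, keep streaming
    reason: str = ""


class MaxSentencesRequirement:
    """Constraint semantics only: it never decides where a sentence ends."""

    def __init__(self, limit: int) -> None:
        self.limit = limit

    def stream_validate(self, chunks_so_far: list[str]) -> PartialValidationResult:
        # Chunks are assumed to arrive already split by the output type (the MOT).
        if len(chunks_so_far) > self.limit:
            return PartialValidationResult("fail", f"more than {self.limit} sentences")
        return PartialValidationResult("unknown")
```

The same class would work unchanged whether the sentence boundaries came from an external chunker or from the MOT, which is the point of separating the two concerns.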

Motivating output types (all in scope)

Phase 1 chunkers cover prose only (sentence/word/paragraph, all operating on accumulated_text: str). Phase 2 must support at least these output types, each with genuinely different chunk semantics:

  • Prose — sentence, word, paragraph boundaries. Already covered by Phase 1 chunkers; Phase 2 should subsume them.
  • Structured text — JSON values, YAML documents, code statements/blocks. Chunk boundary is "one complete parseable unit."
  • Multi-modal streams — audio (silence-delimited segments, fixed windows, VAD-detected utterances), image (region or tile boundaries), potentially video. Chunk boundary is inherently non-string.
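For the structured-text case, "one complete parseable unit" can be approximated with the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given offset and reports where it ended. `complete_json_values` is a hypothetical helper, a sketch of the boundary rule rather than a proposed implementation.

```python
import json


def complete_json_values(accumulated_text: str) -> list[object]:
    """Return every complete top-level JSON value in a whitespace-separated stream.

    Hypothetical helper: a trailing value that does not parse yet is treated
    as still streaming and is left out.
    """
    decoder = json.JSONDecoder()
    values: list[object] = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(accumulated_text) and accumulated_text[pos].isspace():
            pos += 1
        if pos >= len(accumulated_text):
            break
        try:
            value, pos = decoder.raw_decode(accumulated_text, pos)
        except json.JSONDecodeError:
            break  # incomplete (or malformed) tail: wait for more data
        values.append(value)
    return values
```

Two caveats show why this belongs to the type rather than the call site: a bare number at the tail (`12` while `123` is still arriving) is reported as complete too early, and "incomplete" and "malformed" look identical here, which is exactly open question 6 below.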

Multi-modal is first-class motivation for this work, not deferred scope. Epic #891 explicitly names the audio case ("Audio that goes wrong in the first few seconds can't be caught until the full clip is done"), and the ChunkingStrategy.split(accumulated_text: str) -> list[str] signature in Phase 1 forecloses on multi-modal by design — that foreclosure is what this issue exists to address.
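One way to see the foreclosure: the chunker needs to be generic over both the raw stream type and the chunk type before audio fits at all. The sketch below is hypothetical (`Chunker`, `FixedWindowAudioChunker` and its parameters are illustrative names, not a proposed final signature) and covers only the simplest audio case, fixed windows over raw mono PCM bytes.

```python
from typing import Generic, TypeVar

T = TypeVar("T")  # accumulated raw stream: str for text, bytes for audio, ...
C = TypeVar("C")  # chunk type produced


class Chunker(Generic[T, C]):
    """Hypothetical generalisation of Phase 1's split(accumulated_text: str)."""

    def split(self, accumulated: T) -> list[C]:
        raise NotImplementedError


class FixedWindowAudioChunker(Chunker[bytes, bytes]):
    """Chunks raw mono PCM bytes into fixed-duration windows."""

    def __init__(self, sample_rate: int, sample_width: int, window_seconds: float) -> None:
        # Bytes per window = samples per second * bytes per sample * seconds.
        self.window_bytes = int(sample_rate * sample_width * window_seconds)
        if self.window_bytes <= 0:
            raise ValueError("window must contain at least one byte")

    def split(self, accumulated: bytes) -> list[bytes]:
        n_full = len(accumulated) // self.window_bytes
        # Only full windows are complete; the remainder is still streaming.
        return [
            accumulated[i * self.window_bytes:(i + 1) * self.window_bytes]
            for i in range(n_full)
        ]
```

Silence-delimited or VAD-based segmentation would replace the fixed arithmetic with signal analysis, but the generic signature is what they would all need.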

Proposed direction

Add stream_parsed_repr as an async method/generator on ModelOutputThunk — emitting typed, complete chunks as the stream progresses, where "complete" is defined by the MOT's own parsed-repr type. Each MOT subclass (prose, JSON, audio, image, code) provides its own implementation.
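A minimal sketch of the proposed shape, assuming `stream_parsed_repr` is an async generator that owns its own buffer over the raw token stream. `ToyProseThunk` is a hypothetical stand-in for a prose MOT subclass; the real `ModelOutputThunk` constructor, `astream()` and `parsed_repr` are not modelled here.

```python
import asyncio
import re
from collections.abc import AsyncIterator


class ToyProseThunk:
    """Hypothetical stand-in for a prose ModelOutputThunk subclass."""

    _boundary = re.compile(r"(?<=[.!?])\s+")

    def __init__(self, raw_stream: AsyncIterator[str]) -> None:
        self._raw_stream = raw_stream

    async def stream_parsed_repr(self) -> AsyncIterator[str]:
        buffer = ""
        async for token in self._raw_stream:
            buffer += token
            # Everything before the last boundary is a complete sentence.
            *complete, buffer = self._boundary.split(buffer)
            for sentence in complete:
                yield sentence
        if buffer.strip():
            yield buffer.strip()  # the final sentence needs no trailing boundary


async def fake_tokens() -> AsyncIterator[str]:
    for tok in ["Hello there", ". How are", " you? Fine"]:
        yield tok


async def main() -> list[str]:
    mot = ToyProseThunk(fake_tokens())
    return [s async for s in mot.stream_parsed_repr()]


print(asyncio.run(main()))  # ['Hello there.', 'How are you?', 'Fine']
```

Here the sentence rule lives on the type, so callers only iterate; whether this generator shares a queue with `astream()` is left to open question 1.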

Consequences for Phase 1 APIs:

  • stream_with_chunking() gains an alternative mode that consumes mot.stream_parsed_repr() instead of applying an external ChunkingStrategy. The call-site interface stays the same.
  • External ChunkingStrategy implementations can be deprecated once sufficient MOT types exist.
  • Requirements that currently need internal state to track accumulated output can instead read from context, since the MOT will carry the partial parsed state.
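The dispatch between the two modes could look something like the following. This is a hypothetical sketch (`chunk_source`, `chunks_from_strategy` and the `split` callable are illustrative, not stream_with_chunking's real parameters) in which an explicitly supplied strategy wins, so existing Phase 1 call sites behave as before and MOT-native chunking is used only when no strategy is passed.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from typing import Optional


async def chunks_from_strategy(raw: AsyncIterator[str],
                               split: Callable[[str], list[str]]) -> AsyncIterator[str]:
    """Phase 1 style: re-split the accumulated text and emit only chunks not seen before."""
    accumulated, emitted = "", 0
    async for token in raw:
        accumulated += token
        chunks = split(accumulated)
        for chunk in chunks[emitted:]:
            yield chunk
        emitted = len(chunks)


async def chunk_source(mot: object, raw: AsyncIterator[str],
                       split: Optional[Callable[[str], list[str]]] = None) -> AsyncIterator[str]:
    """Hypothetical dispatch: an explicit strategy wins, otherwise use MOT-native chunks."""
    if split is not None:
        async for chunk in chunks_from_strategy(raw, split):
            yield chunk
    elif hasattr(mot, "stream_parsed_repr"):
        async for chunk in mot.stream_parsed_repr():
            yield chunk
    else:
        raise TypeError("no split strategy given and the MOT has no stream_parsed_repr")
```

Giving the explicit strategy priority is one possible answer to open question 4; the opposite default is equally plausible once enough MOT types exist.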

Everything else Phase 1 delivers — stream_validate, PartialValidationResult, the event types (#902), the orchestration logic — is unaffected.

Open design questions (for comments)

  1. Signature and generator shape. Is stream_parsed_repr an async generator on the MOT? Does it share a queue with the raw astream(), or run in parallel? For multi-modal, does the signature accept bytes / frames / tensors rather than str?
  2. Chunking boundary authority. Which component decides what a "complete chunk" is — the MOT's parser, a pluggable chunk-boundary predicate, or both? Answer likely differs between text and multi-modal.
  3. Backpressure. If parsed chunks are slower to produce than raw tokens/frames, where does the buffering live?
  4. Backwards compatibility. How to migrate from external ChunkingStrategy to MOT-native chunking without breaking Phase 1 call sites? Are both modes supported in parallel during a transition period?
  5. Typed output. Does stream_parsed_repr yield values of type S (the MOT's parsed_repr type parameter), or a richer container that carries partial-parse state? Multi-modal MOTs may need the latter.
  6. Error handling. What happens if the MOT's parser fails on a partial stream — surface immediately, wait for more data, or fall back to raw chunking?
  7. Testability. Each MOT type's stream_parsed_repr needs verification against its non-streaming parsed_repr. Shape of the shared test harness? Multi-modal test fixtures are their own problem.
  8. Agent authoring. If we want new MOT types to be agent-writable (see the discussion on PR #942, feat(stdlib): add stream_with_chunking() with per-chunk validation (#901)), what contracts are needed on the MOT base class? Clear APIs and guidelines are the substrate; a skill can sit on top where the framework supports one.
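For question 3, one candidate place for the buffering is a bounded `asyncio.Queue` between the raw producer and the parser: `put()` suspends while the queue is full, so a slow parser throttles token intake instead of letting memory grow. The sketch below is hypothetical (`produce`, `consume`, `run` and the `None` sentinel are illustrative choices), with a sleep standing in for a slow parse step.

```python
import asyncio
from typing import Optional


async def produce(raw_tokens: list[str], queue: asyncio.Queue) -> None:
    for tok in raw_tokens:
        # put() waits while the queue is full: backpressure on the producer.
        await queue.put(tok)
    await queue.put(None)  # sentinel: raw stream finished


async def consume(queue: asyncio.Queue, out: list[str]) -> int:
    peak = 0
    while True:
        peak = max(peak, queue.qsize())  # observed buffer depth
        tok: Optional[str] = await queue.get()
        if tok is None:
            return peak
        await asyncio.sleep(0.001)  # stand-in for a slow parse step
        out.append(tok)


async def run(raw_tokens: list[str], maxsize: int = 2) -> tuple[list[str], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    out: list[str] = []
    _, peak = await asyncio.gather(produce(raw_tokens, queue), consume(queue, out))
    return out, peak
```

The trade-off is that upstream backends must tolerate being paused; an unbounded buffer inside the MOT avoids that at the cost of memory, which matters most for audio and video.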

Broader scope (to discuss)

The PR #942 thread raised a parallel observation. Both stream_validate authoring and (future) stream_parsed_repr authoring have deterministic checks against a non-streaming counterpart (validate() and parsed_repr respectively), which makes them plausible candidates for agent-friendly extension patterns — potentially skills in frameworks that support them. Worth discussing whether this issue should cover just stream_parsed_repr or also the broader agent-authoring story, or whether the latter warrants its own separate issue.
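The "deterministic check against a non-streaming counterpart" can be made concrete: feed the same text to the streaming implementation under every possible tokenisation and require the collected chunks to equal the batch parse. In the sketch below, sentence splitting stands in for a real MOT type, and `parse_sentences`, `stream_sentences` and `matches_batch_for_all_cuts` are hypothetical names; a real harness would plug in each type's `parsed_repr` and `stream_parsed_repr`.

```python
import asyncio
import re
from collections.abc import AsyncIterator, Callable

_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def parse_sentences(text: str) -> list[str]:
    """Non-streaming reference parse (stands in for parsed_repr)."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


async def stream_sentences(tokens: list[str]) -> AsyncIterator[str]:
    """Streaming implementation under test (stands in for stream_parsed_repr)."""
    buffer = ""
    for tok in tokens:
        buffer += tok
        *done, buffer = _BOUNDARY.split(buffer)
        for s in done:
            if s.strip():
                yield s.strip()
    if buffer.strip():
        yield buffer.strip()


async def _collect(chunks: AsyncIterator[str]) -> list[str]:
    return [c async for c in chunks]


def matches_batch_for_all_cuts(text: str,
                               stream_fn: Callable[[list[str]], AsyncIterator[str]],
                               batch_fn: Callable[[str], list[str]]) -> bool:
    """Every two-way tokenisation of text must yield the batch result."""
    expected = batch_fn(text)
    for cut in range(len(text) + 1):
        got = asyncio.run(_collect(stream_fn([text[:cut], text[cut:]])))
        if got != expected:
            return False
    return True
```

Because the check is pass/fail with no judgement involved, it is the kind of contract an authoring agent (or a skill) could run in a loop until its new implementation passes.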

Dependencies

References

Metadata

Assignees

Labels

  • area/sampling: SamplingStrategy, SamplingResult, ModelOption, generation options
  • area/stdlib: Core abstractions: Context, MOT, SamplingStrategy, formatters, serialization
  • area/streaming: Streaming chunks, events, per-chunk validation
  • enhancement: New feature or request
  • epic: High level Epic

Type

No type

Projects

No projects

Milestone

Relationships

None yet
