
feat: streaming support for query responses #2

@0xneobyte

Problem

The query() method waits for the complete response before returning anything to the caller. For long-form answers this adds noticeable latency: the caller receives nothing until generation has finished. The major AI SDKs (Anthropic, OpenAI, Gemini) all expose streaming as a first-class feature.

Proposed Behaviour

Add a query_stream method that yields response chunks as they arrive rather than waiting for the full response.

```python
async for chunk in client.query_stream("What is Python?"):
    print(chunk, end="", flush=True)
```

  • Existing query() is unchanged
  • query_stream() returns an async generator
  • Citations and metadata are returned at the end of the stream
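A minimal sketch of how query_stream() could satisfy all three points with a single async generator. Everything here is hypothetical: the StreamChunk model and its fields, the _fake_backend stand-in (the real transport and wire format are not specified in this issue), and the placeholder citation value. It is written as a free function rather than a client method only to stay self-contained. Defining __str__ on the chunk keeps the print(chunk, end="", flush=True) usage above working unchanged.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamChunk:
    """Hypothetical chunk model: a piece of text plus optional end-of-stream metadata."""

    text: str
    citations: tuple[str, ...] = ()
    is_final: bool = False

    def __str__(self) -> str:
        # Lets callers write print(chunk, end="") exactly as in the issue example.
        return self.text


async def _fake_backend(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for the real HTTP stream; not the brainus_ai transport.
    for piece in ["Python ", "is ", "a ", "programming ", "language."]:
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield piece


async def query_stream(prompt: str) -> AsyncIterator[StreamChunk]:
    """Yield text chunks as they arrive, then one final chunk carrying citations."""
    async for piece in _fake_backend(prompt):
        yield StreamChunk(text=piece)
    # Citations are only known once generation finishes, so they ride on the last chunk.
    yield StreamChunk(text="", citations=("source-1",), is_final=True)


async def main() -> None:
    async for chunk in query_stream("What is Python?"):
        print(chunk, end="", flush=True)
        if chunk.is_final:
            print("\ncitations:", chunk.citations)


if __name__ == "__main__":
    asyncio.run(main())
```

Attaching citations to a final chunk, rather than to every chunk, keeps the text path lightweight; another option would be to expose them on an object that owns the stream once iteration ends.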

Files to Modify

| File | Change |
|---|---|
| src/brainus_ai/client.py | Add query_stream async generator method |
| src/brainus_ai/models.py | Add streaming chunk model |
| src/brainus_ai/__init__.py | Export new types |
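For the models.py change, one possible shape is a small discriminated union: a text event yielded as soon as it arrives and a terminal event carrying citations and metadata. The newline-delimited JSON wire format, the "type"/"text"/"citations"/"metadata" keys, and the names TextDelta, StreamEnd, StreamEvent and parse_event are all assumptions for illustration, not the actual brainus_ai API or backend protocol.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass(frozen=True)
class TextDelta:
    """Hypothetical event: a piece of generated text, yielded as soon as it arrives."""

    text: str


@dataclass(frozen=True)
class StreamEnd:
    """Hypothetical terminal event: citations and metadata once generation finishes."""

    citations: tuple[str, ...] = ()
    metadata: dict[str, Any] = field(default_factory=dict)


# The union query_stream() would yield; both classes would be exported from __init__.py.
StreamEvent = Union[TextDelta, StreamEnd]


def parse_event(line: str) -> StreamEvent:
    """Turn one newline-delimited JSON event into a typed model (wire format assumed)."""
    payload = json.loads(line)
    kind = payload.get("type")
    if kind == "text":
        return TextDelta(text=payload["text"])
    if kind == "done":
        return StreamEnd(
            citations=tuple(payload.get("citations", [])),
            metadata=dict(payload.get("metadata", {})),
        )
    raise ValueError(f"unknown stream event type: {kind!r}")
```

Callers can branch with isinstance (or a match statement on Python 3.10+), and type checkers narrow StreamEvent to the concrete class in each branch.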

Acceptance Criteria

  • query_stream() yields text chunks progressively as they arrive
  • Existing query() behaviour is unchanged
  • Citations are accessible at the end of the stream
  • The stream can be cancelled midway without raising errors
  • Full type hints on all new methods and models
