Commit b78f882

fix examples

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
1 parent 0a54269 commit b78f882

5 files changed
Lines changed: 35 additions & 13 deletions

docs/examples/streaming/README.md

Lines changed: 5 additions & 4 deletions
@@ -31,11 +31,11 @@ uv run --with mellea docs/examples/streaming/advanced_streaming.py
 ## Key Concepts

 ### Streaming Requires Async
-Streaming is only available with async functions (`ainstruct`, `aact`) using `await_result=False`:
+Streaming is only available with async functions (`ainstruct`, `aact`) using `await_result=False` and `strategy=None`:

 ```python
-# This works - async with await_result=False
-thunk = await m.ainstruct("Hello", await_result=False)
+# This works - async with await_result=False and strategy=None
+thunk = await m.ainstruct("Hello", await_result=False, strategy=None)
 last_length = 0
 while not thunk.is_computed():
     current_value = await thunk.astream()
@@ -54,9 +54,10 @@ result = m.instruct("Hello")  # Already computed, cannot stream
 - **`ComputedModelOutputThunk`**: Already computed, cannot be streamed

 ### Limitations
-- Cannot stream when using `SamplingStrategy` (validation requires complete output)
+- Cannot stream when using `SamplingStrategy` (validation requires complete output) - must set `strategy=None`
 - Cannot stream from synchronous functions (would cause deadlock)
 - Streaming requires an async context
+- Default `strategy=RejectionSamplingStrategy(loop_budget=2)` must be disabled for streaming

 ## See Also
 - [Tutorial Chapter 13: Streaming Model Outputs](../../tutorial.md#chapter-13-streaming-model-outputs)
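The stream-until-computed loop this README documents can be exercised without a model backend. `FakeThunk` below is a hypothetical mock of an uncomputed `ModelOutputThunk` (it is not part of mellea; only `is_computed()` and `astream()` are imitated, with `astream()` returning the accumulated value so far as the README describes):

```python
import asyncio


class FakeThunk:
    """Hypothetical stand-in for an uncomputed ModelOutputThunk (not mellea's real class)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._value = ""

    def is_computed(self):
        # Computed once every chunk has been delivered
        return not self._chunks

    async def astream(self):
        # Returns the accumulated value so far, growing by one chunk per call
        if self._chunks:
            self._value += self._chunks.pop(0)
        return self._value


async def stream_all(thunk):
    # The loop from the README: poll astream() until the thunk is computed
    last_length = 0
    pieces = []
    while not thunk.is_computed():
        current_value = await thunk.astream()
        pieces.append(current_value[last_length:])
        last_length = len(current_value)
    return "".join(pieces)


print(asyncio.run(stream_all(FakeThunk(["Hel", "lo, ", "world"]))))  # Hello, world
```

The slicing with `last_length` is what turns the accumulated value into incremental output, since `astream()` returns the whole text generated so far rather than a delta.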

docs/examples/streaming/advanced_streaming.py

Lines changed: 10 additions & 2 deletions
@@ -91,14 +91,22 @@ async def compare_streaming_vs_blocking():
     # Streaming
     print("\n2. Streaming mode (await_result=False):")
     print("   Tokens appear as generated: ", end="", flush=True)
-    thunk = await m.ainstruct("Write a haiku about programming.", await_result=False)
+    thunk = await m.ainstruct(
+        "Write a haiku about programming.",
+        await_result=False,
+        strategy=None,  # Must disable strategy for streaming
+    )

+    # Stream until complete - call astream() at least once even if already computed
     last_length = 0
-    while not thunk.is_computed():
+    while True:
         current_value = await thunk.astream()
         new_content = current_value[last_length:]
         print(new_content, end="", flush=True)
         last_length = len(current_value)
+
+        if thunk.is_computed():
+            break
     print()
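The switch from `while not thunk.is_computed()` to `while True` matters when generation finishes before the first poll: the old loop condition is false immediately, so `astream()` is never called and nothing is printed. A minimal sketch with a hypothetical already-computed mock (not mellea's real class) shows the difference:

```python
import asyncio


class ComputedThunk:
    """Hypothetical thunk whose generation finished before we start polling."""

    def __init__(self, value):
        self._value = value

    def is_computed(self):
        return True

    async def astream(self):
        return self._value


async def drain_old(thunk):
    # Old loop: never runs if the thunk is already computed
    out, last = [], 0
    while not thunk.is_computed():
        v = await thunk.astream()
        out.append(v[last:])
        last = len(v)
    return "".join(out)


async def drain_new(thunk):
    # Fixed loop: always calls astream() at least once before checking
    out, last = [], 0
    while True:
        v = await thunk.astream()
        out.append(v[last:])
        last = len(v)
        if thunk.is_computed():
            break
    return "".join(out)


print(repr(asyncio.run(drain_old(ComputedThunk("haiku")))))  # ''
print(repr(asyncio.run(drain_new(ComputedThunk("haiku")))))  # 'haiku'
```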

docs/examples/streaming/basic_streaming.py

Lines changed: 3 additions & 1 deletion
@@ -18,7 +18,9 @@ async def stream_story():

     # Get uncomputed thunk for streaming
     thunk = await m.ainstruct(
-        "Write a short story about a robot learning to paint.", await_result=False
+        "cont up 1 through 100.",
+        await_result=False,
+        strategy=None,  # Must disable strategy for streaming
     )

     # Stream the output - astream() returns accumulated value so far
docs/examples/streaming/interactive_chat.py

Lines changed: 13 additions & 3 deletions
@@ -2,19 +2,23 @@

 This example shows how to build an interactive chat application where
 the AI's responses are streamed incrementally for a better user experience.
+
+Note: This example uses ChatContext which triggers a warning about async usage.
+The warning is expected but safe here because we await the result after streaming.
+For production use, consider using SimpleContext or handling the context updates manually.
 """

 # pytest: ollama, llm

 import asyncio

 import mellea
-from mellea.stdlib.context import ChatContext
+from mellea.stdlib.context import SimpleContext


 async def interactive_chat():
     """Run an interactive chat session with streaming responses."""
-    m = mellea.start_session(ctx=ChatContext())
+    m = mellea.start_session(ctx=SimpleContext())

     print("Chat with the AI (type 'quit' to exit)")
     print("-" * 50)
@@ -27,7 +31,11 @@ async def interactive_chat():
         print("AI: ", end="", flush=True)

         # Stream the response
-        thunk = await m.ainstruct(user_input, await_result=False)
+        thunk = await m.ainstruct(
+            user_input,
+            await_result=False,
+            strategy=None,  # Must disable strategy for streaming
+        )

         last_length = 0
         while not thunk.is_computed():
@@ -37,6 +45,8 @@ async def interactive_chat():
             print(new_content, end="", flush=True)
             last_length = len(current_value)

+        # Await the final result to update context properly
+        await thunk.avalue()
         print()  # New line after response
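The added `await thunk.avalue()` is the step that lets the session record the completed response after streaming finishes. A sketch with a hypothetical `ChatThunk` mock (not mellea's real class; its `avalue()` merely sets a flag standing in for the context update):

```python
import asyncio


class ChatThunk:
    """Hypothetical thunk: astream() drains chunks, avalue() finalizes."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._value = ""
        self.finalized = False

    def is_computed(self):
        return not self._chunks

    async def astream(self):
        if self._chunks:
            self._value += self._chunks.pop(0)
        return self._value

    async def avalue(self):
        # Stands in for the bookkeeping that records the reply in the context
        self.finalized = True
        return self._value


async def chat_turn(thunk):
    # Drain the stream for display ...
    while not thunk.is_computed():
        await thunk.astream()
    # ... then await the final value so the context update actually runs
    final = await thunk.avalue()
    return final, thunk.finalized


print(asyncio.run(chat_turn(ChatThunk(["Hi", " there"]))))  # ('Hi there', True)
```

Without the final `avalue()` call, the streamed text would be displayed but the `finalized` flag (the stand-in for the recorded chat turn) would stay `False`.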

docs/tutorial.md

Lines changed: 4 additions & 3 deletions
@@ -1434,7 +1434,7 @@ Mellea supports streaming model outputs, allowing you to process tokens as they

 ### Streaming with Async Functions

-To enable streaming, use the async versions of session functions (`ainstruct`, `aact`) with the `await_result=False` parameter:
+To enable streaming, use the async versions of session functions (`ainstruct`, `aact`) with the `await_result=False` parameter and `strategy=None`:

 ```python
 # file: https://github.com/generative-computing/mellea/blob/main/docs/examples/streaming/basic_streaming.py#L1-L35
@@ -1447,7 +1447,8 @@ async def stream_story():
     # Get uncomputed thunk for streaming
     thunk = await m.ainstruct(
         "Write a short story about a robot learning to paint.",
-        await_result=False
+        await_result=False,
+        strategy=None  # Must disable strategy for streaming
     )

     # Stream the output - astream() returns accumulated value so far
@@ -1496,7 +1497,7 @@ async for chunk in thunk.astream():  # Stream the generation

 Therefore, sync functions always await the result internally and return `ComputedModelOutputThunk`.

-**Streaming and sampling are incompatible**: When using `SamplingStrategy` or `return_sampling_results=True`, the function must await the complete result to perform validation. In these cases, the function always returns a computed result regardless of the `await_result` parameter.
+**Streaming and sampling are incompatible**: When using `SamplingStrategy` or `return_sampling_results=True`, the function must await the complete result to perform validation. In these cases, the function always returns a computed result regardless of the `await_result` parameter. To enable streaming, you must explicitly set `strategy=None` (the default is `RejectionSamplingStrategy(loop_budget=2)`).
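The dispatch rule the tutorial paragraph describes (a sampling strategy forces a computed result; only `await_result=False` combined with `strategy=None` yields something streamable) can be sketched as below. `ainstruct` here is a hypothetical stand-in, not mellea's real signature:

```python
import asyncio


async def ainstruct(prompt, await_result=True, strategy="rejection"):
    """Hypothetical sketch of the dispatch rule above (not mellea's real API)."""

    async def generate():
        return f"response to {prompt!r}"

    if strategy is not None or await_result:
        # A sampling strategy must validate the complete output,
        # so the call always returns fully computed text here.
        return await generate()
    # Only await_result=False AND strategy=None yields a streamable handle
    return asyncio.ensure_future(generate())


async def main():
    computed = await ainstruct("hi")  # default strategy -> always computed
    handle = await ainstruct("hi", await_result=False, strategy=None)
    streamable = isinstance(handle, asyncio.Task)
    await handle  # settle the pending task before the loop closes
    return isinstance(computed, str), streamable


print(asyncio.run(main()))  # (True, True)
```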

### Practical Example: Interactive Chat
