Member:
It would be good to create an e2e test that is equivalent to https://llamastack.github.io/docs/getting_started/quickstart#step-3-run-the-demo
raghotham (Member) reviewed on Oct 12, 2025 and left a comment:
Will there need to be another change when conversation support is added?
Contributor (Author):
@raghotham I'm confused by your statement:
- replace legacy `client.alpha.agents.*` paths in both sync and async agent implementations with the `/v1/responses` + `/v1/conversations` flow
- treat each `Agent.create_session()` as a lazily created conversation, caching the returned `conv_…` ID for later turns
- stream turns via `client.responses.create(..., stream=True)` and translate OpenAI `ResponseObjectStream` events into the agent event surface introduced in `lib/agents/stream_events.py`
- run client and builtin tool calls by emitting follow-up responses with `previous_response_id`, mirroring the old turn-resume semantics
- remove the legacy `AgentTurnResponseStreamChunk` dependency, introduce a lightweight `AgentStreamChunk`, and keep tool outputs inside `lib/` only
- clean up aux imports, drop the unused `__future__` pragmas, and ensure the entire module passes `ruff check`

This refactor keeps the public `Agent` API (create_session/create_turn) intact while aligning the implementation with stable responses/conversations APIs, so users can interoperate with standard OpenAI-compatible clients going forward.
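The "lazily created conversation" bullet can be read as a deferred-creation cache: registering a session costs nothing, and the conversation is created on the first turn and reused afterwards. A minimal sketch under that reading; `LazyConversationSessions`, `create_session`, `conversation_for`, and the injected `create_conversation` callable are hypothetical names, with the callable standing in for whatever conversations-create call the client actually makes.

```python
from typing import Callable, Dict, Optional


class LazyConversationSessions:
    """Hypothetical sketch: map agent session IDs to conversation IDs,
    creating each conversation only on first use and caching its ID."""

    def __init__(self, create_conversation: Callable[[], str]) -> None:
        # Stand-in for the real conversations API call; assumed to
        # return a "conv_..." style ID.
        self._create_conversation = create_conversation
        self._conversations: Dict[str, Optional[str]] = {}
        self._counter = 0

    def create_session(self) -> str:
        # Registering a session is local only: no server round-trip yet.
        self._counter += 1
        session_id = f"session_{self._counter}"
        self._conversations[session_id] = None
        return session_id

    def conversation_for(self, session_id: str) -> str:
        # The first turn triggers conversation creation; later turns reuse it.
        conv_id = self._conversations[session_id]
        if conv_id is None:
            conv_id = self._create_conversation()
            self._conversations[session_id] = conv_id
        return conv_id
```

Deferring creation means sessions that never run a turn never allocate a server-side conversation.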
This commit implements a high-level turn and step event model that wraps the low-level responses API stream events. The new model provides semantic meaning to agent interactions and distinguishes between server-side and client-side tool execution.

Key changes:
- Add turn_events.py with new event dataclasses (TurnStarted, StepProgress, etc.)
- Add event_synthesizer.py for stateful event translation
- Update Agent and AsyncAgent to use new event system
- Update event_logger.py to work with new event structures
- Separate server-side tools (file_search, web_search) from client-side function calls

The turn model represents a complete interaction loop that may span multiple responses, with distinct inference and tool_execution steps. Server-side tools execute within responses and are logged as progress events, while client-side function tools trigger separate tool execution steps.
Major architectural change based on user feedback:
- inference steps = model thinking/deciding what to do
- tool_execution steps = ANY tool executing (server OR client-side)

Previous incorrect design had server-side tools as progress within inference. New correct design: ALL tools (server and client) appear as tool_execution steps.

The difference between server and client tools is operational:
- Server-side (file_search, web_search, mcp_call): Execute within response stream, synthesizer emits tool_execution boundaries
- Client-side (function): Break response stream, agent.py emits tool_execution when executing

Both are annotated with metadata.server_side for clarity.

Changes:
- Rewrote event_synthesizer to emit tool_execution steps for server-side tools
- Updated event_logger to differentiate server vs client in logs
- Added metadata to StepStarted for server_side flag
- Server-side tools now: complete inference -> tool_execution step -> new inference
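The server-side sequencing (complete inference -> tool_execution step -> new inference) can be sketched as a small stateful generator. Everything here is a simplified stand-in: the raw event kinds (`response_started`, `server_tool_call`, `server_tool_done`, `response_done`) are hypothetical labels rather than the real `ResponseObjectStream` event types, and `StepStarted`/`StepCompleted` are trimmed-down, assumed shapes loosely modeled on the names in turn_events.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Tuple, Union


@dataclass
class StepStarted:
    step_type: str  # "inference" or "tool_execution"
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StepCompleted:
    step_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)


StepEvent = Union[StepStarted, StepCompleted]
RawEvent = Tuple[str, str]  # hypothetical (kind, payload) pairs


def synthesize_steps(raw_events: Iterable[RawEvent]) -> Iterator[StepEvent]:
    """Wrap one response stream in inference / tool_execution step boundaries."""
    in_inference = False
    for kind, payload in raw_events:
        if kind == "response_started":
            # The model starts thinking as soon as the response begins.
            yield StepStarted("inference")
            in_inference = True
        elif kind == "server_tool_call":
            # A server-side tool ends the current inference step...
            if in_inference:
                yield StepCompleted("inference")
                in_inference = False
            # ...and runs as its own tool_execution step.
            yield StepStarted("tool_execution", {"server_side": True, "tool": payload})
        elif kind == "server_tool_done":
            yield StepCompleted("tool_execution", {"server_side": True, "tool": payload})
            # The model resumes with the tool result: a new inference step.
            yield StepStarted("inference")
            in_inference = True
        elif kind == "response_done" and in_inference:
            yield StepCompleted("inference")
            in_inference = False
        # Other kinds (e.g. "text_delta") would become progress events in a
        # fuller version; this sketch only tracks step boundaries.
```

Client-side function calls do not appear here because, per the commit, they break the response stream and agent.py emits their tool_execution step itself.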
Three focused tests validate core architecture:
1. test_basic_turn_without_tools
   - Validates simple text-only turn
   - Verifies turn_started -> inference step -> turn_completed flow
   - No tool execution steps
2. test_server_side_file_search_tool ⭐ KEY TEST
   - Validates server-side tools appear as tool_execution steps
   - Verifies metadata.server_side=True
   - Tests inference -> tool_execution (server) -> inference flow
3. test_client_side_function_tool
   - Validates client-side tools appear as tool_execution steps
   - Verifies metadata.server_side=False
   - Tests inference -> tool_execution (client) -> inference flow

All tests verify the key principle: tool_execution steps for ALL tools, regardless of where they execute (server or client).
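The shared shape that the two tool tests check can be factored into one assertion helper. A sketch over hypothetical flattened event dicts (`type`, `step_type`, `metadata` keys are assumptions); the real integration tests consume streamed events from a running server instead of a canned list, and `check_tool_turn` and `started_steps` are illustrative names.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]  # hypothetical flattened turn/step event record


def started_steps(events: List[Event]) -> List[str]:
    """Return step types in the order their step_started events appear."""
    return [e["step_type"] for e in events if e["type"] == "step_started"]


def check_tool_turn(events: List[Event], expect_server_side: bool) -> None:
    """Assert inference -> tool_execution -> inference with the given flag."""
    assert events[0]["type"] == "turn_started"
    assert events[-1]["type"] == "turn_completed"
    assert started_steps(events) == ["inference", "tool_execution", "inference"]
    tool_step = next(
        e for e in events
        if e["type"] == "step_started" and e["step_type"] == "tool_execution"
    )
    assert tool_step["metadata"]["server_side"] is expect_server_side


def test_client_side_tool_step_shape() -> None:
    # Canned events standing in for a client-side function-tool turn.
    events = [
        {"type": "turn_started"},
        {"type": "step_started", "step_type": "inference", "metadata": {}},
        {"type": "step_completed", "step_type": "inference"},
        {"type": "step_started", "step_type": "tool_execution",
         "metadata": {"server_side": False}},
        {"type": "step_completed", "step_type": "tool_execution"},
        {"type": "step_started", "step_type": "inference", "metadata": {}},
        {"type": "step_completed", "step_type": "inference"},
        {"type": "turn_completed"},
    ]
    check_tool_turn(events, expect_server_side=False)
```

The same helper with `expect_server_side=True` covers the server-side file_search case, which keeps the "tool_execution for ALL tools" principle in one place.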
Python dataclasses require fields with default values to come after fields without defaults. Reordered all event dataclass fields to fix `TypeError: non-default argument follows default argument`.
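A minimal reproduction of the ordering rule and the fix. The fields shown (`turn_id`, `step_type`, `metadata`) are illustrative assumptions, not the actual definitions in turn_events.py; `make_misordered_class` is a hypothetical helper that rebuilds the broken layout so the error can be observed without crashing at import time.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def make_misordered_class() -> type:
    """Rebuild the broken layout; @dataclass raises TypeError on creation."""

    @dataclass
    class MisorderedStepStarted:
        metadata: Optional[Dict[str, Any]] = None
        step_type: str  # required field after a defaulted one

    return MisorderedStepStarted


@dataclass
class StepStarted:
    # Required fields (no defaults) first...
    turn_id: str
    step_type: str
    # ...then optional fields; default_factory gives each instance its own dict.
    metadata: Dict[str, Any] = field(default_factory=dict)
```

On Python 3.10+, `@dataclass(kw_only=True)` is an alternative that lifts the ordering restriction, at the cost of requiring keyword arguments at every construction site.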
Branch force-pushed from 5b00477 to 4fa1653.
Contributor (Author):
Landing this!
ashwinb added a commit to llamastack/llama-stack that referenced this pull request on Oct 15, 2025:
…3810) This PR updates the Conversation item related types and improves a couple of critical parts of the implementation:
- creates a streaming output item for the final assistant message output by the model. Until now we only added content parts and included that message in the final response.
- rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.)

## Test Plan
Used the test script from llamastack/llama-stack-client-python#281 for this:
```
TEST_API_BASE_URL=http://localhost:8321/v1 \
  pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```

Summary
Testing