fix(templates): Make each HTTP framework template reuse per-session conversa (#808, #809)#33
Draft
aidandaly24 wants to merge 3 commits into
Draft
fix(templates): Make each HTTP framework template reuse per-session conversa (#808, #809)#33aidandaly24 wants to merge 3 commits into
aidandaly24 wants to merge 3 commits into
Conversation
…oss all HTTP starter base templates (aws#808, aws#809) All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched. Refs aws#808 Refs aws#809
Coverage Report
|
… caches true LRU (aws#808, aws#809) The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver) templates kept per-session state at module scope with no eviction, trading the cross-session leak for an unbounded memory leak in long-running processes. Bound both to 128 active sessions with LRU eviction (delete_session / delete_thread on the least-recently-used key), matching the other three templates. The autogen and strands no-memory caches were plain dicts evicting by insertion order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are accurate. openaiagents already uses functools.lru_cache (true LRU) and is unchanged. Regenerated the assets snapshot. Refs aws#808 Refs aws#809
aws#808) OrderedDict.popitem() returns (key, value) where key is the (user_id, session_id) tuple and value is True. The previous unpack assigned the key tuple to old_user_id and True to old_session_id, so delete_session() was called with garbage and the real session was never deleted -- the InMemorySessionService grew unbounded past the 128-session cap, defeating the aws#808 bound for GoogleADK (LangChain unpacked correctly). Destructure the key tuple and discard the value so the real ids reach delete_session. Add an asset test asserting the store caps at 128 with correct eviction ids, and regenerate the asset snapshot.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refs aws#808
Refs aws#809
Issues
agentcore invoke --session-idflag, which is advertised as "Use specific session ID for conversation continuity" and prints "To resume: agentcore invoke --session-id " after each call. AutoGen (a 5th, unlisted HTTP template) has the identical defect._agentacross every in-process invocation, so when a single process serves multiple sessions, conversation history (and any "secret" content) bleeds between sessions/users. In practice this only manifests inagentcore devand in non-Runtime deployments (Lambda/ECS/FastAPI): deployed AgentCore Runtime isolates each session_id in its own microVM, so the cross-user data leak cannot happen in the production runtime. The flip side is that the template also has no per-session multi-turn recall in the no-memory path.Root cause
All three templates fail to thread context.session_id into any per-session history store, verified at v0.20.2 (= HEAD e92c79a). OpenAI (src/assets/python/http/openaiagents/base/main.py): invoke() -> main(prompt) builds a fresh Agent and calls Runner.run(agent, query) with no session= arg; the file never references context.session_id at all (only references are filesystem mount-path template strings). LangGraph (src/assets/python/http/langchain_langgraph/base/main.py:136-180): invoke() calls create_react_agent(...) fresh each call with no checkpointer and ainvoke({"messages":[HumanMessage(...)]}) with no thread_id/config — no checkpoint/thread_id/session anywhere. Google ADK (src/assets/python/http/googleadk/base/main.py:125-148): although context.session_id IS threaded (line 157), setup_session_and_runner() instantiates a brand-new InMemorySessionService() (line 127) and create_session() (line 128) on EVERY invoke, so prior turns are discarded. The Python SDK (bedrock-agentcore v1.9.1 runtime/context.py, app.py) only surfaces session_id to the entrypoint and logging; it does not — and is not expected to — auto-wire conversation history, so this is purely template content owned by the CLI.
Verified by reading the actual template at HEAD (lines 441-468 global
_agent; 386-424 per-session hasMemory branch; 381 NullConversationManager; 581 dispatch), the SDK session_id plumbing (context.py:36, a2a.py:123), and the CLI local invoke session hardcoding (dev/invoke.ts:101). Owner=cli verified: defect is wholly in the CLI-vended asset; SDK delivers session_id correctly and Strands' message accumulation is intended upstream behavior, so neither sdk nor upstream owns it.The fix
Make each template reuse per-session conversation state keyed by context.session_id, in-memory (best-effort, resets on cold start) — the approach in the OPEN closing PR aws#794 (branch fix/template-conversation-history). OpenAI: wrap get_session(session_id) returning openai-agents SQLiteSession in @lru_cache(maxsize=128) and pass session= to Runner.run. Google ADK: hoist _session_service/_runner to module scope, add get_or_create_session() that calls get_session() first and only create_session() when absent (avoids AlreadyExistsError, preserves history). LangGraph: module-level _checkpointer = InMemorySaver() passed to create_react_agent, and config={"configurable": {"thread_id": context.session_id}} on ainvoke. Regenerate the asset snapshot. TWO sharpenings beyond the brief: (1) extend the same fix to the AutoGen template (src/assets/python/http/autogen/base/main.py builds a fresh AssistantAgent every invoke and is equally stateless) — the issue text and PR aws#794 both omit it, leaving the bug half-fixed across HTTP frameworks; (2) the durable alternative is the AgentCore Memory primitive, but note that only Strands currently has a hasMemory variant — these three frameworks ship a base variant only, so in-process best-effort memory is the sole option until/unless a hasMemory path is added.
In src/assets/python/http/strands/base/main.py replace the global
_agentsingleton (lines 441-468) with a per-session-keyed factory so each context.session_id gets its own Agent, mirroring the hasMemory branch. Open PR aws#794's approach is the right design:@lru_cache(maxsize=128) get_or_create_agent(session_id)returning a fresh Agent per session_id, with the entrypoint passing context.session_id; the 128-entry bound caps process memory and evicted sessions restart fresh (best-effort in-memory recall, documented with a pointer to durable session stores for production). Apply the same to the no-memory config-bundle branch (create_agent()at lines 428-438, used at line 579). Design decision required: PR aws#794 keys by session_id and reuses the per-session instance, giving BOTH isolation and in-process multi-turn recall (correct default for a starter template); the closed PR aws#1314 used a per-invocationcreate_agent()that isolates but drops in-process recall — prefer aws#794. Regenerate the asset snapshot at src/assets/tests/snapshots/assets.snapshot.test.ts.snap. The other three HTTP templates' statelessness gap is the same PR's aws#808 scope, not this issue.Files touched: src/assets/python/http/openaiagents/base/main.py (invoke + Runner.run path); src/assets/python/http/googleadk/base/main.py (setup_session_and_runner / call_agent_async); src/assets/python/http/langchain_langgraph/base/main.py (invoke); src/assets/python/http/autogen/base/main.py (invoke — additional, omitted by the brief and PR aws#794); and the regenerated snapshot src/assets/tests/snapshots/assets.snapshot.test.ts.snap. All in the CLI repo (aws/agentcore-cli); tracked by OPEN PR aws#794, which currently covers the 3 named templates + Strands but NOT AutoGen.
src/assets/python/http/strands/base/main.py: global
_agent/get_or_create_agent no-memory no-config-bundle no-payment branch (lines 441-468) and create_agent no-memory config-bundle branch (lines 428-438); entrypoint dispatch at lines 578-582. Snapshot: src/assets/tests/snapshots/assets.snapshot.test.ts.snap. (Open fix: PR aws#794, unmerged; alternative PR aws#1314 closed 2026-05-20.)Validation evidence
The fix was verified by reproducing the original symptom and re-running after the change:
REPRODUCE (pre-fix base = git HEAD, before working-tree changes):
_agent = None(line 441) +global _agent(line 444) -> one shared Agent singleton across all session_ids => cross-session history bleed.Runner.run(agent, query)with NO session= (lines 131/141/154/164),async def main(query)-> stateless every invoke.setup_session_and_runner()constructs a FRESHInMemorySessionService()+create_session()on every invoke (lines 127-128) -> history discarded each turn.graph.ainvoke({...})with no checkpointer and no thread_id (line 174) -> stateless.AssistantAgent(...)constructed INSIDE invoke() every call (line 102) -> stateless.So turn-2 "what is my name?" could not recall "Alice", matching the reported symptom.
AFTER (fix/808-809 working tree):
_agent = None/global _agent= 0 hits). Replaced withagent_factory()returning a bounded module-level cache keyed by session_id (cap 128, LRU-style eviction); invoke() doessession_id = getattr(context,'session_id',...)thenget_or_create_agent(session_id, ...). Mirrors the hasMemory branch (key f"{session_id}/{actor}") => per-session isolation + in-process multi-turn recall.@lru_cache-backed per-session SQLiteSession and passes session= into Runner.run.RE-VERIFY: build clean; targeted greps confirm session_id threading present in all five base/main.py and the strands hasMemory branch + every other framework file are byte-for-byte untouched.
Test suite: green.
Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.