feat(agentserver): Add durable long-running agents to azure-ai-agentserver-core #46839
Draft
RaviPidaparthi wants to merge 14 commits into
…-core

Implements a crash-resilient durable task system with:
- @durable_task decorator with full lifecycle management (start, run, get, cancel, terminate)
- TaskResult[Output] wrapper replacing exception-based suspension handling
- Cooperative cancellation and configurable timeouts
- Configurable retry policies with backoff
- Callable factories for tags, title, and description
- Local in-memory provider for development/testing
- Task streaming support via AsyncIterator
- Lease-based distributed locking
- Ephemeral and persistent task modes
- Task metadata and source provenance tracking

Includes:
- 248 passing tests across 17 test modules
- 3 sample applications (retry, source, streaming)
- Developer guide documentation
- Spec files (001-006) covering all design decisions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
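To make "configurable retry policies with backoff" concrete, here is a minimal sketch of an exponential-backoff retry loop. It is not the PR's implementation: `RetryPolicy`, its fields, `delay_for`, and `run_with_retry` are hypothetical names chosen for illustration, and the actual policy options in `azure-ai-agentserver-core` may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical policy object; field names are assumptions, not the package's API."""

    max_attempts: int = 3
    initial_delay: float = 0.5
    backoff_factor: float = 2.0
    max_delay: float = 30.0

    def delay_for(self, attempt: int) -> float:
        """Delay to wait after the given 1-based failed attempt, capped at max_delay."""
        raw = self.initial_delay * self.backoff_factor ** (attempt - 1)
        return min(raw, self.max_delay)


async def run_with_retry(
    fn: Callable[[], Awaitable[T]],
    policy: RetryPolicy,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call fn until it succeeds or max_attempts is reached, sleeping between tries."""
    attempt = 1
    while True:
        try:
            return await fn()
        except Exception:
            # asyncio.CancelledError derives from BaseException, so cancellation
            # is not swallowed or retried here.
            if attempt >= policy.max_attempts:
                raise
            await sleep(policy.delay_for(attempt))
            attempt += 1
```

The `sleep` parameter is injectable only so the loop can be exercised in tests without real waiting.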
- TaskMetadata: add MutableMapping dict protocol (__setitem__, __getitem__, __delitem__, __contains__, __iter__, __len__, keys, values, items) with dirty-tracking on mutations
- Fix cspell CI failures: rename 'sess' abbreviations in _models.py, test_local_provider.py, test_models.py, test_source.py
- CHANGELOG 2.0.0b4: document all durable long-running agent features
- README: add durable agents section with code examples and dev guide link
- Developer guide: update metadata examples to dict-style syntax
- Invocations: bump core dep to >=2.0.0b4, add durable samples changelog
- Specs 001-007 and backlog: all 16 items resolved

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
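The dict protocol with dirty-tracking described for TaskMetadata can be illustrated with a small standalone class. This is a sketch under assumptions, not the PR's code: `DirtyTrackingMetadata`, `is_dirty`, and `mark_clean` are hypothetical names, and how the real class reports or flushes its dirty state is not shown on this page.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, Optional


class DirtyTrackingMetadata(MutableMapping):
    """Hypothetical stand-in for a metadata mapping that records mutations."""

    def __init__(self, initial: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = dict(initial or {})
        self._dirty = False  # nothing to persist until a mutation happens

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._dirty = True

    def __delitem__(self, key: str) -> None:
        del self._data[key]
        self._dirty = True

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    @property
    def is_dirty(self) -> bool:
        """True if the mapping changed since the last mark_clean()."""
        return self._dirty

    def mark_clean(self) -> None:
        """Assumed hook a store would call after persisting the metadata."""
        self._dirty = False
```

Deriving from `collections.abc.MutableMapping` supplies `__contains__`, `keys`, `values`, `items`, `update`, and `pop` from the five methods defined here, and the mutating mixins route through `__setitem__` and `__delitem__`, so they set the dirty flag as well.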
Explain the problem (containers can die), the 4-step durability mechanism (persist → lease → recover → complete), and the net effect before listing what the developer doesn't need to think about. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
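The four steps named in this commit (persist → lease → recover → complete) can be pictured with a toy in-memory store. Everything here is an assumption made for illustration: `TaskRecord`, `InMemoryStore`, the method names, and the lease-expiry rule are hypothetical, and an in-memory dict is of course not durable. The `lease_generation` counter only echoes the `ctx.lease_generation` property mentioned elsewhere in this PR; its real semantics may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """Hypothetical persisted task state."""

    task_id: str
    task_input: dict
    status: str = "pending"
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    lease_generation: int = 0


class InMemoryStore:
    """Toy store; a real provider would write to storage that outlives the container."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TaskRecord] = {}

    def persist(self, task_id: str, task_input: dict) -> TaskRecord:
        # Step 1: record the task before any work starts.
        record = TaskRecord(task_id, task_input)
        self._tasks[task_id] = record
        return record

    def acquire_lease(self, task_id: str, owner: str, ttl: float, now: float) -> bool:
        # Step 2: claim the task unless it is finished or someone holds a live lease.
        record = self._tasks[task_id]
        if record.status == "completed":
            return False
        if record.lease_owner is not None and record.lease_expires_at > now:
            return False
        record.lease_owner = owner
        record.lease_expires_at = now + ttl
        record.lease_generation += 1
        record.status = "running"
        return True

    def recoverable(self, now: float) -> List[str]:
        # Step 3: running tasks whose lease lapsed are candidates for recovery.
        return [
            t.task_id
            for t in self._tasks.values()
            if t.status == "running" and t.lease_expires_at <= now
        ]

    def complete(self, task_id: str, owner: str) -> bool:
        # Step 4: only the current lease holder may mark the task done.
        record = self._tasks[task_id]
        if record.lease_owner != owner:
            return False
        record.status = "completed"
        record.lease_owner = None
        return True
```

Time is passed in explicitly as `now` so the sketch stays deterministic; a real store would need a clock that remains meaningful across container restarts.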
Clarify that durable tasks are not a checkpoint/replay engine, not a result store, not a stream log, not app-level persistence, and not unbounded storage. Fix misleading 'checkpoint progress' language to 'lightweight progress signals'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that the framework recovers crashed tasks on container restart automatically, not in response to a caller calling .run() again. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix name default: __qualname__, not 'Function name'
- Add missing ctx.agent_name and ctx.lease_generation to properties table
- Fix recovery description: automatic at startup + on .run()/.start()
- Fix cancel semantics: function returning normally = success, not TaskCancelled
- Update cancel vs terminate table with accurate outcomes
- Fix resume docs: both .run() and .start() handle suspended tasks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
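The cancel semantics fixed here (a function that notices cancellation and returns normally counts as success) pair with the cancel event and grace period from the PR description. The sketch below shows that pattern with plain asyncio only. `cooperative_worker` and `cancel_with_grace` are hypothetical helpers, the `asyncio.Event` stands in for `ctx.cancel`, and the returned strings are illustrative rather than the framework's task statuses.

```python
import asyncio


async def cooperative_worker(cancel: asyncio.Event, steps: int) -> str:
    """Do work in small steps, checking the cancel signal between steps."""
    for i in range(steps):
        if cancel.is_set():
            # Returning normally here is the "success" path, not a cancellation error.
            return f"stopped after {i} steps"
        await asyncio.sleep(0.01)
    return "finished"


async def cancel_with_grace(
    task: "asyncio.Task[str]", cancel: asyncio.Event, grace_seconds: float
) -> str:
    """Signal cancellation, wait up to grace_seconds, then hard-cancel the task."""
    cancel.set()
    done, _pending = await asyncio.wait({task}, timeout=grace_seconds)
    if task in done:
        return task.result()
    # Grace period elapsed: fall back to asyncio's hard cancellation.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "hard-cancelled"
```

A caller would create the worker with `asyncio.create_task(cooperative_worker(event, steps))` and later call `cancel_with_grace(task, event, grace_seconds)`. A worker that never checks the event ends up on the hard-cancel path once the grace period runs out.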
- Sphinx: remove durable re-exports from core/__init__.py to fix duplicate object description warnings (symbols documented at both core and core.durable levels)
- MyPy: fix 3 type errors (_run.py Future type, _manager.py narrowing)
- Pylint: fix 55 issues across 7 files (docstrings, unused imports, import ordering, complexity suppressions)
- Constitution v1.3.0: add pre-push validation gate (NON-NEGOTIABLE)

All checks pass locally: pylint 10.00/10, mypy clean, sphinx clean, 261 tests passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ng, samples

Steering:
- Full steering implementation with generation model, pending queue, drain logic
- ctx.was_steered, ctx.previous_input, ctx.pending_inputs, ctx.generation
- SteeringQueueFull exception, TaskResult.is_superseded
- Completion-vs-steering race handling with etag
- Crash recovery with drain_in_progress flag

Task listing:
- DurableTask.list(status, session_id) with auto-scoping per function
- Server-side: agent_name, session_id, tag, status filters
- Client-side: source.type filter (until DEV-009 resolved)
- Provider protocol + local provider tag AND filtering

Reserved tag protection:
- _strip_reserved_tags() at all entry points (decorator, callsite, options)
- Framework auto-stamps _durable_task_name tag, always wins

Recovery routing:
- _find_resume_callback() matches source.name first (stable anchor)
- name param documented as stable identity anchor

Other:
- Local provider payload merge fixed to strict shallow (spec §11)
- steering_poll_seconds removed from public API (internal 2s default kept)
- Multi-worker references removed (single-container model)
- Developer guide cleaned of internal implementation details
- Steering spec updated to match implementation
- Samples: durable_claude, durable_copilot, updated durable_langgraph

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
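The reserved tag protection described above (strip reserved tags at every entry point, then let the framework stamp `_durable_task_name` last so it always wins) might look roughly like this. This is a sketch and not the PR's `_strip_reserved_tags()`: the contents of `RESERVED_TAGS`, the `build_tags` helper, and the merge order of tag sources are assumptions.

```python
from typing import Dict, Mapping, Optional

# Assumption: only the tag named in the commit message is treated as reserved here.
TASK_NAME_TAG = "_durable_task_name"
RESERVED_TAGS = frozenset({TASK_NAME_TAG})


def strip_reserved_tags(tags: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Return a copy of tags without any framework-reserved keys."""
    return {k: v for k, v in (tags or {}).items() if k not in RESERVED_TAGS}


def build_tags(task_name: str, *tag_sources: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Merge user tag sources in order, then stamp the framework-owned tag last."""
    merged: Dict[str, str] = {}
    for source in tag_sources:
        # Later sources (for example call-site tags) override earlier ones.
        merged.update(strip_reserved_tags(source))
    # Stamped after all user input, so a user-supplied value can never win.
    merged[TASK_NAME_TAG] = task_name
    return merged
```

For example, `build_tags("pkg.my_task", {"team": "a", "_durable_task_name": "spoof"}, {"team": "b"})` drops the spoofed value and keeps the later `team` tag.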
…ming

Replace hardcoded asyncio.Queue with a pluggable StreamHandler protocol (put/get/close) for the durable task streaming path.

Changes:
- New _stream.py: StreamHandler protocol + QueueStreamHandler default
- Refactored _context.py, _run.py, _manager.py: _stream_queue -> _stream_handler
- Added stream_handler param to start()/run() in _decorator.py
- Updated __init__.py exports
- Updated test_streaming.py and test_sample_e2e.py
- Updated developer guide with Custom Stream Handlers section
- SSE streaming samples and invocations framework updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
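A put/get/close protocol with a queue-backed default, as this commit describes, could be shaped like the sketch below. The names are deliberately suffixed with `Sketch` because the real signatures in `_stream.py` are not shown on this page. Whether the methods are async, how end-of-stream is signalled (here a sentinel plus a `StreamClosed` exception), and the `iterate` helper are all assumptions.

```python
import asyncio
from typing import Any, AsyncIterator, Protocol, runtime_checkable

_CLOSED = object()  # sentinel marking end of stream (an assumption of this sketch)


class StreamClosed(Exception):
    """Raised by get() once the stream has been closed (hypothetical)."""


@runtime_checkable
class StreamHandlerSketch(Protocol):
    """Hypothetical shape of a pluggable stream handler."""

    async def put(self, item: Any) -> None: ...

    async def get(self) -> Any: ...

    async def close(self) -> None: ...


class QueueStreamHandlerSketch:
    """Default-style handler backed by asyncio.Queue."""

    def __init__(self, maxsize: int = 0) -> None:
        # On Python 3.9 and earlier, create this inside the running event loop.
        self._queue: "asyncio.Queue[Any]" = asyncio.Queue(maxsize)

    async def put(self, item: Any) -> None:
        await self._queue.put(item)

    async def get(self) -> Any:
        item = await self._queue.get()
        if item is _CLOSED:
            raise StreamClosed
        return item

    async def close(self) -> None:
        await self._queue.put(_CLOSED)


async def iterate(handler: StreamHandlerSketch) -> AsyncIterator[Any]:
    """Adapt a handler to an async iterator that ends when the stream closes."""
    while True:
        try:
            yield await handler.get()
        except StreamClosed:
            return
```

Because the protocol is structural, an alternative backend (for instance one that forwards items to an SSE response) would only need to provide the same three methods to be accepted where a handler is expected.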
Durable Task Framework for azure-ai-agentserver-core
Adds a crash-resilient durable task system to azure-ai-agentserver-core, enabling hosted agent scenarios that need persistence, retry, and lifecycle management.

Key Features
- @durable_task decorator: turns async functions into crash-resilient tasks with full lifecycle (start, run, get, cancel, terminate)
- TaskResult[Output]: generic result wrapper with .output, .status, .is_suspended, .suspension_reason
- ctx.cancel event + configurable grace period before hard cancellation
- tags, title, description accept Callable[[Any, str], ...] for dynamic per-task values
- TaskStoreProvider protocol
- AsyncIterator-based streaming with durable checkpointing

Testing
Samples & Docs
- durable_retry, durable_source, durable_streaming
- durable_langgraph, durable_multiturn
- docs/durable-task-developer-guide.md

Files Changed
- azure-ai-agentserver-core/azure/ai/agentserver/core/durable/: 15 new modules
- azure-ai-agentserver-core/tests/durable/: 17 test files
- azure-ai-agentserver-core/samples/: 3 sample directories
- azure-ai-agentserver-core/docs/: Developer guide
- azure-ai-agentserver-invocations/samples/: 2 integration samples