diff --git a/.autover/autover.json b/.autover/autover.json index 8985c52bb..e50361903 100644 --- a/.autover/autover.json +++ b/.autover/autover.json @@ -47,6 +47,10 @@ "Name": "Amazon.Lambda.Core", "Path": "Libraries/src/Amazon.Lambda.Core/Amazon.Lambda.Core.csproj" }, + { + "Name": "Amazon.Lambda.DurableExecution", + "Path": "Libraries/src/Amazon.Lambda.DurableExecution/Amazon.Lambda.DurableExecution.csproj" + }, { "Name": "Amazon.Lambda.DynamoDBEvents", "Path": "Libraries/src/Amazon.Lambda.DynamoDBEvents/Amazon.Lambda.DynamoDBEvents.csproj" diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 31e288af2..63777c644 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -85,6 +85,7 @@ The available projects are: * Amazon.Lambda.ConfigEvents * Amazon.Lambda.ConnectEvents * Amazon.Lambda.Core +* Amazon.Lambda.DurableExecution * Amazon.Lambda.DynamoDBEvents * Amazon.Lambda.DynamoDBEvents.SDK.Convertor * Amazon.Lambda.KafkaEvents diff --git a/Docs/durable-execution-design.md b/Docs/durable-execution-design.md new file mode 100644 index 000000000..6df424c5f --- /dev/null +++ b/Docs/durable-execution-design.md @@ -0,0 +1,2228 @@ +# .NET Lambda Durable Execution SDK Design + +## Table of Contents + +- [Overview](#overview) +- [Motivation](#motivation) +- [How Durable Execution Works](#how-durable-execution-works) +- [User Experience](#user-experience) + - [Quick Start](#quick-start) + - [Steps](#steps) + - [Wait Operations](#wait-operations) + - [Callbacks](#callbacks) + - [Invoke (Chained Functions)](#invoke-chained-functions) + - [Parallel Execution](#parallel-execution) + - [Map Operations](#map-operations) + - [Child Contexts](#child-contexts) + - [Error Handling & Retry](#error-handling--retry) + - [Logging](#logging) +- [Internals](#internals) +- [API Reference](#api-reference) + - [IDurableContext](#idurablecontext) + - [Configuration Types](#configuration-types) + - [Result Types](#result-types) + - [Exception Types](#exception-types) +- 
[Serialization](#serialization) +- [Integration with Existing Libraries](#integration-with-existing-libraries) +- [Testing](#testing) +- [Local development (Test Tool v2 and Aspire)](#local-development-test-tool-v2-and-aspire) +- [Requirements & Constraints](#requirements--constraints) +- [Package Structure](#package-structure) +- [Implementation plan](#implementation-plan) +- [Cross-SDK API comparison](#cross-sdk-api-comparison) +- [Common Patterns](#common-patterns) + +--- + +## Overview + +Lambda Durable Functions let you write multi-step workflows that persist state automatically. They can run for days or months, survive failures, and you only pay for actual compute time. + +This doc covers the **.NET Durable Execution SDK** (`Amazon.Lambda.DurableExecution`). SDKs already exist for [Python](https://github.com/aws/aws-durable-execution-sdk-python) and [JavaScript/TypeScript](https://github.com/aws/aws-durable-execution-sdk-js). + +Related: [GitHub Issue #2216](https://github.com/aws/aws-lambda-dotnet/issues/2216) + +--- + +## Motivation + +### The problem + +Today, building multi-step Lambda workflows in .NET requires one of: + +1. **Step Functions** -- a separate service with its own state machine language (ASL), adding latency between steps and forcing you to learn a second programming model. +2. **Manual state management** -- rolling your own checkpointing with DynamoDB or S3, plus retry logic, idempotency keys, and resumption code. +3. **Event-driven choreography** -- chaining functions through SQS/SNS/EventBridge, scattering a single workflow's logic across half a dozen Lambda functions. + +All three push infrastructure concerns into your business logic. The code gets harder to read and test, and nobody wants to inherit it. 
+ +### What durable functions do instead + +With this SDK, you write sequential code and the runtime handles persistence: +- Checkpoints each step's result +- Suspends when waiting (no compute charges while idle) +- Resumes from the last checkpoint on the next invocation +- Retries failed steps with configurable backoff +- Waits for callbacks from external systems + +Your function reads like a normal async method. The SDK deals with state, replay, and recovery. + +### Why build a .NET SDK + +.NET has a large Lambda user base, especially in enterprise shops running order processing, document pipelines, and (increasingly) AI agent workflows. Today those teams either use Step Functions or build custom state machines. A native .NET SDK removes that tradeoff. + +--- + +## How Durable Execution Works + +### The replay model + +Durable functions use a replay-based execution model. Every invocation runs your code from the top, but previously completed steps return their cached result instead of re-executing. + +1. Lambda invokes your function with a `DurableExecutionInvocationInput` containing: + - `DurableExecutionArn` -- unique execution identifier + - `CheckpointToken` -- for optimistic concurrency + - `InitialExecutionState` -- previously checkpointed operations + +2. Your function code runs **from the beginning** on every invocation. + +3. When a **step** is encountered: + - Previously completed → return cached result (no re-execution) + - New → execute it, checkpoint the result, continue + +4. When a **wait** is encountered: + - Already elapsed → continue + - Still pending → return `PENDING`, Lambda terminates, service re-invokes later + +5. 
The function returns one of:
+   - `SUCCEEDED` -- workflow completed
+   - `FAILED` -- workflow failed
+   - `PENDING` -- workflow suspended (waiting for time or callback)
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ First Invocation (t=0s)                                         │
+│                                                                 │
+│  handler(event, context)                                        │
+│   │                                                             │
+│   ├─► context.StepAsync(FetchData) → executes, checkpoints      │
+│   │                                                             │
+│   ├─► context.WaitAsync(30 seconds) → returns PENDING           │
+│   │                                                             │
+│   └── (Lambda terminates, environment recyclable)               │
+└─────────────────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────────────────┐
+│ Second Invocation (t=30s)                                       │
+│                                                                 │
+│  handler(event, context)                                        │
+│   │                                                             │
+│   ├─► context.StepAsync(FetchData) → returns cached result      │
+│   │                                                             │
+│   ├─► context.WaitAsync(30 seconds) → already elapsed, skip     │
+│   │                                                             │
+│   ├─► context.StepAsync(ProcessData) → executes, checkpoints    │
+│   │                                                             │
+│   └── return result → SUCCEEDED                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## User Experience
+
+### Quick Start
+
+#### Installation
+
+```shell
+dotnet add package Amazon.Lambda.DurableExecution
+```
+
+#### Minimal Example
+
+```csharp
+using Amazon.Lambda.Annotations;
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution;
+
+[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]
+
+namespace MyDurableFunction;
+
+public class Function
+{
+    [LambdaFunction]
+    [DurableExecution]
+    public async Task<OrderResult> Handler(OrderEvent input, IDurableContext context)
+    {
+        // Step 1: Validate the order (checkpointed automatically)
+        var validation = await context.StepAsync(
+            async (step) => await ValidateOrder(input.OrderId),
+            name: "validate_order");
+
+        if (!validation.IsValid)
+            return new OrderResult { Status = "rejected" };
+
+        // Step 2: Wait for processing (Lambda is NOT running during this time)
+        await context.WaitAsync(TimeSpan.FromSeconds(30), name: 
"processing_delay"); + + // Step 3: Process the order + var result = await context.StepAsync( + async (step) => await ProcessOrder(input.OrderId), + name: "process_order"); + + return new OrderResult { Status = "approved", OrderId = result.OrderId }; + } + + private async Task ValidateOrder(string orderId) { /* ... */ } + private async Task ProcessOrder(string orderId) { /* ... */ } +} +``` + +Things to notice: +- `[LambdaFunction]` + `[DurableExecution]` triggers source generation, so you don't wire up the handler yourself +- Each step function receives an `IStepContext` with a step-scoped logger, attempt number, and operation ID +- Each `StepAsync` call checkpoints its result automatically +- `WaitAsync` suspends the function -- Lambda is not running (or billing you) during the wait +- On replay, completed steps return their cached result without re-executing +- The generated wrapper handles checkpoint batching and cleanup + +#### Manual Handler (Without Annotations) + +If you don't use `Amazon.Lambda.Annotations`, use `DurableFunction.WrapAsync` — a static helper (inspired by [OpenTelemetry's `AWSLambdaWrapper.TraceAsync`](https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.AWSLambda#lambda-function)) that handles the entire durable execution envelope for you: + +```csharp +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; + +[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))] + +namespace MyDurableFunction; + +public class Function +{ + public Task FunctionHandler( + DurableExecutionInvocationInput invocationInput, ILambdaContext context) + => DurableFunction.WrapAsync(MyWorkflow, invocationInput, context); + + private async Task MyWorkflow(OrderEvent input, IDurableContext context) + { + var validation = await context.StepAsync( + async (step) => await ValidateOrder(input.OrderId), + name: "validate_order"); + + if 
(!validation.IsValid)
+            return new OrderResult { Status = "rejected" };
+
+        await context.WaitAsync(TimeSpan.FromSeconds(30), name: "processing_delay");
+
+        var result = await context.StepAsync(
+            async (step) => await ProcessOrder(input.OrderId),
+            name: "process_order");
+
+        return new OrderResult { Status = "approved", OrderId = result.OrderId };
+    }
+
+    private async Task<ValidationResult> ValidateOrder(string orderId) { /* ... */ }
+    private async Task<OrderResult> ProcessOrder(string orderId) { /* ... */ }
+}
+```
+
+`DurableFunction.WrapAsync` handles all the plumbing:
+- Hydrates `ExecutionState` from `invocationInput.InitialExecutionState`
+- Extracts the user payload from the service envelope
+- Runs the workflow through `DurableExecutionHandler.RunAsync`
+- Constructs and returns the `DurableExecutionInvocationOutput` envelope (status mapping, JSON serialization)
+- Sets execution environment tracking
+
+For workflows that return no value, use the single-type-parameter overload:
+
+```csharp
+public Task<DurableExecutionInvocationOutput> FunctionHandler(
+    DurableExecutionInvocationInput invocationInput, ILambdaContext context)
+    => DurableFunction.WrapAsync(MyWorkflow, invocationInput, context);
+
+private async Task MyWorkflow(OrderEvent input, IDurableContext context)
+{
+    await context.StepAsync(async (step) => await SendNotification(input.UserId), name: "notify");
+    await context.WaitAsync(TimeSpan.FromHours(1), name: "cooldown");
+    await context.StepAsync(async (step) => await Cleanup(input.UserId), name: "cleanup");
+}
+```
+
+For **NativeAOT** deployments, pass a `JsonSerializerContext` so the SDK can serialize/deserialize your input and output types without reflection:
+
+```csharp
+[JsonSerializable(typeof(OrderEvent))]
+[JsonSerializable(typeof(OrderResult))]
+internal partial class MyJsonContext : JsonSerializerContext { }
+
+public class Function
+{
+    public Task<DurableExecutionInvocationOutput> FunctionHandler(
+        DurableExecutionInvocationInput invocationInput, ILambdaContext context)
+        => DurableFunction.WrapAsync(
+            MyWorkflow, 
invocationInput, context, MyJsonContext.Default);
+
+    private async Task<OrderResult> MyWorkflow(OrderEvent input, IDurableContext context)
+    {
+        // ...
+    }
+}
+```
+
+To inject a custom `IAmazonLambda` client (e.g., for VPC endpoints or unit testing), use the overload that accepts one:
+
+```csharp
+public class Function
+{
+    private readonly IAmazonLambda _lambdaClient;
+
+    public Function(IAmazonLambda lambdaClient) => _lambdaClient = lambdaClient;
+
+    public Task<DurableExecutionInvocationOutput> FunctionHandler(
+        DurableExecutionInvocationInput invocationInput, ILambdaContext context)
+        => DurableFunction.WrapAsync(
+            MyWorkflow, invocationInput, context, _lambdaClient);
+}
+```
+
+You'd also need to manually configure the CloudFormation template with `DurableConfig` and managed policies:
+
+```json
+{
+    "Resources": {
+        "MyFunction": {
+            "Type": "AWS::Serverless::Function",
+            "Properties": {
+                "Handler": "MyDurableFunction::MyDurableFunction.Function::FunctionHandler",
+                "Policies": [
+                    "AWSLambdaBasicExecutionRole",
+                    "AWSLambdaBasicDurableExecutionRolePolicy"
+                ],
+                "DurableConfig": {
+                    "Enabled": true
+                }
+            }
+        }
+    }
+}
+```
+
+##### What WrapAsync does internally
+
+For reference, here's the expanded version of what `DurableFunction.WrapAsync` eliminates — this is effectively what the source generator produces for the Annotations path:
+
+```csharp
+public async Task<DurableExecutionInvocationOutput> FunctionHandler(
+    DurableExecutionInvocationInput invocationInput,
+    ILambdaContext lambdaContext)
+{
+    // 1. Hydrate execution state from previously checkpointed operations
+    var state = new ExecutionState();
+    state.LoadFromCheckpoint(invocationInput.InitialExecutionState);
+
+    // 2. Extract user payload from the service envelope (internal)
+    var userPayload = ExtractUserPayload(invocationInput);
+
+    // 3. 
Run the user's workflow via DurableExecutionHandler.RunAsync + var result = await DurableExecutionHandler.RunAsync( + state, + async (durableContext) => await MyWorkflow(userPayload, durableContext), + invocationInput.DurableExecutionArn); + + // 4. Construct and return the service output envelope + return new DurableExecutionInvocationOutput + { + Status = result.Status, + Result = result.Status == InvocationStatus.Succeeded + ? JsonSerializer.Serialize(result.Result) + : null, + ErrorMessage = result.Message + }; +} +``` + +Key differences between `WrapAsync` and the Annotations approach: +- `WrapAsync` still requires you to define the Lambda entry point signature (`DurableExecutionInvocationInput` → `DurableExecutionInvocationOutput`) +- You configure `DurableConfig` + managed policies in your CloudFormation template manually (not generated) +- No `[LambdaFunction]` or `[DurableExecution]` attributes needed + +With `[LambdaFunction] + [DurableExecution]`, even the entry point and CloudFormation config are generated at compile time — you just write the workflow method. + +--- + +### Steps + +> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/step.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/step-handler/step-handler.ts) + +A step runs your code and checkpoints the result. On replay, the cached result comes back without re-executing. Each step function receives an `IStepContext` with a step-scoped logger and attempt metadata. 
+ +```csharp +// Basic step +var result = await context.StepAsync(async (step) => await CallExternalApi()); + +// Named step (recommended for debugging/testing) +var user = await context.StepAsync( + async (step) => await FetchUser(userId), + name: "fetch_user"); + +// Using the step-scoped logger (includes step name, attempt number, operation ID) +var order = await context.StepAsync( + async (step) => + { + step.Logger.LogInformation("Fetching order {OrderId}", orderId); + return await orderService.GetOrder(orderId); + }, + name: "get_order"); + +// Step with configuration +var payment = await context.StepAsync( + async (step) => await chargeCard(amount), + name: "charge_card", + config: new StepConfig + { + Semantics = StepSemantics.AtMostOncePerRetry, + RetryStrategy = RetryStrategy.Exponential(maxAttempts: 3, initialDelay: TimeSpan.FromSeconds(1)) + }); +``` + +#### Step Semantics + +| Semantics | Behavior | Use Case | +|-----------|----------|----------| +| `AtLeastOncePerRetry` (default) | Step re-executes on each retry | Idempotent operations (calculations, reads) | +| `AtMostOncePerRetry` | Step executes at most once per retry | Side effects (payments, emails, writes) | + +--- + +### Wait Operations + +> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/wait.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/wait-handler/wait-handler.ts) + +Waits suspend the function without consuming compute time. Lambda can recycle the execution environment. 
+ +```csharp +// Wait for a specific duration +await context.WaitAsync(TimeSpan.FromSeconds(30)); +await context.WaitAsync(TimeSpan.FromMinutes(5), name: "cooldown"); +await context.WaitAsync(TimeSpan.FromHours(24), name: "daily_check"); +await context.WaitAsync(TimeSpan.FromDays(7), name: "weekly_reminder"); +``` + +> **Validation:** The duration must be at least 1 second. Values less than 1 second throw `ArgumentOutOfRangeException`. Sub-second precision is truncated to whole seconds (the underlying service operates at second granularity). + +--- + +### Callbacks + +> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/callback.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/callback-handler/callback.ts) + +Callbacks let your workflow pause until an external system responds (human approval, a webhook, a third-party API). 
+ +#### Create a Callback (Advanced) + +```csharp +// Create a callback and get the callback ID +var callback = await context.CreateCallbackAsync( + name: "approval_callback", + config: new CallbackConfig + { + Timeout = TimeSpan.FromHours(24), + HeartbeatTimeout = TimeSpan.FromHours(2) + }); + +// Send the callback ID to an external system +await context.StepAsync( + async () => await SendApprovalEmail(callback.CallbackId, recipientEmail), + name: "send_approval_email"); + +// Wait for the external system to respond +var result = await callback.GetResultAsync(); +``` + +#### Wait For Callback (Simple) + +```csharp +// Combined pattern: create callback, submit to external system, wait for result +var approval = await context.WaitForCallbackAsync( + async (callbackId, ctx) => + { + await SendApprovalEmail(callbackId, managerEmail); + }, + name: "wait_for_approval", + config: new WaitForCallbackConfig + { + Timeout = TimeSpan.FromHours(24), + RetryStrategy = RetryStrategy.Exponential(maxAttempts: 3) + }); + +if (approval.Approved) +{ + await context.StepAsync(async (step) => await ExecutePlan(), name: "execute"); +} +``` + +**Example `SendApprovalEmail` stub:** +```csharp +private async Task SendApprovalEmail(string callbackId, string recipientEmail) +{ + // Include the callbackId in the approval link so the external system + // can complete the callback via the AWS API + var approvalLink = $"https://my-app.example.com/approve?callbackId={callbackId}"; + await emailService.SendAsync(recipientEmail, "Approval Required", $"Please approve: {approvalLink}"); +} +``` + +**External system completes the callback via AWS API:** +```bash +aws lambda send-durable-execution-callback-success \ + --function-name my-function:1 \ + --callback-id "cb-12345" \ + --payload '{"approved": true, "approver": "jane@example.com"}' +``` + +--- + +### Invoke (Chained Functions) + +> **Implementations:** 
[Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/invoke.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/invoke-handler/invoke-handler.ts) + +Call another durable function. The invocation is checkpointed, so it survives failures and won't double-fire. + +```csharp +// Invoke another durable function +var paymentResult = await context.InvokeAsync( + functionName: "arn:aws:lambda:us-east-1:123456789012:function:payment-processor:prod", + payload: new PaymentRequest { Amount = 100, Currency = "USD" }, + name: "process_payment", + config: new InvokeConfig + { + Timeout = TimeSpan.FromMinutes(5) + }); +``` + +> **Note:** Durable function invocations require **qualified identifiers** — include a version number, alias, or `$LATEST`: +> - ✅ `arn:aws:lambda:us-east-1:123456789012:function:payment-processor:prod` (alias) +> - ✅ `arn:aws:lambda:us-east-1:123456789012:function:payment-processor:42` (version) +> - ✅ `arn:aws:lambda:us-east-1:123456789012:function:payment-processor:$LATEST` +> - ❌ `arn:aws:lambda:us-east-1:123456789012:function:payment-processor` (unqualified — not supported) + +--- + +### Parallel Execution + +> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/parallel.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/parallel-handler/parallel-handler.ts) + +Run independent operations concurrently. The JS SDK uses a `DurablePromise` pattern where operations are deferred until awaited; in .NET that isn't necessary because `ParallelAsync` and `MapAsync` cover the same use case idiomatically. `Task`-returning methods start immediately and `await` retrieves the result, so there's no gap to fill with a lazy wrapper. 
+
+> **Prefer `ParallelAsync` over `Task.WhenAll`:** While `Task.WhenAll` works correctly with durable operations (operation IDs are allocated deterministically), it bypasses completion policies, concurrency limits, branch naming, and `IBatchResult` structured output. Always use `ParallelAsync` or `MapAsync` for concurrent durable operations. A future Roslyn analyzer (DE004) will flag `Task.WhenAll` usage with durable tasks and suggest `ParallelAsync` as a replacement.
+
+```csharp
+// Run multiple operations in parallel
+var results = await context.ParallelAsync(
+    new Func<IDurableContext, Task<object>>[]
+    {
+        async (ctx) => await ctx.StepAsync(async (step) => await FetchUserData(userId), name: "fetch_user"),
+        async (ctx) => await ctx.StepAsync(async (step) => await FetchOrderHistory(userId), name: "fetch_orders"),
+        async (ctx) => await ctx.StepAsync(async (step) => await FetchPreferences(userId), name: "fetch_prefs"),
+    },
+    name: "parallel_fetch",
+    config: new ParallelConfig
+    {
+        MaxConcurrency = 3,
+        CompletionConfig = CompletionConfig.AllSuccessful()
+    });
+
+// Access individual results
+var userData = results.GetResults()[0];
+var orderHistory = results.GetResults()[1];
+var preferences = results.GetResults()[2];
+```
+
+#### Named Parallel Branches
+
+For better observability, you can name individual branches (matching the JS SDK pattern):
+
+```csharp
+// Named branches for easier debugging and testing
+var results = await context.ParallelAsync(
+    new NamedBranch[]
+    {
+        new("fetch_user", async (ctx) => await ctx.StepAsync(async (step) => await FetchUserData(userId))),
+        new("fetch_orders", async (ctx) => await ctx.StepAsync(async (step) => await FetchOrderHistory(userId))),
+        new("fetch_prefs", async (ctx) => await ctx.StepAsync(async (step) => await FetchPreferences(userId))),
+    },
+    name: "parallel_fetch");
+
+// In tests, you can find specific branches by name
+var fetchUserBranch = results.GetOperation("fetch_user");
+```
+
+#### Completion Configurations
+
+`ParallelAsync` 
and `MapAsync` accept a `CompletionConfig` to control when the overall operation is considered complete:
+
+```csharp
+// All must succeed (default)
+CompletionConfig.AllSuccessful()
+
+// Complete when any one succeeds
+CompletionConfig.FirstSuccessful()
+
+// Complete when all finish (regardless of success/failure)
+CompletionConfig.AllCompleted()
+
+// Custom: succeed if at least 3 succeed, tolerate up to 2 failures
+new CompletionConfig
+{
+    MinSuccessful = 3,
+    ToleratedFailureCount = 2
+}
+```
+
+---
+
+### Map Operations
+
+> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/map.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/map-handler/map-handler.ts)
+
+Process a collection in parallel with configurable concurrency. The `items` parameter accepts any `IReadOnlyList<T>` (arrays, lists, etc.).
+
+```csharp
+var orders = new[] { "order-1", "order-2", "order-3", "order-4", "order-5" };
+
+var results = await context.MapAsync(
+    items: orders,  // IReadOnlyList<string>
+    func: async (ctx, orderId, index, allItems) =>
+    {
+        return await ctx.StepAsync(
+            async () => await ProcessOrder(orderId),
+            name: $"process_order_{index}");
+    },
+    name: "process_all_orders",
+    config: new MapConfig
+    {
+        MaxConcurrency = 3,
+        CompletionConfig = CompletionConfig.AllSuccessful(),
+        ItemNamer = (orderId, index) => $"Order-{orderId}"  // Readable names for observability
+    });
+
+// Check results
+results.ThrowIfError();  // Throws if any item failed
+var processedOrders = results.GetResults();
+```
+
+---
+
+### Child Contexts
+
+> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/operation/child.py) | 
[JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/run-in-child-context-handler/run-in-child-context-handler.ts) + +Child contexts group related durable operations into a sub-workflow. Use them when you need waits or multiple steps inside a logical unit (you cannot nest durable calls inside a step directly). + +```csharp +// Group operations into a child context +var enrichedData = await context.RunInChildContextAsync( + async (childCtx) => + { + var validated = await childCtx.StepAsync( + async () => await Validate(data), + name: "validate"); + + await childCtx.WaitAsync(TimeSpan.FromSeconds(1), name: "rate_limit"); + + var enriched = await childCtx.StepAsync( + async () => await Enrich(validated), + name: "enrich"); + + return enriched; + }, + name: "validation_phase"); + +// Use the enriched data in a subsequent step +var finalResult = await context.StepAsync( + async () => await SubmitEnrichedData(enrichedData), + name: "submit"); +``` + +> **Why child contexts?** You cannot nest durable operations inside a step. Steps are leaf operations. If you need multiple durable operations grouped together, use a child context. 
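+To make the leaf-operation rule concrete, here is an illustrative sketch against this doc's proposed API (not runnable against any shipped package; `Validate` and `Enrich` are stand-ins), showing the refactoring the rule forces:
+
+```csharp
+// ❌ Invalid: a durable operation nested inside a step.
+// Steps are leaf operations; the SDK does not support nesting
+// durable calls (WaitAsync, StepAsync, etc.) inside a step body.
+var enriched = await context.StepAsync(
+    async (step) =>
+    {
+        var validated = await Validate(data);
+        await context.WaitAsync(TimeSpan.FromSeconds(1)); // ❌ nested durable call
+        return await Enrich(validated);
+    },
+    name: "validate_and_enrich");
+
+// ✅ Valid: hoist the durable operations into a child context,
+// so each step stays a leaf.
+var enriched = await context.RunInChildContextAsync(
+    async (childCtx) =>
+    {
+        var validated = await childCtx.StepAsync(async () => await Validate(data), name: "validate");
+        await childCtx.WaitAsync(TimeSpan.FromSeconds(1), name: "rate_limit");
+        return await childCtx.StepAsync(async () => await Enrich(validated), name: "enrich");
+    },
+    name: "validation_phase");
+```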
+
+---
+
+### Error Handling & Retry
+
+> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/retries.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/utils/retry/retry-config/index.ts)
+
+#### Retry Strategies
+
+```csharp
+// Exponential backoff with jitter
+var result = await context.StepAsync(
+    async () => await CallUnreliableApi(),
+    name: "api_call",
+    config: new StepConfig
+    {
+        RetryStrategy = RetryStrategy.Exponential(
+            maxAttempts: 5,
+            initialDelay: TimeSpan.FromSeconds(1),
+            maxDelay: TimeSpan.FromSeconds(60),
+            backoffRate: 2.0,
+            jitter: JitterStrategy.Full)
+    });
+
+// Using presets
+var result = await context.StepAsync(
+    async () => await CallApi(),
+    name: "api_call",
+    config: new StepConfig
+    {
+        RetryStrategy = RetryStrategy.Default  // 6 attempts, 2x backoff, 5s initial, Full jitter
+    });
+
+// Available presets:
+// RetryStrategy.None — maxAttempts: 1 (no retry)
+// RetryStrategy.Default — 6 attempts, 2x backoff, 5s initial delay, Full jitter
+// RetryStrategy.Transient — 3 attempts, 2x backoff, 1s initial delay, Full jitter
+
+// Custom retry strategy
+var result = await context.StepAsync(
+    async () => await CallApi(),
+    name: "api_call",
+    config: new StepConfig
+    {
+        RetryStrategy = new CustomRetryStrategy((exception, attemptCount) =>
+        {
+            // Only retry transient errors (5xx responses)
+            if (exception is HttpRequestException httpEx && httpEx.StatusCode >= HttpStatusCode.InternalServerError)
+                return RetryDecision.RetryAfter(TimeSpan.FromSeconds(Math.Pow(2, attemptCount)));
+
+            return RetryDecision.DoNotRetry();
+        })
+    });
+
+// Retry with specific exception types
+var result = await context.StepAsync(
+    async () => await CallApi(),
+    name: "api_call",
+    config: new StepConfig
+    {
+        RetryStrategy = RetryStrategy.Exponential(
+            maxAttempts: 3,
+            retryableExceptions: new[] { typeof(TimeoutException), typeof(HttpRequestException) }) 
+    });
+
+// Retry with message pattern matching (regex)
+var result = await context.StepAsync(
+    async () => await CallApi(),
+    name: "api_call",
+    config: new StepConfig
+    {
+        RetryStrategy = RetryStrategy.Exponential(
+            maxAttempts: 3,
+            retryableExceptions: new[] { typeof(HttpRequestException) },
+            retryableMessagePatterns: new[] { "timeout", "throttl", "5\\d{2}" })
+    });
+```
+
+#### Jitter Strategies
+
+Jitter prevents thundering-herd scenarios where multiple retrying clients converge on the same backoff schedule. The SDK supports three jitter strategies:
+
+```csharp
+public enum JitterStrategy
+{
+    /// No randomization — delay is exactly the calculated backoff value.
+    None,
+
+    /// Random delay between 0 and the calculated backoff value (recommended).
+    Full,
+
+    /// Random delay between 50% and 100% of the calculated backoff value.
+    Half
+}
+```
+
+The default jitter for `RetryStrategy.Exponential()` is `JitterStrategy.Full`. All built-in presets (`RetryStrategy.Default`, `RetryStrategy.Transient`) also use `JitterStrategy.Full`. Use `JitterStrategy.None` only when you need deterministic retry timing (e.g., for testing).
+
+#### Retry Strategy Interface
+
+```csharp
+public interface IRetryStrategy
+{
+    RetryDecision ShouldRetry(Exception exception, int attemptNumber);
+}
+
+public record RetryDecision
+{
+    public bool ShouldRetry { get; init; }
+    public TimeSpan Delay { get; init; }
+
+    public static RetryDecision DoNotRetry() => new() { ShouldRetry = false };
+    public static RetryDecision RetryAfter(TimeSpan delay) => new() { ShouldRetry = true, Delay = delay };
+}
+```
+
+The concrete `RetryStrategy` type supports implicit conversion from `Func<Exception, int, RetryDecision>` (the conversion lives on the class rather than `IRetryStrategy`, since C# forbids user-defined conversions involving interface types), enabling inline lambdas:
+
+```csharp
+config: new StepConfig
+{
+    RetryStrategy = (ex, attempt) =>
+        attempt < 3 && ex is HttpRequestException
+            ? 
RetryDecision.RetryAfter(TimeSpan.FromSeconds(Math.Pow(2, attempt)))
+            : RetryDecision.DoNotRetry()
+}
+```
+
+#### Saga Pattern (Compensating Transactions)
+
+```csharp
+[DurableExecution]
+public async Task<BookingResult> Handler(BookingRequest input, IDurableContext context)
+{
+    var compensations = new List<(string Name, Func<Task> Action)>();
+
+    try
+    {
+        var flight = await context.StepAsync(
+            async () => await BookFlight(input),
+            name: "book_flight");
+        compensations.Add(("cancel_flight", async () => await CancelFlight(flight.Id)));
+
+        var hotel = await context.StepAsync(
+            async () => await BookHotel(input),
+            name: "book_hotel");
+        compensations.Add(("cancel_hotel", async () => await CancelHotel(hotel.Id)));
+
+        var car = await context.StepAsync(
+            async () => await BookCar(input),
+            name: "book_car");
+        compensations.Add(("cancel_car", async () => await CancelCar(car.Id)));
+
+        return new BookingResult { Status = "confirmed" };
+    }
+    catch (Exception ex)
+    {
+        // Execute compensations in reverse order
+        foreach (var (name, action) in compensations.AsEnumerable().Reverse())
+        {
+            await context.StepAsync(action, name: name);
+        }
+        return new BookingResult { Status = "cancelled", Error = ex.Message };
+    }
+}
+```
+
+---
+
+### Logging
+
+> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/logger.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/utils/logger/logger.ts)
+
+`context.Logger` is replay-aware: it suppresses duplicate messages that would otherwise repeat on every invocation. Use it instead of `Console.WriteLine`.
+
+> **Implementation note:** The replay-aware logger is implemented entirely in the durable execution SDK. During replay, the SDK tracks which operations are being restored from checkpoint state vs. executing for the first time, and suppresses log output for replayed operations. 
No changes to `Amazon.Lambda.RuntimeSupport` or the Lambda Runtime API are required. + +```csharp +[DurableExecution] +public async Task Handler(MyEvent input, IDurableContext context) +{ + // ✅ Replay-safe: only logs once even during replay + context.Logger.LogInformation("Starting workflow for {OrderId}", input.OrderId); + + var result = await context.StepAsync( + async () => await ProcessData(input.Data), + name: "process_data"); + + // ✅ Replay-safe + context.Logger.LogInformation("Processing complete: {Result}", result); + + // ❌ NOT replay-safe: will log on every replay + Console.WriteLine("This will repeat!"); + + return result; +} +``` + +The logger integrates with `Microsoft.Extensions.Logging`: + +```csharp +// context.Logger implements ILogger +context.Logger.LogDebug("Debug info"); +context.Logger.LogInformation("Info message"); +context.Logger.LogWarning("Warning: {Detail}", detail); +context.Logger.LogError(exception, "Error occurred"); +``` + +#### Custom Logger Configuration + +You can swap the logger or disable replay-aware filtering (e.g., to see logs during replay for debugging): + +```csharp +// Use a custom logger (e.g., Serilog, AWS Lambda Powertools) +context.ConfigureLogger(new LoggerConfig +{ + CustomLogger = myCustomLogger, + ModeAware = true // true = suppress during replay (default), false = always log +}); + +// Disable replay-aware filtering to see ALL logs (useful for debugging) +context.ConfigureLogger(new LoggerConfig { ModeAware = false }); +``` + +--- + +## Internals + +### AWS APIs used + +| API | Purpose | +|-----|---------| +| `CheckpointDurableExecution` | Persist operation state (step results, waits, etc.) 
| +| `GetDurableExecutionState` | Retrieve previously checkpointed state on replay | +| `SendDurableExecutionCallbackSuccess` | External systems signal callback completion | +| `SendDurableExecutionCallbackFailure` | External systems signal callback failure | +| `SendDurableExecutionCallbackHeartbeat` | External systems send heartbeat signals | + +### How suspension works internally + +This follows the same pattern as the JavaScript SDK's `Promise.race`. The .NET equivalent is `Task.WhenAny`. + +When `RunAsync` starts, it kicks off two tasks in parallel: user code and a termination signal (a `TaskCompletionSource` that starts unresolved). Whoever finishes first wins: + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ DurableExecutionHandler.RunAsync │ +│ │ +│ var userTask = userHandler(context); │ +│ var terminationTask = terminationManager.TerminationTask; │ +│ │ +│ var winner = await Task.WhenAny(userTask, terminationTask); │ +│ │ +│ ┌─── userTask ───────────────────┐ ┌─── terminationTask ────────┐ │ +│ │ StepAsync("fetch") → execute │ │ (unresolved TCS - waiting) │ │ +│ │ WaitAsync("delay") → ... │ │ │ │ +│ │ calls Terminate() ──────────────► SetResult() → resolves! │ │ +│ │ awaits forever (blocked) │ │ │ │ +│ └────────────────────────────────┘ └────────────────────────────┘ │ +│ │ +│ winner == terminationTask → return PENDING │ +│ (userTask is abandoned, GC collects it) │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +The `TerminationManager` is a thin wrapper around `TaskCompletionSource`: +- `TerminationTask` -- a Task that hangs forever until `Terminate()` is called +- `Terminate(reason)` -- resolves the TCS, causing the race to pick termination + +When user code hits a pending wait or callback: +1. It checkpoints the operation state +2. Calls `terminationManager.Terminate(WaitScheduled)` +3. Awaits a new never-completing `TaskCompletionSource` (blocks itself permanently) +4. 
`Task.WhenAny` sees the termination task resolved and picks it as the winner
5. `RunAsync` returns PENDING; the abandoned user task is left to be GC'd; Lambda terminates

### Lifecycle and cleanup

`RunAsync` manages the full lifecycle internally. When the handler completes (SUCCEEDED/FAILED) or suspends (PENDING), `RunAsync` stops the background checkpoint batcher, flushes any pending checkpoint operations, and disposes internal state. Users never call `Dispose` or wrap anything in `await using`.

---

## API Reference

### DurableFunction

Static helper for the non-Annotations handler path. Wraps a workflow function, handling all envelope translation between `DurableExecutionInvocationInput`/`DurableExecutionInvocationOutput` and user types.

```csharp
///
/// Static helper that wraps a durable workflow function, handling all envelope
/// translation between DurableExecutionInvocationInput/Output and user types.
/// Inspired by OpenTelemetry.Instrumentation.AWSLambda's AWSLambdaWrapper.TraceAsync pattern.
///
public static class DurableFunction
{
    // ── Reflection-based overloads (JIT only) ──────────────────────────

    ///
    /// Wrap a workflow that takes typed input and returns typed output.
    /// Reflection-based JSON — not AOT-safe.
    ///
    [RequiresUnreferencedCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    [RequiresDynamicCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
        Func<TInput, IDurableContext, Task<TOutput>> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext);

    ///
    /// Wrap a workflow (typed input + output) with explicit Lambda client.
    /// Reflection-based JSON — not AOT-safe.
    ///
    [RequiresUnreferencedCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    [RequiresDynamicCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
        Func<TInput, IDurableContext, Task<TOutput>> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        IAmazonLambda lambdaClient);

    ///
    /// Wrap a void workflow (typed input, no output).
    /// Reflection-based JSON — not AOT-safe.
    ///
    [RequiresUnreferencedCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    [RequiresDynamicCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
        Func<TInput, IDurableContext, Task> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext);

    ///
    /// Wrap a void workflow with explicit Lambda client.
    /// Reflection-based JSON — not AOT-safe.
    ///
    [RequiresUnreferencedCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    [RequiresDynamicCode("Uses reflection-based JSON. Use the JsonSerializerContext overload for AOT.")]
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
        Func<TInput, IDurableContext, Task> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        IAmazonLambda lambdaClient);

    // ── AOT-safe overloads (caller supplies JsonSerializerContext) ──────

    ///
    /// Wrap a workflow (typed input + output). AOT-safe — requires
    /// [JsonSerializable(typeof(TInput))] and [JsonSerializable(typeof(TOutput))]
    /// on the supplied jsonContext.
    ///
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
        Func<TInput, IDurableContext, Task<TOutput>> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        JsonSerializerContext jsonContext);

    ///
    /// Wrap a workflow (typed input + output) with explicit Lambda client. AOT-safe.
    ///
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
        Func<TInput, IDurableContext, Task<TOutput>> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        IAmazonLambda lambdaClient,
        JsonSerializerContext jsonContext);

    ///
    /// Wrap a void workflow (typed input, no output). AOT-safe.
    ///
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
        Func<TInput, IDurableContext, Task> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        JsonSerializerContext jsonContext);

    ///
    /// Wrap a void workflow with explicit Lambda client. AOT-safe.
    ///
    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
        Func<TInput, IDurableContext, Task> workflow,
        DurableExecutionInvocationInput invocationInput,
        ILambdaContext lambdaContext,
        IAmazonLambda lambdaClient,
        JsonSerializerContext jsonContext);
}
```

### IDurableContext

> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/context.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/types/durable-context.ts)

The primary interface developers interact with:

```csharp
public interface IDurableContext
{
    ///
    /// Replay-safe logger. Messages are de-duplicated during replay.
    ///
    ILogger Logger { get; }

    ///
    /// Metadata about the current durable execution.
    ///
    IExecutionContext ExecutionContext { get; }

    ///
    /// The underlying Lambda context.
    ///
    ILambdaContext LambdaContext { get; }

    // ── StepAsync overloads ────────────────────────────────────────────
    // The user's function always receives IStepContext, matching the
    // Python and JS SDKs (Java has no-context overloads but deprecated
    // them — see https://github.com/aws/aws-durable-execution-sdk-java).

    ///
    /// Execute a step with automatic checkpointing using reflection-based JSON.
    /// The IStepContext provides a step-scoped logger with operation metadata
    /// (step name, attempt number, operation ID) and the current attempt number.
    ///
    [RequiresUnreferencedCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
    [RequiresDynamicCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
    Task<T> StepAsync<T>(
        Func<IStepContext, Task<T>> func,
        string? name = null,
        StepConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Execute a step that returns no value. AOT-safe (no payload to serialize).
    ///
    Task StepAsync(
        Func<IStepContext, Task> func,
        string? name = null,
        StepConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Execute a step with AOT-safe checkpoint serialization. The supplied
    /// serializer is used in place of reflection-based JSON.
    ///
    Task<T> StepAsync<T>(
        Func<IStepContext, Task<T>> func,
        ICheckpointSerializer<T> serializer,
        string? name = null,
        StepConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Suspend execution for the specified duration.
    /// Throws ArgumentOutOfRangeException if duration is less than 1 second.
    ///
    Task WaitAsync(
        TimeSpan duration,
        string? name = null,
        CancellationToken cancellationToken = default);

    ///
    /// Create a callback for external system integration.
    ///
    Task<ICallback<T>> CreateCallbackAsync<T>(
        string? name = null,
        CallbackConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Wait for an external system to respond via callback.
    ///
    Task<T> WaitForCallbackAsync<T>(
        Func<string, Task> submitter,
        string? name = null,
        WaitForCallbackConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Invoke another durable function.
    ///
    Task<TResult> InvokeAsync<TPayload, TResult>(
        string functionName,
        TPayload payload,
        string? name = null,
        InvokeConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Execute multiple operations in parallel (unnamed branches).
    ///
    Task<IBatchResult<T>> ParallelAsync<T>(
        IReadOnlyList<Func<IDurableContext, Task<T>>> functions,
        string? name = null,
        ParallelConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Execute multiple named operations in parallel. Named branches appear in
    /// execution traces and can be inspected by name in tests.
    ///
    Task<IBatchResult<T>> ParallelAsync<T>(
        IReadOnlyList<DurableBranch<T>> branches,
        string? name = null,
        ParallelConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Process a collection of items in parallel.
    ///
    Task<IBatchResult<TResult>> MapAsync<TItem, TResult>(
        IReadOnlyList<TItem> items,
        Func<TItem, IDurableContext, Task<TResult>> func,
        string? name = null,
        MapConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Run operations in an isolated child context.
    ///
    Task<T> RunInChildContextAsync<T>(
        Func<IDurableContext, Task<T>> func,
        string? name = null,
        ChildContextConfig? config = null,
        CancellationToken cancellationToken = default);

    ///
    /// Poll until a condition is met.
    ///
    Task<TState> WaitForConditionAsync<TState>(
        Func<TState, Task<TState>> check,
        WaitForConditionConfig<TState> config,
        string? name = null,
        CancellationToken cancellationToken = default);
}
```

#### Supporting Types

```csharp
///
/// Context passed to step functions. Provides step-scoped logging and metadata.
///
public interface IStepContext
{
    ///
    /// Logger scoped to this step. Includes step name, operation ID, and attempt
    /// number in structured log metadata automatically.
    ///
    ILogger Logger { get; }

    ///
    /// The current retry attempt number (1-based).
    ///
    int AttemptNumber { get; }

    ///
    /// The deterministic operation ID for this step.
    ///
    string OperationId { get; }
}

///
/// A named branch for parallel execution. Named branches appear in execution
/// traces and can be inspected by name in the test runner.
///
public record DurableBranch<T>(string Name, Func<IDurableContext, Task<T>> Func);
```

#### CancellationToken behavior

All methods accept a per-call `CancellationToken` that follows standard .NET semantics: cancellation throws `OperationCanceledException` and the execution fails. Cancellation does **not** trigger suspension — those are separate concepts.
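The replay model that all of these operations share can be illustrated language-agnostically. The following is a minimal sketch in Python, not SDK code; the names (`DurableContext`, `checkpoints`, `step`) are illustrative only. On first execution a step runs and its result is checkpointed; on replay the checkpointed result is returned without re-running the function, which is why deterministic operation ordering matters:

```python
# Minimal sketch of checkpoint-based replay (illustrative, not the SDK).
# Step results are keyed by a deterministic operation ID (here: call order + name).

class DurableContext:
    def __init__(self, checkpoints=None):
        self.checkpoints = dict(checkpoints or {})  # op_id -> checkpointed result
        self.op_counter = 0
        self.executed = []  # which steps actually ran during this invocation

    def step(self, func, name=None):
        op_id = f"{self.op_counter}:{name or func.__name__}"
        self.op_counter += 1
        if op_id in self.checkpoints:        # replay: restore, don't re-run
            return self.checkpoints[op_id]
        result = func()                      # first execution: run and checkpoint
        self.checkpoints[op_id] = result
        self.executed.append(op_id)
        return result

# First invocation: both steps actually execute.
ctx1 = DurableContext()
a = ctx1.step(lambda: 2 + 2, name="add")
b = ctx1.step(lambda: a * 10, name="scale")

# Second invocation (replay with prior state): nothing re-executes.
ctx2 = DurableContext(ctx1.checkpoints)
a2 = ctx2.step(lambda: 2 + 2, name="add")
b2 = ctx2.step(lambda: a2 * 10, name="scale")

print(b, b2, ctx2.executed)  # 40 40 []
```

This also shows why the SDK raises a non-determinism error when replayed operations arrive in a different order: the operation ID derived from call order would no longer match the checkpointed history.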
+ +The durable execution service handles timeout scenarios automatically: if Lambda terminates mid-execution, the next invocation simply replays from the last checkpoint. For advanced users who want to suspend gracefully before timeout, check `context.LambdaContext.RemainingTime` and return early. + +### Configuration Types + +> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/config.py) | JavaScript: [step](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/types/step.ts) | [batch](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/types/batch.ts) + +```csharp +/// +/// Configuration for step execution. +/// +public class StepConfig +{ + /// + /// Retry strategy for failed steps. Default is no retry. + /// Accepts IRetryStrategy implementations (RetryStrategy.Exponential, etc.) + /// or an inline function via implicit conversion from + /// Func<Exception, int, RetryDecision>. + /// + public IRetryStrategy? RetryStrategy { get; set; } + + /// + /// Execution semantics. Default is AtLeastOncePerRetry. + /// + public StepSemantics Semantics { get; set; } = StepSemantics.AtLeastOncePerRetry; + + // Note: there is no Serializer property here. Custom serializers are + // supplied via the AOT-safe StepAsync(..., ICheckpointSerializer, ...) + // overload, which is type-safe (ICheckpointSerializer instead of the + // non-generic marker) and gives one obvious way to opt into custom or + // AOT-friendly serialization. +} + +public enum StepSemantics +{ + /// + /// Step re-executes on each retry attempt. Safe for idempotent operations. + /// + AtLeastOncePerRetry, + + /// + /// Step executes at most once per retry attempt. Use for side effects. + /// + AtMostOncePerRetry +} + +/// +/// Configuration for callback operations. 
+/// +public class CallbackConfig +{ + /// + /// Maximum time to wait for callback response. Default (TimeSpan.Zero) means no timeout. + /// + public TimeSpan Timeout { get; set; } = TimeSpan.Zero; + + /// + /// Maximum time between heartbeat signals before timeout. Default (TimeSpan.Zero) means no heartbeat timeout. + /// + public TimeSpan HeartbeatTimeout { get; set; } = TimeSpan.Zero; + + /// + /// Custom serializer for callback result. + /// + public ICheckpointSerializer? Serializer { get; set; } +} + +/// +/// Configuration for wait-for-callback operations. +/// +public class WaitForCallbackConfig : CallbackConfig +{ + /// + /// Retry strategy for the submitter function. + /// + public IRetryStrategy? RetryStrategy { get; set; } +} + +/// +/// Configuration for invoke operations. +/// +public class InvokeConfig +{ + /// + /// Maximum time to wait for the invoked function. Default (TimeSpan.Zero) means no timeout. + /// + public TimeSpan Timeout { get; set; } = TimeSpan.Zero; + + /// + /// Custom serializer for the payload. + /// + public ICheckpointSerializer? PayloadSerializer { get; set; } + + /// + /// Custom serializer for the result. + /// + public ICheckpointSerializer? ResultSerializer { get; set; } +} + +/// +/// Controls how branches are represented in the checkpoint graph. +/// +public enum NestingType +{ + /// + /// Each branch creates a full isolated CONTEXT operation. Higher observability + /// in execution traces but more checkpoint operations (default). + /// + Nested, + + /// + /// Branches use virtual contexts sharing the parent. Reduces checkpoint cost + /// by ~30% at the expense of less granular execution traces. + /// + Flat +} + +/// +/// Configuration for parallel execution. +/// +public class ParallelConfig +{ + /// + /// Maximum concurrent branches. Null = unlimited. + /// + public int? MaxConcurrency { get; set; } + + /// + /// When to consider the operation complete. 
+ /// + public CompletionConfig CompletionConfig { get; set; } = CompletionConfig.AllSuccessful(); + + /// + /// How branches are represented in the checkpoint graph. + /// Nested = full isolated context per branch (default). + /// Flat = virtual contexts sharing parent (~30% fewer checkpoint operations). + /// + public NestingType NestingType { get; set; } = NestingType.Nested; +} + +/// +/// Configuration for map operations. +/// +public class MapConfig +{ + /// + /// Maximum concurrent items. Null = unlimited. + /// + public int? MaxConcurrency { get; set; } + + /// + /// When to consider the operation complete. + /// + public CompletionConfig CompletionConfig { get; set; } = CompletionConfig.AllSuccessful(); + + /// + /// How item branches are represented in the checkpoint graph. + /// + public NestingType NestingType { get; set; } = NestingType.Nested; + + /// + /// Optional batching configuration for grouping items before processing. + /// When set, items are grouped into batches and each batch is processed as a unit. + /// Reduces checkpoint overhead for large collections. + /// + public ItemBatcher? Batcher { get; set; } + + /// + /// Optional function to generate a custom name for each item's branch. + /// Improves observability in execution traces. Receives the item and its index. + /// If null, branches are named by index (e.g., "0", "1", "2"). + /// + public Func? ItemNamer { get; set; } +} + +/// +/// Groups items into batches for map operations to reduce checkpoint overhead. +/// At least one of MaxItemsPerBatch or MaxBytesPerBatch must be set. +/// +public class ItemBatcher +{ + /// + /// Maximum number of items per batch. Null = no count limit. + /// + public int? MaxItemsPerBatch { get; set; } + + /// + /// Maximum serialized size (bytes) per batch. Null = no size limit. + /// + public int? MaxBytesPerBatch { get; set; } +} + +/// +/// Defines completion criteria for parallel/map operations. +/// +public class CompletionConfig +{ + public int? 
MinSuccessful { get; set; }
    public int? ToleratedFailureCount { get; set; }
    public double? ToleratedFailurePercentage { get; set; }

    public static CompletionConfig AllSuccessful() => new() { ToleratedFailureCount = 0 };
    public static CompletionConfig FirstSuccessful() => new() { MinSuccessful = 1 };
    public static CompletionConfig AllCompleted() => new();
}

///
/// Configuration for child context operations.
///
public class ChildContextConfig
{
    ///
    /// Custom serializer for the child context's return value.
    ///
    public ICheckpointSerializer? Serializer { get; set; }

    ///
    /// Operation sub-type label for observability (e.g., in test runner output).
    ///
    public string? SubType { get; set; }

    ///
    /// Optional function to transform exceptions from the child context before
    /// surfacing them to the parent. Useful for wrapping low-level errors into
    /// domain-specific exceptions.
    ///
    public Func<Exception, Exception>? ErrorMapping { get; set; }
}

///
/// Configuration for wait-for-condition (polling).
///
public class WaitForConditionConfig<TState>
{
    ///
    /// Initial state passed to the first check invocation.
    ///
    public required TState InitialState { get; set; }

    ///
    /// Strategy controlling how long to wait between checks.
    ///
    public required IWaitStrategy WaitStrategy { get; set; }
}
```

### Result Types

```csharp
///
/// Result of a parallel or map operation.
///
public interface IBatchResult<T>
{
    ///
    /// All items (succeeded and failed).
    ///
    IReadOnlyList<IBatchItem<T>> All { get; }

    ///
    /// Only successful items.
    ///
    IReadOnlyList<IBatchItem<T>> Succeeded { get; }

    ///
    /// Only failed items.
    ///
    IReadOnlyList<IBatchItem<T>> Failed { get; }

    ///
    /// Get all successful results. Throws if any failed.
    ///
    IReadOnlyList<T> GetResults();

    ///
    /// Throw an exception if any item failed.
    ///
    void ThrowIfError();

    ///
    /// Why the operation completed.
    ///
    CompletionReason CompletionReason { get; }
}

public interface IBatchItem<T>
{
    int Index { get; }
    BatchItemStatus Status { get; }
    T? Result { get; }
    DurableExecutionException? Error { get; }
}

public enum BatchItemStatus { Succeeded, Failed, Cancelled }
public enum CompletionReason { AllCompleted, MinSuccessfulReached, FailureToleranceExceeded }

///
/// Represents a pending callback.
///
public interface ICallback<T>
{
    ///
    /// The callback ID to send to external systems.
    ///
    string CallbackId { get; }

    ///
    /// Wait for and return the callback result.
    /// Suspends execution until the result is available.
    ///
    Task<T> GetResultAsync(CancellationToken cancellationToken = default);
}

///
/// Metadata about the current execution.
///
public interface IExecutionContext
{
    ///
    /// The ARN of the current durable execution.
    ///
    string DurableExecutionArn { get; }
}
```

### Exception Types

> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/exceptions.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/errors/durable-error/durable-error.ts)

```csharp
///
/// Base exception for all durable execution errors.
///
public class DurableExecutionException : Exception { }

///
/// Thrown when user code inside a step fails (after retries exhausted).
/// Contains the original error details from the checkpoint.
///
public class StepException : DurableExecutionException
{
    public string? ErrorType { get; }
    public string? ErrorData { get; }
    public IReadOnlyList<string>? StackTrace { get; }
}

///
/// Thrown when a callback fails or times out.
///
public class CallbackException : DurableExecutionException
{
    public string? CallbackId { get; }
    public bool IsTimeout { get; }
}

///
/// Thrown when an invoked function fails.
+/// +public class InvokeException : DurableExecutionException +{ + public string? FunctionName { get; } + public string? ErrorType { get; } + public string? ErrorData { get; } +} + +/// +/// Thrown when a child context operation fails. +/// +public class ChildContextException : DurableExecutionException +{ + public string? SubType { get; } +} + +/// +/// Thrown when a wait-for-condition operation exhausts all attempts +/// without the condition being met. +/// +public class WaitForConditionException : DurableExecutionException +{ + public int AttemptsExhausted { get; } +} + +/// +/// Thrown when the operation sequence during replay does not match +/// the previously checkpointed history. Indicates non-deterministic code. +/// +public class NonDeterministicException : DurableExecutionException +{ + public string? ExpectedOperationId { get; } + public string? ActualOperationId { get; } +} + +/// +/// Thrown when a step is interrupted mid-execution (e.g., Lambda timeout or +/// runtime termination). The step did not complete and its result was not +/// checkpointed. On the next invocation, the step will re-execute from scratch. +/// +public class StepInterruptedException : DurableExecutionException +{ + public string? StepName { get; } + public int AttemptNumber { get; } +} + +/// +/// Thrown when checkpoint serialization or deserialization fails. +/// +public class SerializationException : DurableExecutionException { } + +/// +/// Thrown when input validation fails. +/// +public class DurableValidationException : DurableExecutionException { } + +/// +/// Thrown when the checkpoint API call fails. 
///
public class CheckpointException : DurableExecutionException
{
    public bool IsRetriable { get; }
}
```

---

## Serialization

> **Implementations:** [Python](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/src/aws_durable_execution_sdk_python/serdes.py) | [JavaScript](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/utils/serdes/serdes.ts)

### Default behavior

Step results are serialized to JSON (via `System.Text.Json`) before checkpointing. Your return types need to be JSON-serializable.

```csharp
// ✅ GOOD: JSON-serializable types
public record OrderResult(string OrderId, decimal Total, bool IsCompleted);

// ❌ BAD: Non-serializable types
public class BadResult
{
    public Stream DataStream { get; set; }   // Not serializable
    public HttpClient Client { get; set; }   // Not serializable
}
```

### Custom Serialization

Implement `ICheckpointSerializer<T>` for custom serialization:

```csharp
public interface ICheckpointSerializer<T>
{
    string Serialize(T value, SerializationContext context);
    T Deserialize(string data, SerializationContext context);
}

public record SerializationContext(string OperationId, string DurableExecutionArn);
```

Usage — pass the serializer to the AOT-safe `StepAsync` overload directly. This is the only way to override the default reflection-based JSON path; it's intentional that there's no `StepConfig.Serializer` knob, so you have one obvious place to opt in (and the parameter is the generic `ICheckpointSerializer<T>`, not the non-generic marker, so the compiler catches a mismatched `T`):

```csharp
var result = await context.StepAsync(
    async () => await GetLargeData(),
    new CompressedJsonSerializer(),
    name: "get_data");
```

### Class library vs. executable output

All samples in this doc use the class library pattern (no `Main` method). This is the default for Lambda functions.
To turn a durable function project into an executable (required for NativeAOT or custom runtimes): + +**With Annotations** — add the global attribute to auto-generate a `Main` method: +```csharp +[assembly: LambdaGlobalProperties(GenerateMain = true)] +``` + +**Without Annotations** — provide your own `Main` method: +```csharp +public static async Task Main(string[] args) +{ + using var bootstrap = new LambdaBootstrap( + new Function().FunctionHandler, + new DefaultLambdaJsonSerializer()); + await bootstrap.RunAsync(); +} +``` + +Both approaches produce a self-contained executable that the Lambda custom runtime can invoke. + +### NativeAOT compatibility + +The SDK is AOT-friendly but does not require AOT. The default JSON serialization uses reflection (standard `System.Text.Json` behavior), which works in JIT mode. For NativeAOT deployments, AOT safety is addressed at two levels — **at each level there are two overload families: a reflection-based one annotated with `[RequiresUnreferencedCode]` / `[RequiresDynamicCode]` and an AOT-safe one that requires a serializer parameter**. The trimmer warns at the call site when reflection overloads are used in AOT/trimmed builds. + +1. **Entry point (`DurableFunction.WrapAsync`)** — the AOT-safe overload takes a `JsonSerializerContext` parameter that includes type info for your `TInput` and `TOutput` types. + +2. **Step checkpoints (`IDurableContext.StepAsync`)** — the AOT-safe overload takes an `ICheckpointSerializer` directly as a parameter. Internally, the reflection overload constructs `ReflectionJsonCheckpointSerializer` (whose constructor carries `[RequiresUnreferencedCode]`); the AOT-safe overload uses the user-supplied serializer and never touches reflection. The void `StepAsync` overloads are AOT-safe by default — they use a built-in null-only serializer since they have no payload. 
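The two-level serializer indirection described above (a reflection-based default that is convenient but not statically analyzable, versus a caller-supplied serializer that is) can be sketched language-agnostically. This is a hedged Python illustration, not SDK code; the names (`default_json_serializer`, `checkpoint_step`, `order_serializer`) are invented for the example:

```python
import json

# Default path: generic JSON serialization of whatever value arrives
# (loosely analogous to the reflection-based System.Text.Json path).
def default_json_serializer(value):
    return json.dumps(value)

# Checkpoint a step result, preferring a caller-supplied serializer when
# one is given (loosely analogous to the AOT-safe StepAsync overload that
# takes an ICheckpointSerializer<T>).
def checkpoint_step(value, serializer=None):
    serialize = serializer or default_json_serializer
    return serialize(value)

# A caller-supplied serializer: the shape is known ahead of time, so it can
# emit a compact fixed-field encoding instead of a generic object dump.
def order_serializer(order):
    return json.dumps([order["id"], order["total"]])

order = {"id": "o-1", "total": 42}
default_blob = checkpoint_step(order)
custom_blob = checkpoint_step(order, serializer=order_serializer)
print(default_blob)
print(custom_blob)
```

The design point is the same in both levels: the default path works everywhere reflection works, while the explicit path moves the serialization decision to the call site, where a trimmer (or a reader) can see it.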
The SDK itself avoids `Activator.CreateInstance`, `Type.GetType()`, and other reflection patterns, and uses `[DynamicallyAccessedMembers]` trimming annotations where needed.

```csharp
// Default: works with reflection (JIT mode); flagged for AOT.
var result = await context.StepAsync(async (step) => await GetOrder());

// AOT mode — entry point: pass JsonSerializerContext to WrapAsync.
[JsonSerializable(typeof(OrderEvent))]
[JsonSerializable(typeof(OrderResult))]
[JsonSerializable(typeof(Order))]
internal partial class MyJsonContext : JsonSerializerContext { }

public Task<DurableExecutionInvocationOutput> FunctionHandler(
    DurableExecutionInvocationInput invocationInput, ILambdaContext context)
    => DurableFunction.WrapAsync(
        MyWorkflow, invocationInput, context, MyJsonContext.Default);

// AOT mode — step checkpoint: pass ICheckpointSerializer to StepAsync directly.
var result = await context.StepAsync(
    async () => await GetOrder(),
    new JsonCheckpointSerializer<Order>(MyJsonContext.Default.Order),
    name: "get_order");
```

### Large payload and checkpoint overflow

The durable execution service imposes size limits:

- **256 KB** per individual operation checkpoint
- **6 MB** maximum Lambda response payload

The SDK handles overflow transparently:

**Step results exceeding 256 KB:** When a step's serialized result exceeds the checkpoint size limit, the SDK splits the checkpoint into a START operation (before execution) and a separate result checkpoint (after execution). On replay, the SDK fetches the result via the paginated `GetDurableExecutionState` API rather than reading it inline from the operation record.

**Batch results (map/parallel) exceeding limits:** For large map/parallel operations, the SDK generates a compact summary for the parent operation's checkpoint. The summary includes item count, success/failure counts, and completion reason — but not individual item results. 
During replay, the SDK sets `ReplayChildren = true` on the state request, which causes the service to return child operation records so full results can be reconstructed.

**Lambda response exceeding 6 MB:** If the final orchestration result exceeds the response payload limit, the SDK checkpoints the result before returning the `DurableExecutionInvocationOutput`. The service reads the result from the checkpoint rather than from the response body.

**Guidance for very large results:** For results that are inherently large (multi-MB payloads), use a custom `ICheckpointSerializer<T>` that offloads to external storage (S3, DynamoDB) and returns a reference. This keeps checkpoint sizes small and avoids pagination overhead:

```csharp
public class S3BackedSerializer<T> : ICheckpointSerializer<T>
{
    public string Serialize(T value, SerializationContext context)
    {
        var key = $"results/{context.DurableExecutionArn}/{context.OperationId}";
        // Upload to S3, return the key as the checkpoint value
        _s3Client.PutObject(new PutObjectRequest { BucketName = _bucket, Key = key, ... });
        return key;
    }

    public T Deserialize(string data, SerializationContext context)
    {
        // Download from S3 using the stored key
        var response = _s3Client.GetObject(new GetObjectRequest { BucketName = _bucket, Key = data });
        return JsonSerializer.Deserialize<T>(response.ResponseStream);
    }
}
```

---

## Integration with Existing Libraries

### Amazon.Lambda.Core

The SDK uses existing Lambda core interfaces:
- `ILambdaContext` -- available via `context.LambdaContext`
- `ILambdaSerializer` -- used for event deserialization

### Amazon.Lambda.RuntimeSupport

The durable execution handler integrates with the existing runtime support bootstrap:

```csharp
// The [DurableExecution] attribute signals that the handler
// receives DurableExecutionInvocationInput and returns DurableExecutionInvocationOutput
// The SDK handles the translation to/from the user's handler signature
```

### Amazon.Lambda.Annotations (optional)

`Amazon.Lambda.Annotations` is an **optional** dependency. Users can write durable functions without it (see [Manual Handler](#manual-handler-without-annotations) above), but adding Annotations to the project reduces boilerplate significantly.

When both packages are referenced, the Annotations source generator detects `[DurableExecution]` by its fully-qualified name and, at compile time:

1. Generates a handler wrapper that translates `DurableExecutionInvocationInput` to/from your types
2. Manages context lifecycle (creation, checkpoint batching, cleanup)
3. Adds `DurableConfig` to the CloudFormation template
4. Adds the `AWSLambdaBasicDurableExecutionRolePolicy` managed policy

```csharp
public class Functions
{
    [LambdaFunction]
    [DurableExecution(ExecutionTimeout = 3600, RetentionPeriodInDays = 7)]
    public async Task<OrderResult> ProcessOrder(
        [FromBody] OrderRequest request,
        IDurableContext context)
    {
        var validated = await context.StepAsync(
            async (step) => await Validate(request),
            name: "validate");
        // ...
    }
}
```

#### Custom Lambda Client

For VPC endpoints, custom retry policies, or testing with mocked clients, inject a custom `IAmazonLambda` client via the `[DurableExecution]` attribute:

```csharp
public class Functions
{
    private readonly IAmazonLambda _lambdaClient;

    public Functions(IAmazonLambda lambdaClient)
    {
        _lambdaClient = lambdaClient;
    }

    [LambdaFunction]
    [DurableExecution(LambdaClientFactory = nameof(_lambdaClient))]
    public async Task<OrderResult> ProcessOrder(
        [FromBody] OrderRequest request,
        IDurableContext context)
    {
        // ...
    }
}
```

When no `LambdaClientFactory` is specified, the generated code creates a default `AmazonLambdaClient`. For the manual handler path (`DurableFunction.WrapAsync`), pass the client directly via the `IAmazonLambda lambdaClient` overload.

> **Dependency boundaries:** `Amazon.Lambda.Annotations` has **no dependency** on the AWS SDK or on `Amazon.Lambda.DurableExecution`. The Annotations source generator references durable execution types by fully-qualified name strings only — it never takes a compile-time dependency on the durable package. The `[DurableExecution]` attribute is defined in `Amazon.Lambda.DurableExecution`, and the generated code resolves against the user's project references. There is only one source generator (Annotations) — no coordination between multiple generators is needed.

### AWSSDK.Lambda

The `Amazon.Lambda.DurableExecution` package depends on the AWS SDK for .NET Lambda client to make checkpoint API calls. 
This dependency is confined to the durable execution package — `Amazon.Lambda.Annotations` does not depend on the AWS SDK.
+
+Only two service operations are used:
+
+- `CheckpointDurableExecutionAsync`
+- `GetDurableExecutionStateAsync`
+
+---
+
+## Testing (customer-facing package)
+
+> **Implementations:** [JavaScript (local runner)](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-testing/src/test-runner/local/local-durable-test-runner.ts) | [JavaScript (cloud runner)](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-testing/src/test-runner/cloud/cloud-durable-test-runner.ts)
+
+We ship a separate NuGet package (`Amazon.Lambda.DurableExecution.Testing`) that lets developers test their durable functions locally without deploying to AWS.
+
+**Why this needs to exist:** A durable function requires multiple Lambda invocations to complete (invoke → PENDING → wait → re-invoke → SUCCEEDED). You can't test that with a normal unit test because there's no Lambda service orchestrating the re-invocations. The test runner simulates this loop in-process: it calls your handler, gets PENDING, marks waits as elapsed, calls your handler again with the prior checkpoint state, and repeats until the workflow completes.
+
+```csharp
+var runner = new DurableTestRunner(
+    handler: new Function().Handler,
+    options: new TestRunnerOptions
+    {
+        SkipTime = true,      // Waits complete instantly (no real delays)
+        MaxInvocations = 10   // Safety limit to prevent infinite loops
+    });
+
+var result = await runner.RunAsync(
+    input: new OrderEvent { OrderId = "order-123" },
+    timeout: TimeSpan.FromSeconds(30));
+
+Assert.Equal(InvocationStatus.Succeeded, result.Status);
+Assert.Equal("approved", result.Result.Status);
+
+// Inspect individual steps
+var validateStep = result.GetStep("validate_order");
+Assert.True(validateStep.GetResult().IsValid);
+```
+
+The Python and JS SDKs both ship equivalent test runner packages.
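+
+The in-process loop described above can be sketched as follows. This is an illustrative sketch, not the actual implementation: `CheckpointStore`, `BuildInvocationInput`, `BuildResult`, `MarkAllWaitsElapsed`, and `NextWakeup` are assumed names; only `TestRunnerOptions`, `InvocationStatus`, and the handler delegate follow the shapes shown in the example above.
+
+```csharp
+// Hedged sketch of DurableTestRunner's core re-invocation loop (assumed internals).
+private async Task<TestResult> RunLoopAsync(object input, TimeSpan timeout)
+{
+    var checkpoints = new CheckpointStore();                 // in-memory state (assumption)
+    using var cts = new CancellationTokenSource(timeout);
+
+    for (var invocation = 0; invocation < _options.MaxInvocations; invocation++)
+    {
+        cts.Token.ThrowIfCancellationRequested();
+
+        // Re-invoke the handler with all state accumulated so far.
+        var output = await _handler(BuildInvocationInput(input, checkpoints));
+
+        if (output.Status != InvocationStatus.Pending)
+            return BuildResult(output, checkpoints);         // SUCCEEDED or FAILED
+
+        if (_options.SkipTime)
+            checkpoints.MarkAllWaitsElapsed();               // timers complete instantly
+        else
+            await Task.Delay(checkpoints.NextWakeup(), cts.Token);
+    }
+
+    throw new InvalidOperationException("MaxInvocations exceeded; likely an unbounded workflow loop.");
+}
+```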
+ +### Cloud Test Runner + +For integration testing against deployed functions, the testing package also ships a `CloudDurableTestRunner` with the same API as the local runner. This lets developers run the exact same assertions against a real Lambda function: + +```csharp +var runner = new CloudDurableTestRunner( + functionArn: "arn:aws:lambda:us-east-1:123456789012:function:process-order:$LATEST"); + +var result = await runner.RunAsync( + input: new OrderEvent { OrderId = "order-123" }, + timeout: TimeSpan.FromSeconds(60)); + +Assert.Equal(InvocationStatus.Succeeded, result.Status); +var validateStep = result.GetStep("validate_order"); +Assert.True(validateStep.GetResult().IsValid); +``` + +The cloud runner invokes the deployed function and polls `GetDurableExecutionState` until the execution reaches a terminal state, then reconstructs the same `TestResult` structure as the local runner. + +### Function Registration for Invoke Testing + +To test workflows that use `InvokeAsync` without deploying, register sibling functions with the local test runner: + +```csharp +var paymentHandler = new PaymentFunction().Handler; + +var runner = new DurableTestRunner( + handler: new OrderFunction().Handler, + options: new TestRunnerOptions { SkipTime = true }); + +runner.RegisterFunction("process-payment", paymentHandler); +runner.RegisterFunction( + "arn:aws:lambda:us-east-1:123:function:process-payment:$LATEST", + paymentHandler); + +var result = await runner.RunAsync(input: new OrderEvent { OrderId = "123" }); +``` + +When the workflow calls `context.InvokeAsync("process-payment", payload)`, the test runner routes to the registered handler instead of making an AWS API call. + +--- + +## Local development (Test Tool v2 and Aspire) + +The Lambda Test Tool v2 and the Aspire Lambda integration currently emulate single-invocation Lambda functions. Durable functions require a multi-invocation loop that neither tool supports today. 
To add support, the local emulator needs three things: + +### Checkpoint API endpoints + +The SDK calls these during execution. The emulator would serve them locally with in-memory storage: + +- `POST /checkpoint-durable-execution` -- store step results, wait records +- `GET /durable-execution-state` -- return accumulated state for replay + +### An orchestration loop + +When the function returns `PENDING`, the emulator needs to: +- Parse the checkpoint to determine what's pending (timer, callback, retry) +- Wait for that condition (or skip it in fast mode) +- Re-invoke the function with the accumulated `DurableExecutionInvocationInput` +- Repeat until `SUCCEEDED` or `FAILED` + +### Callback delivery + +An endpoint that external tools (or the developer via the UI) can call to deliver callback results: + +- `POST /send-durable-execution-callback-success` +- This triggers a re-invocation of the waiting execution + +### How this relates to the testing SDK + +The `DurableTestRunner` in the testing package implements the same orchestration loop programmatically. The test tool / Aspire enhancement would reuse this engine and wrap it in a web UI or Aspire dashboard, giving developers a visual way to see execution state, deliver callbacks manually, skip timers, and inspect checkpoint history. + +### Priority + +This is post-v1 work. For the initial release, developers test durable functions using the programmatic `DurableTestRunner` or by deploying to AWS. Test tool and Aspire support are a fast-follow once the core SDK is stable. + +--- + +## Requirements & Constraints + +- **Target framework:** `net8.0` only. .NET 6 is EOL and not supported. Durable functions are a new feature — adopters will be on the latest managed runtime. Targeting .NET 8 gives access to `required` properties, improved `System.Text.Json` source generation, and better NativeAOT support. 
+- **Lambda runtime:** Requires the managed .NET 8 runtime or a custom runtime (`provided.al2023`) for NativeAOT deployments. +- **Durable execution service:** The function must be configured with `DurableConfig` (handled automatically by the `[DurableExecution]` source generator). +- **Qualified function identifiers:** `InvokeAsync` requires a version number, alias, or `$LATEST` — unqualified ARNs are not supported for durable invocations. +- **Serializable results:** All step return types must be JSON-serializable (or use a custom `ICheckpointSerializer`). + +--- + +## Package Structure + +### Amazon.Lambda.DurableExecution (Runtime) + +The core SDK that runs in Lambda. Minimal dependencies. + +**Dependencies:** +- `Amazon.Lambda.Core` (existing) +- `AWSSDK.Lambda` (for checkpoint/state APIs) +- `Microsoft.Extensions.Logging.Abstractions` (for ILogger) + +### Amazon.Lambda.DurableExecution.Testing (Dev-only) + +Test runner and helpers for local/cloud testing. + +**Dependencies:** +- `Amazon.Lambda.DurableExecution` +- `Amazon.Lambda.TestUtilities` (existing) + +### Blueprints (`dotnet new` Templates) + +New `dotnet new` templates ship as part of the existing `Amazon.Lambda.Templates` NuGet package (same as all other Lambda blueprints in this repo under `Blueprints/BlueprintDefinitions/`). + +**Templates to ship:** + +| Template short name | Description | +|---------------------|-------------| +| `lambda.DurableFunction` | Minimal durable function with a single step and wait. Includes test project with `DurableTestRunner`. | +| `lambda.DurableFunction.Agentic` | GenAI agentic loop pattern (invoke model → check tool call → execute tool → repeat). | +| `lambda.DurableFunction.HumanInTheLoop` | Callback-based human approval workflow. 
| + +Each template includes: +- `.csproj` with correct NuGet references (`Amazon.Lambda.DurableExecution`, `Amazon.Lambda.Annotations`) +- Handler class with `[LambdaFunction]` + `[DurableExecution]` attributes +- `serverless.template` (auto-generated by source generator on build) +- Test project with `DurableTestRunner` and a passing test +- `aws-lambda-tools-defaults.json` for deployment via `dotnet lambda deploy-function` + +Running `dotnet new lambda.DurableFunction` should produce a buildable, testable, deployable project in under 30 seconds. + +--- + +## Implementation plan + +| Workstream | Scope | Estimate | +|------------|-------|----------| +| **Durable execution runtime** | Core SDK: replay engine, all context operations (step, wait, callback, invoke, parallel, map), checkpoint batching, retry, logging | ~5-6 weeks | +| **Annotations / source generator** | `[DurableExecution]` attribute, handler wrapper codegen, CloudFormation DurableConfig + IAM policy generation | ~2 weeks | +| **Testing SDK** | Local test runner (in-memory, time-skipping), cloud test runner, step inspection API | ~1.5 weeks | +| **Blueprints, docs, examples** | `dotnet new` project templates, developer guide, API reference, sample projects | ~2 weeks | +| **Roslyn analyzers** (P1 follow-up) | Static analysis detecting non-determinism, nesting violations, closure mutations | ~2 weeks | + +**Total: ~10-11 weeks (1 engineer familiar with the Python/JS SDKs)** + Roslyn analyzers as follow-up + +### Roslyn Analyzers (P1 Follow-up) + +> **Reference implementation:** JavaScript ESLint plugin — [no-non-deterministic-outside-step](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-eslint-plugin/src/rules/no-non-deterministic-outside-step/no-non-deterministic-outside-step.ts) | 
[no-nested-durable-operations](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-eslint-plugin/src/rules/no-nested-durable-operations/no-nested-durable-operations.ts) | [no-closure-in-durable-operations](https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-eslint-plugin/src/rules/no-closure-in-durable-operations/no-closure-in-durable-operations.ts) + +Ship as a separate NuGet package: `Amazon.Lambda.DurableExecution.Analyzers` + +The JavaScript SDK ships an ESLint plugin (`@aws/durable-execution-sdk-js-eslint-plugin`) with three rules that catch the most common durable execution mistakes at author time. The .NET equivalent uses Roslyn diagnostic analyzers: + +| Diagnostic ID | Severity | Rule | Rationale | +|---------------|----------|------|-----------| +| DE001 | Warning | `DateTime.Now`, `DateTime.UtcNow`, `Guid.NewGuid()`, `Random.Next()`, `Random.Shared`, `Environment.TickCount` used outside a `StepAsync` body | Non-deterministic values produce different results on replay, breaking checkpoint consistency | +| DE002 | Error | Calling `context.StepAsync`, `WaitAsync`, `ParallelAsync`, `MapAsync`, `InvokeAsync`, `RunInChildContextAsync`, `CreateCallbackAsync`, or `WaitForCallbackAsync` inside a `StepAsync` lambda | Steps are leaf operations — nesting durable operations inside a step produces unpredictable behavior | +| DE003 | Warning | Mutable variable captured by a `StepAsync` lambda and written to inside the lambda body | On replay the step returns cached result without executing, so the write never happens — the outer variable has stale state | +| DE004 | Info | `Task.WhenAll` or `Task.WhenAny` called with tasks returned by durable context methods | Suggest using `ParallelAsync` for completion policies, nesting control, and observability | + +These analyzers run at compile time in the IDE (IntelliSense squiggles) and during `dotnet build`, preventing the most 
confusing class of runtime failures. + +--- + +## Cross-SDK API comparison + +All four SDKs expose the same core operations. The differences are naming conventions, parameter ordering, and concurrency model. + +| Operation | .NET | Python | JavaScript | Java | +|-----------|------|--------|------------|------| +| Step | `context.StepAsync(func, name?, config?)` | `context.step(func, name?, config?)` | `context.step(name?, fn, config?)` → `DurablePromise` | `context.step(name, type, func, config?)` (blocking) / `context.stepAsync(...)` → `DurableFuture` | +| Wait | `context.WaitAsync(duration, name?)` | `context.wait(duration, name?)` | `context.wait(name?, duration)` → `DurablePromise` | +| Create callback | `context.CreateCallbackAsync(name?, config?)` | `context.create_callback(name?, config?)` | `context.createCallback(name?, config?)` | +| Wait for callback | `context.WaitForCallbackAsync(submitter, name?, config?)` | `context.wait_for_callback(submitter, name?, config?)` | `context.waitForCallback(name?, submitter, config?)` | +| Invoke | `context.InvokeAsync(funcName, payload, name?, config?)` | `context.invoke(func_name, payload, name?, config?)` | `context.invoke(name?, funcId, input, config?)` → `DurablePromise` | +| Parallel | `context.ParallelAsync(functions, name?, config?)` | `context.parallel(functions, name?, config?)` | `context.parallel(name?, branches, config?)` | +| Map | `context.MapAsync(items, func, name?, config?)` | `context.map(inputs, func, name?, config?)` | `context.map(name?, items, mapFunc, config?)` | +| Child context | `context.RunInChildContextAsync(func, name?, config?)` | `context.run_in_child_context(func, name?, config?)` | `context.runInChildContext(name?, fn, config?)` | +| Wait for condition | `context.WaitForConditionAsync(check, config, name?)` | `context.wait_for_condition(check, config, name?)` | `context.waitForCondition(name?, checkFunc, config?)` | +| Logger | `context.Logger` (ILogger) | `context.logger` (Logger) | 
`context.logger` (DurableContextLogger) |
+| Lambda context | `context.LambdaContext` | `context.lambda_context` | `context.lambdaContext` |
+| Execution context | `context.ExecutionContext` | `context.execution_context` | *(via logger metadata)* |
+| Promise combinators | `CompletionConfig` on `ParallelAsync` | `CompletionConfig` on `parallel`/`map` | `context.promise.all/allSettled/any/race` |
+| Configure logger | `context.ConfigureLogger(config)` | `context.set_logger(logger)` | `context.configureLogger(config)` |
+| Cancellation | `CancellationToken` on all methods | *(N/A)* | *(N/A)* |
+| Jitter strategy | `JitterStrategy` enum on `Exponential()` | `jitter_strategy` on `RetryStrategyConfig` | `jitter` on `createRetryStrategy()` |
+| Retry presets | `RetryStrategy.None/Default/Transient` | `RetryPresets.none()/default()/transient()` | `retryPresets.default/linear/noRetry` |
+| Nesting type | `NestingType` on `ParallelConfig`/`MapConfig` | `NestingType` on parallel/map config | `NestingType` on parallel/map config |
+| Item batching | `ItemBatcher` on `MapConfig` | `ItemBatcher` on `MapConfig` | *(checkpoint manager handles batching)* |
+| Item namer | `ItemNamer` on `MapConfig` | Item naming function on `MapConfig` | `itemNamer` on `MapConfig` |
+| Error mapping | `ErrorMapping` on `ChildContextConfig` | *(typed exception wrapping)* | `errorMapping` on child context config |
+| Message-based retry filter | `retryableMessagePatterns` (regex) | `retryable_errors` (regex) | `retryableErrors` (RegExp[]) |
+| Step context / scoped logger | `IStepContext` with `Logger`, `AttemptNumber` | `StepContext` with `logger` | `ctx` with `logger` in step callback |
+| Named parallel branches | `DurableBranch(name, func)` | Function `__name__` | `{ name, func }` objects |
+| Inline retry lambda | `Func<Exception, int, RetryDecision>` | `Callable[[Exception, int], RetryDecision]` | `(error, attempt) => RetryDecision` |
+| Static analysis | Roslyn analyzers (P1 follow-up) | *(N/A)* | ESLint plugin (3 rules) |
+| Cloud test runner | `CloudDurableTestRunner` | `pytest --runner-mode=cloud` | `CloudDurableTestRunner` |
+
+**Key differences:**
+
+- **Concurrency model:** JS returns `DurablePromise` (lazy, deferred until awaited). Python is synchronous (blocks the thread). Java exposes both `step` (blocking) and `stepAsync` (returns `DurableFuture`). .NET returns `Task` (standard async/await). Note: `Task.WhenAll` works with durable operations, but `ParallelAsync`/`MapAsync` are preferred for completion policies and observability.
+- **Why .NET ships only the async form:** Java's two-API split exists because Java has no language-level `await` — `step` is the simple blocking form, `stepAsync` the composable one. In .NET, `Task` is *already* both: `await context.StepAsync(...)` reads as sequential code, and `Task.WhenAll(...)` composes concurrently. A `Step` (blocking, returns `T`) overload would do nothing except call `.GetAwaiter().GetResult()` on the async version, which is also a Lambda-thread anti-pattern (deadlock-prone, blocks a thread the runtime needs). So .NET intentionally has one shape — `*Async` — matching the rest of `IAmazonLambda` and the broader .NET async convention. Python is single-shape for the same reason in reverse: no async runtime in scope, so blocking is the only ergonomic shape.
+- **Step function signature:** Python and JS only expose the context-taking form — the user always receives a step context. Java has both `Function` and `Supplier` overloads, but the `Supplier` ones are deprecated (*"use the variants accepting StepContext instead"*). .NET follows Python/JS: `IStepContext` is always passed.
+- **Name parameter position:** JS puts `name` first; Python, Java, and .NET put it after the function/duration.
+- **Parallel semantics in JS:** JS uses `context.promise.all/any/race/allSettled` to combine DurablePromises. .NET, Python, and Java use `CompletionConfig` on the `Parallel`/`Map` operations instead.
+- **.NET-only:** `CancellationToken` on every method (standard .NET pattern).
+- **Jitter default:** All four SDKs default to full jitter on retry strategies.
+
+---
+
+## Common Patterns
+
+### GenAI Agentic Loop
+
+```csharp
+[DurableExecution]
+public async Task<string> AgentHandler(AgentRequest input, IDurableContext context)
+{
+    var messages = new List<Message>
+    {
+        new Message { Role = "user", Content = input.Prompt }
+    };
+
+    while (true)
+    {
+        var response = await context.StepAsync(
+            async (step) => await InvokeModel(messages),
+            name: "invoke_model");
+
+        if (response.ToolCall == null)
+            return response.Content;
+
+        var toolResult = await context.StepAsync(
+            async (step) => await ExecuteTool(response.ToolCall),
+            name: $"tool_{response.ToolCall.Name}");
+
+        messages.Add(new Message { Role = "assistant", Content = toolResult });
+    }
+}
+```
+
+### Human-in-the-Loop
+
+```csharp
+[DurableExecution]
+public async Task<ReviewResult> ReviewHandler(ReviewRequest input, IDurableContext context)
+{
+    var analysis = await context.StepAsync(
+        async (step) => await AnalyzeDocument(input.DocumentUrl),
+        name: "analyze_document");
+
+    context.Logger.LogInformation("Analysis complete, requesting human review");
+
+    var review = await context.WaitForCallbackAsync(
+        async (callbackId, ctx) =>
+        {
+            await NotifyReviewer(input.ReviewerEmail, callbackId, analysis);
+        },
+        name: "human_review",
+        config: new WaitForCallbackConfig
+        {
+            Timeout = TimeSpan.FromDays(7),
+            HeartbeatTimeout = TimeSpan.FromHours(24)
+        });
+
+    if (review.Approved)
+    {
+        await context.StepAsync(
+            async (step) => await PublishDocument(input.DocumentUrl),
+            name: "publish");
+    }
+
+    return new ReviewResult { Status = review.Approved ?
"published" : "rejected" };
+}
+```
+
+### Scheduled Pipeline with Retries
+
+```csharp
+[DurableExecution]
+public async Task<PipelineResult> DataPipeline(PipelineInput input, IDurableContext context)
+{
+    // Extract
+    var rawData = await context.StepAsync(
+        async (step) => await ExtractFromSource(input.SourceId),
+        name: "extract",
+        config: new StepConfig
+        {
+            RetryStrategy = RetryStrategy.Exponential(maxAttempts: 5, initialDelay: TimeSpan.FromSeconds(2))
+        });
+
+    // Transform (fan-out)
+    var transformed = await context.MapAsync(
+        items: rawData.Chunks,
+        func: async (ctx, chunk, index, _) =>
+        {
+            return await ctx.StepAsync(
+                async (step) => await TransformChunk(chunk),
+                name: $"transform_{index}");
+        },
+        name: "transform_all",
+        config: new MapConfig { MaxConcurrency = 10 });
+
+    transformed.ThrowIfError();
+
+    // Load
+    var loadResult = await context.StepAsync(
+        async (step) => await LoadToDestination(transformed.GetResults()),
+        name: "load",
+        config: new StepConfig
+        {
+            Semantics = StepSemantics.AtMostOncePerRetry
+        });
+
+    // Wait before next run
+    await context.WaitAsync(TimeSpan.FromHours(1), name: "schedule_delay");
+
+    return new PipelineResult { RecordsProcessed = loadResult.Count };
+}
+```
+
+---
+
+## References
+
+- [AWS Blog: Build multi-step applications and AI workflows with AWS Lambda durable functions](https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/)
+- [AWS Documentation: Lambda Durable Functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)
+- [Python SDK Repository](https://github.com/aws/aws-durable-execution-sdk-python)
+- [JavaScript/TypeScript SDK Repository](https://github.com/aws/aws-durable-execution-sdk-js)
+- [GitHub Issue #2216: .NET Durable Functions Support](https://github.com/aws/aws-lambda-dotnet/issues/2216)
+- [Existing .NET Annotations Design Doc](lambda-annotations-design.md)
diff --git
a/Libraries/src/Amazon.Lambda.DurableExecution/Amazon.Lambda.DurableExecution.csproj b/Libraries/src/Amazon.Lambda.DurableExecution/Amazon.Lambda.DurableExecution.csproj new file mode 100644 index 000000000..de02d8ce2 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Amazon.Lambda.DurableExecution.csproj @@ -0,0 +1,36 @@ + + + + + + $(DefaultPackageTargets) + Amazon Lambda .NET SDK for Durable Execution - write multi-step workflows that persist state automatically. + Amazon.Lambda.DurableExecution + 0.1.0 + Amazon.Lambda.DurableExecution + Amazon.Lambda.DurableExecution + AWS;Amazon;Lambda;Durable;Workflow + true + true + enable + enable + true + IL2026,IL2067,IL2075,IL3050 + + + + + <_Parameter1>Amazon.Lambda.DurableExecution.Tests, PublicKey="0024000004800000940000000602000000240000525341310004000001000100db5f59f098d27276c7833875a6263a3cc74ab17ba9a9df0b52aedbe7252745db7274d5271fd79c1f08f668ecfa8eaab5626fa76adc811d3c8fc55859b0d09d3bc0a84eecd0ba891f2b8a2fc55141cdcc37c2053d53491e650a479967c3622762977900eddbf1252ed08a2413f00a28f3a0752a81203f03ccb7f684db373518b4" + + + + + + + + + + + + + diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Config/IRetryStrategy.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Config/IRetryStrategy.cs new file mode 100644 index 000000000..f291bed1e --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Config/IRetryStrategy.cs @@ -0,0 +1,39 @@ +namespace Amazon.Lambda.DurableExecution; + +/// +/// Determines whether a failed step should be retried and with what delay. +/// +public interface IRetryStrategy +{ + /// + /// Evaluates whether the given exception warrants a retry. + /// + /// The exception that caused the step to fail. + /// The 1-based attempt number that just failed. + /// A decision indicating whether to retry and the delay before the next attempt. + RetryDecision ShouldRetry(Exception exception, int attemptNumber); +} + +/// +/// The outcome of a retry evaluation. 
+/// +public readonly struct RetryDecision +{ + /// Whether the step should be retried. + public bool ShouldRetry { get; } + + /// The delay before the next retry attempt. + public TimeSpan Delay { get; } + + private RetryDecision(bool shouldRetry, TimeSpan delay) + { + ShouldRetry = shouldRetry; + Delay = delay; + } + + /// Indicates the step should not be retried. + public static RetryDecision DoNotRetry() => new(false, TimeSpan.Zero); + + /// Indicates the step should be retried after the specified delay. + public static RetryDecision RetryAfter(TimeSpan delay) => new(true, delay); +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Config/RetryStrategy.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Config/RetryStrategy.cs new file mode 100644 index 000000000..b8688ca0c --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Config/RetryStrategy.cs @@ -0,0 +1,185 @@ +using System.Text.RegularExpressions; + +namespace Amazon.Lambda.DurableExecution; + +/// +/// Jitter strategy for exponential backoff to prevent thundering-herd scenarios. +/// +public enum JitterStrategy +{ + /// No randomization — delay is exactly the calculated backoff value. + None, + /// Random delay between 0 and the calculated backoff value (recommended). + Full, + /// Random delay between 50% and 100% of the calculated backoff value. + Half +} + +/// +/// Controls whether a step re-executes if the Lambda is re-invoked mid-attempt. +/// +public enum StepSemantics +{ + /// + /// Default. The step may re-execute if the Lambda is re-invoked during execution. + /// Use for idempotent operations. + /// + AtLeastOncePerRetry, + + /// + /// The step executes at most once per retry attempt. A START checkpoint is written + /// before execution; on replay with an existing START, the SDK skips re-execution + /// and proceeds to the retry handler. + /// + AtMostOncePerRetry +} + +/// +/// Factory methods for common retry strategies. 
+/// +public static class RetryStrategy +{ + /// 6 attempts, 2x backoff, 5s initial delay, 60s max, Full jitter. + public static IRetryStrategy Default { get; } = Exponential( + maxAttempts: 6, + initialDelay: TimeSpan.FromSeconds(5), + maxDelay: TimeSpan.FromSeconds(60), + backoffRate: 2.0, + jitter: JitterStrategy.Full); + + /// 3 attempts, 2x backoff, 1s initial delay, 5s max, Half jitter. + public static IRetryStrategy Transient { get; } = Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromSeconds(1), + maxDelay: TimeSpan.FromSeconds(5), + backoffRate: 2.0, + jitter: JitterStrategy.Half); + + /// No retry — 1 attempt only. + public static IRetryStrategy None { get; } = Exponential(maxAttempts: 1); + + /// + /// Creates an exponential backoff retry strategy. + /// + public static IRetryStrategy Exponential( + int maxAttempts = 3, + TimeSpan? initialDelay = null, + TimeSpan? maxDelay = null, + double backoffRate = 2.0, + JitterStrategy jitter = JitterStrategy.Full, + Type[]? retryableExceptions = null, + string[]? retryableMessagePatterns = null) + { + return new ExponentialRetryStrategy( + maxAttempts, + initialDelay ?? TimeSpan.FromSeconds(5), + maxDelay ?? TimeSpan.FromSeconds(300), + backoffRate, + jitter, + retryableExceptions, + retryableMessagePatterns); + } + + /// + /// Creates a retry strategy from a delegate. + /// + public static IRetryStrategy FromDelegate(Func strategy) + => new DelegateRetryStrategy(strategy); +} + +internal sealed class ExponentialRetryStrategy : IRetryStrategy +{ + private readonly int _maxAttempts; + private readonly TimeSpan _initialDelay; + private readonly TimeSpan _maxDelay; + private readonly double _backoffRate; + private readonly JitterStrategy _jitter; + private readonly Type[]? _retryableExceptions; + private readonly Regex[]? _retryableMessagePatterns; + + [ThreadStatic] + private static Random? 
t_random; + private static Random Random => t_random ??= new Random(); + + public ExponentialRetryStrategy( + int maxAttempts, + TimeSpan initialDelay, + TimeSpan maxDelay, + double backoffRate, + JitterStrategy jitter, + Type[]? retryableExceptions, + string[]? retryableMessagePatterns) + { + _maxAttempts = maxAttempts; + _initialDelay = initialDelay; + _maxDelay = maxDelay; + _backoffRate = backoffRate; + _jitter = jitter; + _retryableExceptions = retryableExceptions; + _retryableMessagePatterns = retryableMessagePatterns? + .Select(p => new Regex(p, RegexOptions.Compiled)) + .ToArray(); + } + + public RetryDecision ShouldRetry(Exception exception, int attemptNumber) + { + if (attemptNumber >= _maxAttempts) + return RetryDecision.DoNotRetry(); + + if (!IsRetryable(exception)) + return RetryDecision.DoNotRetry(); + + var delay = CalculateDelay(attemptNumber); + return RetryDecision.RetryAfter(delay); + } + + private bool IsRetryable(Exception exception) + { + if (_retryableExceptions == null && _retryableMessagePatterns == null) + return true; + + if (_retryableExceptions != null) + { + var exType = exception.GetType(); + if (_retryableExceptions.Any(t => t.IsAssignableFrom(exType))) + return true; + } + + if (_retryableMessagePatterns != null) + { + var message = exception.Message; + if (_retryableMessagePatterns.Any(p => p.IsMatch(message))) + return true; + } + + return false; + } + + internal TimeSpan CalculateDelay(int attemptNumber) + { + var baseDelay = _initialDelay.TotalSeconds * Math.Pow(_backoffRate, attemptNumber - 1); + var cappedDelay = Math.Min(baseDelay, _maxDelay.TotalSeconds); + + var finalDelay = _jitter switch + { + JitterStrategy.Full => Random.NextDouble() * cappedDelay, + JitterStrategy.Half => cappedDelay * (0.5 + 0.5 * Random.NextDouble()), + _ => cappedDelay + }; + + return TimeSpan.FromSeconds(Math.Max(1, Math.Ceiling(finalDelay))); + } +} + +internal sealed class DelegateRetryStrategy : IRetryStrategy +{ + private readonly Func 
_strategy;
+
+    public DelegateRetryStrategy(Func<Exception, int, RetryDecision> strategy)
+    {
+        _strategy = strategy;
+    }
+
+    public RetryDecision ShouldRetry(Exception exception, int attemptNumber)
+        => _strategy(exception, attemptNumber);
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Config/StepConfig.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Config/StepConfig.cs
new file mode 100644
index 000000000..362867c09
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Config/StepConfig.cs
@@ -0,0 +1,18 @@
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// Configuration for step execution.
+/// </summary>
+public sealed class StepConfig
+{
+    /// <summary>
+    /// Retry strategy for failed steps. When null (default), failures are not retried.
+    /// </summary>
+    public IRetryStrategy? RetryStrategy { get; set; }
+
+    /// <summary>
+    /// Controls whether a step may re-execute if the Lambda is re-invoked mid-attempt.
+    /// Default is <see cref="StepSemantics.AtLeastOncePerRetry"/>.
+    /// </summary>
+    public StepSemantics Semantics { get; set; } = StepSemantics.AtLeastOncePerRetry;
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs b/Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs
new file mode 100644
index 000000000..87a874c2d
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs
@@ -0,0 +1,147 @@
+using System.Diagnostics.CodeAnalysis;
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution.Internal;
+using Microsoft.Extensions.Logging;
+using Microsoft.Extensions.Logging.Abstractions;
+
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// Implementation of <see cref="IDurableContext"/>. Constructs and dispatches
+/// per-operation classes (<see cref="StepOperation"/>, <see cref="WaitOperation"/>);
+/// the replay logic lives in those classes.
+/// </summary>
+internal sealed class DurableContext : IDurableContext
+{
+    private readonly ExecutionState _state;
+    private readonly TerminationManager _terminationManager;
+    private readonly OperationIdGenerator _idGenerator;
+    private readonly string _durableExecutionArn;
+    private readonly CheckpointBatcher?
_batcher;
+
+    public DurableContext(
+        ExecutionState state,
+        TerminationManager terminationManager,
+        OperationIdGenerator idGenerator,
+        string durableExecutionArn,
+        ILambdaContext lambdaContext,
+        CheckpointBatcher? batcher = null)
+    {
+        _state = state;
+        _terminationManager = terminationManager;
+        _idGenerator = idGenerator;
+        _durableExecutionArn = durableExecutionArn;
+        _batcher = batcher;
+        LambdaContext = lambdaContext;
+    }
+
+    // Replay-safe logger ships in a follow-up PR; see IDurableContext.Logger doc.
+    public ILogger Logger => NullLogger.Instance;
+    public IExecutionContext ExecutionContext => new DurableExecutionContext(_durableExecutionArn);
+    public ILambdaContext LambdaContext { get; }
+
+    [RequiresUnreferencedCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
+    [RequiresDynamicCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
+    public Task<T> StepAsync<T>(
+        Func<IStepContext, Task<T>> func,
+        string? name = null,
+        StepConfig? config = null,
+        CancellationToken cancellationToken = default)
+        => RunStep(func, new ReflectionJsonCheckpointSerializer<T>(), name, config, cancellationToken);
+
+    public async Task StepAsync(
+        Func<IStepContext, Task> func,
+        string? name = null,
+        StepConfig? config = null,
+        CancellationToken cancellationToken = default)
+    {
+        // Void steps don't carry a meaningful payload; we wrap with a null-only
+        // serializer that doesn't touch reflection.
+        await RunStep<object?>(
+            async (ctx) => { await func(ctx); return null; },
+            NullCheckpointSerializer.Instance,
+            name, config, cancellationToken);
+    }
+
+    public Task<T> StepAsync<T>(
+        Func<IStepContext, Task<T>> func,
+        ICheckpointSerializer<T> serializer,
+        string? name = null,
+        StepConfig? config = null,
+        CancellationToken cancellationToken = default)
+        => RunStep(func, serializer, name, config, cancellationToken);
+
+
+    private Task<T> RunStep<T>(
+        Func<IStepContext, Task<T>> func,
+        ICheckpointSerializer<T> serializer,
+        string? name,
+        StepConfig?
config,
+        CancellationToken cancellationToken)
+    {
+        var operationId = _idGenerator.NextId();
+        var op = new StepOperation<T>(
+            operationId, name, func, config, serializer, Logger,
+            _state, _terminationManager, _durableExecutionArn, _batcher);
+        return op.ExecuteAsync(cancellationToken);
+    }
+
+    public Task WaitAsync(
+        TimeSpan duration,
+        string? name = null,
+        CancellationToken cancellationToken = default)
+    {
+        // Service timer granularity is 1 second; sub-second waits would round to 0.
+        // WaitOptions.WaitSeconds is integer in [1, 31_622_400] (1 second to ~1 year).
+        if (duration < TimeSpan.FromSeconds(1))
+            throw new ArgumentOutOfRangeException(nameof(duration), duration, "Wait duration must be at least 1 second.");
+
+        if (duration > TimeSpan.FromSeconds(31_622_400))
+            throw new ArgumentOutOfRangeException(nameof(duration), duration, "Wait duration must be at most 31,622,400 seconds (~1 year).");
+
+        cancellationToken.ThrowIfCancellationRequested();
+
+        var operationId = _idGenerator.NextId();
+        var waitSeconds = (int)Math.Max(1, Math.Ceiling(duration.TotalSeconds));
+        var op = new WaitOperation(
+            operationId, name, waitSeconds,
+            _state, _terminationManager, _durableExecutionArn, _batcher);
+        return op.ExecuteAsync(cancellationToken);
+    }
+}
+
+/// <summary>
+/// Trim-safe serializer used by the void StepAsync overloads, which never
+/// carry a meaningful payload. Always serializes to "null" and discards
+/// on deserialize.
+/// </summary>
+internal sealed class NullCheckpointSerializer : ICheckpointSerializer<object?>
+{
+    public static NullCheckpointSerializer Instance { get; } = new();
+    public string Serialize(object? value, SerializationContext context) => "null";
+    public object?
Deserialize(string data, SerializationContext context) => null;
+}
+
+internal sealed class DurableExecutionContext : IExecutionContext
+{
+    public DurableExecutionContext(string durableExecutionArn)
+    {
+        DurableExecutionArn = durableExecutionArn;
+    }
+
+    public string DurableExecutionArn { get; }
+}
+
+internal sealed class StepContext : IStepContext
+{
+    public StepContext(string operationId, int attemptNumber, ILogger logger)
+    {
+        OperationId = operationId;
+        AttemptNumber = attemptNumber;
+        Logger = logger;
+    }
+
+    public ILogger Logger { get; }
+    public int AttemptNumber { get; }
+    public string OperationId { get; }
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/DurableExecutionHandler.cs b/Libraries/src/Amazon.Lambda.DurableExecution/DurableExecutionHandler.cs
new file mode 100644
index 000000000..300cc8654
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/DurableExecutionHandler.cs
@@ -0,0 +1,119 @@
+using Amazon.Lambda.DurableExecution.Internal;
+
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// The result of running a durable execution handler.
+/// </summary>
+internal sealed class HandlerResult<TResult>
+{
+    public required InvocationStatus Status { get; init; }
+    public TResult? Result { get; init; }
+    public string? Message { get; init; }
+    public Exception? Exception { get; init; }
+}
+
+/// <summary>
+/// Core orchestration engine for durable execution. Races user code against
+/// a termination signal using Task.WhenAny. When user code completes, returns
+/// SUCCEEDED/FAILED. When termination wins (wait, callback, invoke), returns PENDING.
+/// </summary>
+internal static class DurableExecutionHandler
+{
+    /// <summary>
+    /// Runs the user's workflow function within the durable execution engine.
+    /// </summary>
+    /// <remarks>
+    /// <para>
+    /// Suspension flow — example: await ctx.WaitAsync(TimeSpan.FromSeconds(5)):
+    /// </para>
+    /// <code>
+    /// user code          DurableContext         TerminationMgr       RunAsync
+    /// ─────────          ──────────────         ──────────────       ────────
+    /// WaitAsync(5s) ─────► queue WAIT START
+    ///                      checkpoint
+    ///                      Terminate() ──────►  TerminationTask
+    ///                                           completes
+    ///               ◄────── new TCS().Task
+    ///                       (never completes)
+    /// await blocks
+    /// forever                                                        WhenAny:
+    ///                                                                ── termination wins
+    ///                                                                ── userTask abandoned
+    ///                                                                ── return Pending
+    /// </code>
+    /// <para>
+    /// Key insight: WaitAsync never returns a completed Task — it hands back
+    /// a TaskCompletionSource that is never resolved. The user's await blocks
+    /// indefinitely. The escape signal is terminationManager.Terminate(),
+    /// which Task.WhenAny picks up. We return Pending; the dangling user
+    /// Task is GC'd. The service flushes checkpoints, fires the wait timer, then
+    /// re-invokes Lambda — on replay, WaitAsync sees the matching SUCCEED
+    /// checkpoint and returns Task.CompletedTask normally.
+    /// </para>
+    /// <para>
+    /// The same pattern applies to retries (RetryScheduled), callbacks
+    /// (CallbackPending), and chained invokes (InvokePending).
+    /// </para>
+    /// </remarks>
+    /// <typeparam name="TResult">The workflow return type.</typeparam>
+    /// <param name="executionState">Hydrated execution state from prior invocations.</param>
+    /// <param name="terminationManager">Manages the suspension signal.</param>
+    /// <param name="userHandler">The user's workflow function receiving a DurableContext.</param>
+    /// <returns>The handler result indicating SUCCEEDED, FAILED, or PENDING.</returns>
+    internal static async Task<HandlerResult<TResult>> RunAsync<TResult>(
+        ExecutionState executionState,
+        TerminationManager terminationManager,
+        Func<Task<TResult>> userHandler)
+    {
+        // Run user code on a threadpool thread so it executes independently of
+        // the termination signal. When TerminationManager fires (e.g., WaitAsync),
+        // we need the WhenAny race below to resolve immediately without waiting
+        // for the user task to reach an await point.
+        var userTask = Task.Run(userHandler);
+
+        // Race: user code completing vs. termination signal (wait/callback/retry).
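The suspension race described in the comments above can be sketched standalone. This is an illustrative sketch, not SDK code: the `termination` TaskCompletionSource simulates TerminationManager, and the never-resolved TCS stands in for WaitAsync's pending checkpoint.

```csharp
using System;
using System.Threading.Tasks;

public static class SuspensionRaceSketch
{
    public static async Task<string> RunAsync()
    {
        var termination = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        var never = new TaskCompletionSource<object?>();

        // "User code": signals suspension, then blocks forever on a task
        // that is never completed — it can only be abandoned, never finished.
        var userTask = Task.Run(async () =>
        {
            termination.TrySetResult("Pending");
            await never.Task;
            return "Succeeded";
        });

        // The race: termination always wins because userTask cannot complete.
        var winner = await Task.WhenAny(userTask, termination.Task);
        return winner == termination.Task ? termination.Task.Result : userTask.Result;
    }
}
```

`await SuspensionRaceSketch.RunAsync()` yields "Pending"; the abandoned `userTask` is simply dropped, mirroring how RunAsync returns a PENDING envelope and leaves the user task for the GC.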
+        // If termination wins, we return PENDING and the abandoned userTask is never awaited.
+        var winner = await Task.WhenAny(userTask, terminationManager.TerminationTask);
+
+        if (winner == terminationManager.TerminationTask)
+        {
+            var terminationResult = await terminationManager.TerminationTask;
+
+            if (terminationResult.Exception != null)
+            {
+                return new HandlerResult<TResult>
+                {
+                    Status = InvocationStatus.Failed,
+                    Message = terminationResult.Exception.Message,
+                    Exception = terminationResult.Exception
+                };
+            }
+
+            return new HandlerResult<TResult>
+            {
+                Status = InvocationStatus.Pending,
+                Message = terminationResult.Message
+            };
+        }
+
+        try
+        {
+            var result = await userTask;
+            return new HandlerResult<TResult>
+            {
+                Status = InvocationStatus.Succeeded,
+                Result = result
+            };
+        }
+        catch (Exception ex)
+        {
+            return new HandlerResult<TResult>
+            {
+                Status = InvocationStatus.Failed,
+                Message = ex.Message,
+                Exception = ex
+            };
+        }
+    }
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs b/Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs
new file mode 100644
index 000000000..d629a0b2e
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs
@@ -0,0 +1,338 @@
+using System.Diagnostics.CodeAnalysis;
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Text.Json.Serialization.Metadata;
+using System.Threading;
+using Amazon.Lambda;
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution.Internal;
+using Amazon.Lambda.DurableExecution.Services;
+using Amazon.Lambda.Model;
+using Amazon.Runtime;
+
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// Static helper that wraps a durable workflow function, handling all envelope
+/// translation between DurableExecutionInvocationInput/Output and user types.
+/// </summary>
+public static class DurableFunction
+{
+    private static readonly Lazy<IAmazonLambda> _cachedLambdaClient =
+        new(() => new AmazonLambdaClient(), LazyThreadSafetyMode.ExecutionAndPublication);
+
+    // ──────────────────────────────────────────────────────────────────────
+    // Reflection-based overloads (JIT only)
+    // ──────────────────────────────────────────────────────────────────────
+
+    /// <summary>
+    /// Wrap a workflow (typed input + output). Reflection-based JSON — not AOT-safe.
+    /// </summary>
+    [RequiresUnreferencedCode("Uses reflection-based JSON for TInput/TOutput. Use the JsonSerializerContext overload for AOT.")]
+    [RequiresDynamicCode("Uses reflection-based JSON for TInput/TOutput. Use the JsonSerializerContext overload for AOT.")]
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
+        Func<TInput, IDurableContext, Task<TOutput>> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext)
+    {
+        return WrapAsyncCore(workflow, invocationInput, lambdaContext, _cachedLambdaClient.Value, jsonContext: null);
+    }
+
+    /// <summary>
+    /// Wrap a workflow (typed input + output) with explicit Lambda client.
+    /// Reflection-based JSON — not AOT-safe.
+    /// </summary>
+    [RequiresUnreferencedCode("Uses reflection-based JSON for TInput/TOutput. Use the JsonSerializerContext overload for AOT.")]
+    [RequiresDynamicCode("Uses reflection-based JSON for TInput/TOutput. Use the JsonSerializerContext overload for AOT.")]
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
+        Func<TInput, IDurableContext, Task<TOutput>> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        IAmazonLambda lambdaClient)
+        => WrapAsyncCore(workflow, invocationInput, lambdaContext, lambdaClient, jsonContext: null);
+
+    /// <summary>
+    /// Wrap a void workflow (typed input, no output). Reflection-based JSON — not AOT-safe.
+    /// </summary>
+    [RequiresUnreferencedCode("Uses reflection-based JSON for TInput. Use the JsonSerializerContext overload for AOT.")]
+    [RequiresDynamicCode("Uses reflection-based JSON for TInput.
Use the JsonSerializerContext overload for AOT.")]
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
+        Func<TInput, IDurableContext, Task> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext)
+    {
+        return WrapAsync(workflow, invocationInput, lambdaContext, _cachedLambdaClient.Value);
+    }
+
+    /// <summary>
+    /// Wrap a void workflow with explicit Lambda client. Reflection-based JSON — not AOT-safe.
+    /// </summary>
+    [RequiresUnreferencedCode("Uses reflection-based JSON for TInput. Use the JsonSerializerContext overload for AOT.")]
+    [RequiresDynamicCode("Uses reflection-based JSON for TInput. Use the JsonSerializerContext overload for AOT.")]
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
+        Func<TInput, IDurableContext, Task> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        IAmazonLambda lambdaClient)
+        => WrapAsyncCore<TInput, object?>(
+            async (input, ctx) => { await workflow(input, ctx); return null; },
+            invocationInput, lambdaContext, lambdaClient, jsonContext: null);
+
+    // ──────────────────────────────────────────────────────────────────────
+    // AOT-safe overloads (caller supplies JsonSerializerContext)
+    // ──────────────────────────────────────────────────────────────────────
+
+    /// <summary>
+    /// Wrap a workflow (typed input + output). AOT-safe — requires
+    /// [JsonSerializable(typeof(TInput))] and [JsonSerializable(typeof(TOutput))]
+    /// on the supplied <see cref="JsonSerializerContext"/>.
+    /// </summary>
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
+        Func<TInput, IDurableContext, Task<TOutput>> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        JsonSerializerContext jsonContext)
+    {
+        return WrapAsyncCore(workflow, invocationInput, lambdaContext, _cachedLambdaClient.Value, jsonContext);
+    }
+
+    /// <summary>
+    /// Wrap a workflow (typed input + output) with explicit Lambda client. AOT-safe.
+    /// </summary>
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput, TOutput>(
+        Func<TInput, IDurableContext, Task<TOutput>> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        IAmazonLambda lambdaClient,
+        JsonSerializerContext jsonContext)
+        => WrapAsyncCore(workflow, invocationInput, lambdaContext, lambdaClient, jsonContext);
+
+    /// <summary>
+    /// Wrap a void workflow (typed input, no output). AOT-safe.
+    /// </summary>
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
+        Func<TInput, IDurableContext, Task> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        JsonSerializerContext jsonContext)
+    {
+        return WrapAsyncCore<TInput, object?>(
+            async (input, ctx) => { await workflow(input, ctx); return null; },
+            invocationInput, lambdaContext, _cachedLambdaClient.Value, jsonContext);
+    }
+
+    /// <summary>
+    /// Wrap a void workflow with explicit Lambda client. AOT-safe.
+    /// </summary>
+    public static Task<DurableExecutionInvocationOutput> WrapAsync<TInput>(
+        Func<TInput, IDurableContext, Task> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        IAmazonLambda lambdaClient,
+        JsonSerializerContext jsonContext)
+        => WrapAsyncCore<TInput, object?>(
+            async (input, ctx) => { await workflow(input, ctx); return null; },
+            invocationInput, lambdaContext, lambdaClient, jsonContext);
+
+    // ──────────────────────────────────────────────────────────────────────
+    // Core implementation
+    // ──────────────────────────────────────────────────────────────────────
+
+    [UnconditionalSuppressMessage("Trimming", "IL2026",
+        Justification = "When jsonContext is non-null, dispatch goes through JsonTypeInfo; when null, the caller has [RequiresUnreferencedCode].")]
+    [UnconditionalSuppressMessage("AOT", "IL3050",
+        Justification = "When jsonContext is non-null, dispatch goes through JsonTypeInfo; when null, the caller has [RequiresDynamicCode].")]
+    private static async Task<DurableExecutionInvocationOutput> WrapAsyncCore<TInput, TOutput>(
+        Func<TInput, IDurableContext, Task<TOutput>> workflow,
+        DurableExecutionInvocationInput invocationInput,
+        ILambdaContext lambdaContext,
+        IAmazonLambda lambdaClient,
+        JsonSerializerContext?
jsonContext)
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(invocationInput.InitialExecutionState);
+
+        var serviceClient = new LambdaDurableServiceClient(lambdaClient);
+        var checkpointToken = invocationInput.CheckpointToken;
+
+        var nextMarker = invocationInput.InitialExecutionState?.NextMarker;
+        while (!string.IsNullOrEmpty(nextMarker))
+        {
+            var (operations, marker) = await serviceClient.GetExecutionStateAsync(
+                invocationInput.DurableExecutionArn, checkpointToken, nextMarker);
+            state.AddOperations(operations);
+            nextMarker = marker;
+        }
+
+        var userPayload = ExtractUserPayload<TInput>(invocationInput, jsonContext);
+        var terminationManager = new TerminationManager();
+        var idGenerator = new OperationIdGenerator();
+
+        await using var batcher = new CheckpointBatcher(
+            checkpointToken,
+            (token, ops, ct) => serviceClient.CheckpointAsync(
+                invocationInput.DurableExecutionArn, token, ops, ct));
+
+        var context = new DurableContext(
+            state, terminationManager, idGenerator,
+            invocationInput.DurableExecutionArn, lambdaContext, batcher);
+
+        HandlerResult<TOutput> result;
+        try
+        {
+            result = await DurableExecutionHandler.RunAsync(
+                state, terminationManager,
+                async () => await workflow(userPayload, context));
+
+            await batcher.DrainAsync();
+        }
+        catch (AmazonServiceException ex) when (IsTerminalCheckpointError(ex))
+        {
+            return new DurableExecutionInvocationOutput
+            {
+                Status = InvocationStatus.Failed,
+                Error = ErrorObject.FromException(ex)
+            };
+        }
+
+        return MapToOutput(result, jsonContext);
+    }
+
+    /// <summary>
+    /// Returns true for checkpoint-flush SDK errors that should fail the workflow
+    /// (Failed envelope) instead of escaping to the host (Lambda retry).
+    /// </summary>
+    /// <remarks>
+    /// Classification rule (mirrors CheckpointError in aws-durable-execution-sdk-python):
+    /// - 4xx (except 429) → terminal: permanent caller-side failure (missing ARN/KMS key,
+    ///   IAM denial, validation). Retrying will not fix it, so return Failed.
+    /// - 429 / 5xx / no status (network or SDK-internal) → not terminal: transient,
+    ///   allow the exception to escape so Lambda retries the invocation.
+    /// - Carve-out: InvalidParameterValueException with a message starting with
+    ///   "Invalid Checkpoint Token" is treated as transient — the service rejects a
+    ///   stale token but a retry with a fresh token will succeed.
+    ///
+    /// Only checkpoint-flush errors flow through this catch. There are two paths:
+    /// 1. A flush triggered synchronously from inside a user StepAsync call
+    ///    (the user awaits EnqueueAsync → batch flush → SDK throws).
+    /// 2. The final <see cref="CheckpointBatcher.DrainAsync"/> after the workflow returns.
+    ///
+    /// State-hydration errors (GetExecutionStateAsync) are NOT caught here — they
+    /// propagate to the host so Lambda retries, matching Python's GetExecutionStateError
+    /// (which extends InvocationError).
+    ///
+    /// User-code SDK errors (e.g. an SDK call inside a Step body) are caught by
+    /// StepRunner and surfaced as StepException for the workflow's normal
+    /// step-failure handling.
+    /// </remarks>
+    private static bool IsTerminalCheckpointError(AmazonServiceException ex)
+    {
+        var status = (int)ex.StatusCode;
+        if (status < 400 || status >= 500 || status == 429)
+            return false;
+
+        if (ex.ErrorCode == "InvalidParameterValueException"
+            && ex.Message != null
+            && ex.Message.StartsWith("Invalid Checkpoint Token", StringComparison.Ordinal))
+        {
+            return false;
+        }
+
+        return true;
+    }
+
+    // Shared options for both user-payload deserialization (input) and user-result
+    // serialization (output) so the naming policy stays symmetric. We only enable
+    // case-insensitive matching here — keep PascalCase on the wire for output to
+    // preserve compatibility with existing serialized contracts. Only the user payload
+    // portion uses these options; the durable-execution envelope itself
+    // (DurableExecutionInvocationInput/Output) is serialized separately and is not
+    // affected.
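The input/output asymmetry described in the comment above can be shown with a standalone sketch. This is illustrative only, not SDK code; `OrderInput` is a hypothetical user payload type.

```csharp
using System;
using System.Text.Json;

public record OrderInput(string OrderId, int Quantity);

public static class PayloadOptionsSketch
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Input: camelCase (or any casing) keys bind to PascalCase properties.
    public static OrderInput? Parse(string json) =>
        JsonSerializer.Deserialize<OrderInput>(json, Options);

    // Output: no naming policy is set, so PascalCase stays on the wire.
    public static string Emit(OrderInput value) =>
        JsonSerializer.Serialize(value, Options);
}
```

`Parse("""{"orderId":"o-1","quantity":2}""")` succeeds because matching is case-insensitive, while `Emit` still writes `"OrderId"`/`"Quantity"`, preserving existing serialized contracts.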
+    private static readonly JsonSerializerOptions UserPayloadOptions = new()
+    {
+        PropertyNameCaseInsensitive = true
+    };
+
+    [UnconditionalSuppressMessage("Trimming", "IL2026", Justification = "Guarded by jsonContext null check.")]
+    [UnconditionalSuppressMessage("AOT", "IL3050", Justification = "Guarded by jsonContext null check.")]
+    // The user's input payload is stored inside the service envelope as an EXECUTION-type
+    // operation. This is part of the durable execution wire format — each invocation includes
+    // its input as a checkpoint record so the service can validate replay consistency.
+    private static TInput ExtractUserPayload<TInput>(
+        DurableExecutionInvocationInput input,
+        JsonSerializerContext? jsonContext)
+    {
+        if (input.InitialExecutionState?.Operations == null)
+            return default!;
+
+        foreach (var op in input.InitialExecutionState.Operations)
+        {
+            if (op.Type != OperationTypes.Execution || op.ExecutionDetails?.InputPayload == null)
+                continue;
+
+            var payload = op.ExecutionDetails.InputPayload;
+            if (jsonContext != null)
+            {
+                if (jsonContext.GetTypeInfo(typeof(TInput)) is JsonTypeInfo<TInput> typeInfo)
+                    return JsonSerializer.Deserialize(payload, typeInfo) ?? default!;
+
+                throw new InvalidOperationException(
+                    $"JsonSerializerContext {jsonContext.GetType().FullName} has no JsonTypeInfo for {typeof(TInput).FullName}. " +
+                    "Add [JsonSerializable(typeof(YourInput))] to your context.");
+            }
+
+            return JsonSerializer.Deserialize<TInput>(payload, UserPayloadOptions) ?? default!;
+        }
+
+        return default!;
+    }
+
+    [UnconditionalSuppressMessage("Trimming", "IL2026", Justification = "Guarded by jsonContext null check.")]
+    [UnconditionalSuppressMessage("AOT", "IL3050", Justification = "Guarded by jsonContext null check.")]
+    private static DurableExecutionInvocationOutput MapToOutput<TOutput>(
+        HandlerResult<TOutput> result,
+        JsonSerializerContext?
jsonContext)
+    {
+        return result.Status switch
+        {
+            InvocationStatus.Succeeded => new DurableExecutionInvocationOutput
+            {
+                Status = InvocationStatus.Succeeded,
+                Result = SerializeOutput(result.Result, jsonContext)
+            },
+            InvocationStatus.Failed => new DurableExecutionInvocationOutput
+            {
+                Status = InvocationStatus.Failed,
+                Error = result.Exception != null
+                    ? ErrorObject.FromException(result.Exception)
+                    : new ErrorObject { ErrorMessage = result.Message }
+            },
+            // Pending = workflow suspended (wait/retry/callback). No Result or Error —
+            // the service will re-invoke with accumulated checkpoints when ready.
+            InvocationStatus.Pending => new DurableExecutionInvocationOutput
+            {
+                Status = InvocationStatus.Pending
+            },
+            _ => throw new InvalidOperationException($"Unexpected status: {result.Status}")
+        };
+    }
+
+    [UnconditionalSuppressMessage("Trimming", "IL2026", Justification = "Guarded by jsonContext null check.")]
+    [UnconditionalSuppressMessage("AOT", "IL3050", Justification = "Guarded by jsonContext null check.")]
+    private static string? SerializeOutput<TOutput>(TOutput? value, JsonSerializerContext? jsonContext)
+    {
+        if (value == null) return null;
+
+        if (jsonContext != null)
+        {
+            if (jsonContext.GetTypeInfo(typeof(TOutput)) is JsonTypeInfo<TOutput> typeInfo)
+                return JsonSerializer.Serialize(value, typeInfo);
+
+            throw new InvalidOperationException(
+                $"JsonSerializerContext {jsonContext.GetType().FullName} has no JsonTypeInfo for {typeof(TOutput).FullName}. " +
+                "Add [JsonSerializable(typeof(YourOutput))] to your context.");
+        }
+
+        return JsonSerializer.Serialize(value, UserPayloadOptions);
+    }
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Enums.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Enums.cs
new file mode 100644
index 000000000..c1bf44403
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Enums.cs
@@ -0,0 +1,14 @@
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// The terminal status of a durable execution invocation.
+/// </summary>
+public enum InvocationStatus
+{
+    /// <summary>The workflow completed successfully.</summary>
+    Succeeded,
+    /// <summary>The workflow failed with an unhandled exception.</summary>
+    Failed,
+    /// <summary>The workflow suspended (waiting for time, callback, or invocation).</summary>
+    Pending
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Exceptions/DurableExecutionException.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Exceptions/DurableExecutionException.cs
new file mode 100644
index 000000000..0f724b4a2
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Exceptions/DurableExecutionException.cs
@@ -0,0 +1,49 @@
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// Base exception for all durable execution errors.
+/// </summary>
+public class DurableExecutionException : Exception
+{
+    /// <summary>Creates an empty <see cref="DurableExecutionException"/>.</summary>
+    public DurableExecutionException() { }
+    /// <summary>Creates a <see cref="DurableExecutionException"/> with the given message.</summary>
+    public DurableExecutionException(string message) : base(message) { }
+    /// <summary>Creates a <see cref="DurableExecutionException"/> wrapping an inner exception.</summary>
+    public DurableExecutionException(string message, Exception innerException) : base(message, innerException) { }
+}
+
+/// <summary>
+/// Thrown when code has changed between invocations, causing a replay mismatch.
+/// For example, a step at index 0 was previously a WAIT but is now a STEP.
+/// </summary>
+public class NonDeterministicExecutionException : DurableExecutionException
+{
+    /// <summary>Creates an empty <see cref="NonDeterministicExecutionException"/>.</summary>
+    public NonDeterministicExecutionException() { }
+    /// <summary>Creates a <see cref="NonDeterministicExecutionException"/> with the given message.</summary>
+    public NonDeterministicExecutionException(string message) : base(message) { }
+    /// <summary>Creates a <see cref="NonDeterministicExecutionException"/> wrapping an inner exception.</summary>
+    public NonDeterministicExecutionException(string message, Exception innerException) : base(message, innerException) { }
+}
+
+/// <summary>
+/// Thrown when user code inside a step fails (after retries exhausted).
+/// Contains the original error details from the checkpoint.
+/// </summary>
+public class StepException : DurableExecutionException
+{
+    /// <summary>The fully-qualified type name of the original exception.</summary>
+    public string? ErrorType { get; init; }
+    /// <summary>Optional structured error data attached by the user.</summary>
+    public string? ErrorData { get; init; }
+    /// <summary>Stack trace of the original exception, captured before serialization.</summary>
+    public IReadOnlyList<string>? OriginalStackTrace { get; init; }
+
+    /// <summary>Creates an empty <see cref="StepException"/>.</summary>
+    public StepException() { }
+    /// <summary>Creates a <see cref="StepException"/> with the given message.</summary>
+    public StepException(string message) : base(message) { }
+    /// <summary>Creates a <see cref="StepException"/> wrapping an inner exception.</summary>
+    public StepException(string message, Exception innerException) : base(message, innerException) { }
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/ICheckpointSerializer.cs b/Libraries/src/Amazon.Lambda.DurableExecution/ICheckpointSerializer.cs
new file mode 100644
index 000000000..3d7175b4d
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/ICheckpointSerializer.cs
@@ -0,0 +1,25 @@
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// Serializes and deserializes checkpoint operation results.
+/// </summary>
+/// <typeparam name="T">The type to serialize.</typeparam>
+public interface ICheckpointSerializer<T>
+{
+    /// <summary>
+    /// Serializes a value for checkpoint storage.
+    /// </summary>
+    string Serialize(T value, SerializationContext context);
+
+    /// <summary>
+    /// Deserializes a value from checkpoint storage.
+    /// </summary>
+    T Deserialize(string data, SerializationContext context);
+}
+
+/// <summary>
+/// Context information available during serialization/deserialization.
+/// </summary>
+/// <param name="OperationId">The deterministic operation ID for this step.</param>
+/// <param name="DurableExecutionArn">The ARN of the current durable execution.</param>
+public record SerializationContext(string OperationId, string DurableExecutionArn);
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs b/Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs
new file mode 100644
index 000000000..ff18d1218
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs
@@ -0,0 +1,108 @@
+using System.Diagnostics.CodeAnalysis;
+using Amazon.Lambda.Core;
+using Microsoft.Extensions.Logging;
+
+namespace Amazon.Lambda.DurableExecution;
+
+/// <summary>
+/// The primary interface for durable execution operations.
+/// Passed to user workflow functions to access checkpointed steps and waits.
+/// Additional operations (callbacks, parallel, map, etc.) are added in
+/// follow-up PRs.
+/// </summary>
+public interface IDurableContext
+{
+    /// <summary>
+    /// A logger scoped to the durable execution. Currently returns
+    /// <see cref="Microsoft.Extensions.Logging.Abstractions.NullLogger.Instance"/>;
+    /// the replay-safe DurableLogger (suppresses messages during replay)
+    /// ships in a follow-up PR.
+    /// </summary>
+    ILogger Logger { get; }
+
+    /// <summary>
+    /// Metadata about the current durable execution.
+    /// </summary>
+    IExecutionContext ExecutionContext { get; }
+
+    /// <summary>
+    /// The underlying Lambda context.
+    /// </summary>
+    ILambdaContext LambdaContext { get; }
+
+    /// <summary>
+    /// Execute a step with automatic checkpointing. The step result is serialized
+    /// to a checkpoint using reflection-based System.Text.Json.
+    /// For NativeAOT or trimmed deployments, use the overload that takes an
+    /// <see cref="ICheckpointSerializer{T}"/>.
+    /// </summary>
+    [RequiresUnreferencedCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
+    [RequiresDynamicCode("Reflection-based JSON for T. Use the ICheckpointSerializer overload for AOT/trimmed deployments.")]
+    Task<T> StepAsync<T>(
+        Func<IStepContext, Task<T>> func,
+        string? name = null,
+        StepConfig? config = null,
+        CancellationToken cancellationToken = default);
+
+    /// <summary>
+    /// Execute a step that returns no value.
+    /// </summary>
+    Task StepAsync(
+        Func<IStepContext, Task> func,
+        string? name = null,
+        StepConfig?
config = null,
+        CancellationToken cancellationToken = default);
+
+    /// <summary>
+    /// Execute a step with AOT-safe checkpoint serialization. The supplied
+    /// <see cref="ICheckpointSerializer{T}"/> is used in place of reflection-based JSON.
+    /// </summary>
+    Task<T> StepAsync<T>(
+        Func<IStepContext, Task<T>> func,
+        ICheckpointSerializer<T> serializer,
+        string? name = null,
+        StepConfig? config = null,
+        CancellationToken cancellationToken = default);
+
+    /// <summary>
+    /// Suspend execution for the specified duration without consuming compute time.
+    /// The Lambda is suspended and the service re-invokes it after the wait elapses.
+    /// Duration must be at least 1 second (service timer granularity).
+    /// </summary>
+    Task WaitAsync(
+        TimeSpan duration,
+        string? name = null,
+        CancellationToken cancellationToken = default);
+}
+
+/// <summary>
+/// Context passed to step functions.
+/// </summary>
+public interface IStepContext
+{
+    /// <summary>
+    /// Logger scoped to this step.
+    /// </summary>
+    ILogger Logger { get; }
+
+    /// <summary>
+    /// The current retry attempt number (1-based).
+    /// </summary>
+    int AttemptNumber { get; }
+
+    /// <summary>
+    /// The deterministic operation ID for this step.
+    /// </summary>
+    string OperationId { get; }
+}
+
+/// <summary>
+/// Metadata about the current execution.
+/// </summary>
+public interface IExecutionContext
+{
+    /// <summary>
+    /// The ARN of the current durable execution.
+    /// </summary>
+    string DurableExecutionArn { get; }
+}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcher.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcher.cs
new file mode 100644
index 000000000..b800ef55d
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcher.cs
@@ -0,0 +1,216 @@
+using System.Runtime.ExceptionServices;
+using System.Threading.Channels;
+using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate;
+
+namespace Amazon.Lambda.DurableExecution.Internal;
+
+/// <summary>
+/// Background batcher for outbound checkpoint updates. Operations are enqueued
+/// via <see cref="EnqueueAsync"/>; a single worker drains the queue and flushes
+/// each batch via the supplied flushAsync delegate.
Each EnqueueAsync
+/// call awaits the flush of its containing batch (sync semantics).
+/// </summary>
+/// <remarks>
+/// Fire-and-forget semantics are achieved by simply not awaiting the returned
+/// Task — matching Java/Python/JS SDKs which use the same one-method pattern.
+/// Errors still surface deterministically via _terminalError: the next
+/// sync <see cref="EnqueueAsync"/> or <see cref="DrainAsync"/> rethrows.
+/// Callers using fire-and-forget should observe the discarded Task's exception
+/// (see StepOperation.FireAndForget) so it doesn't trip the runtime's
+/// UnobservedTaskException event.
+/// </remarks>
+internal sealed class CheckpointBatcher : IAsyncDisposable
+{
+    private readonly Func<string?, IReadOnlyList<SdkOperationUpdate>, CancellationToken, Task<string>> _flushAsync;
+    private readonly CheckpointBatcherConfig _config;
+    private readonly Channel<BatchItem> _channel;
+    private readonly Task _worker;
+    private readonly CancellationTokenSource _shutdownCts = new();
+
+    private string? _checkpointToken;
+    private Exception? _terminalError;
+    private int _disposed;
+
+    public CheckpointBatcher(
+        string? initialCheckpointToken,
+        Func<string?, IReadOnlyList<SdkOperationUpdate>, CancellationToken, Task<string>> flushAsync,
+        CheckpointBatcherConfig? config = null)
+    {
+        _checkpointToken = initialCheckpointToken;
+        _flushAsync = flushAsync;
+        _config = config ?? new CheckpointBatcherConfig();
+        _channel = Channel.CreateUnbounded<BatchItem>(new UnboundedChannelOptions
+        {
+            SingleReader = true,
+            SingleWriter = false
+        });
+        _worker = Task.Run(() => RunWorkerAsync(_shutdownCts.Token));
+    }
+
+    /// <summary>
+    /// The most recent checkpoint token returned by the service. Updated after
+    /// every successful batch flush.
+    /// </summary>
+    public string? CheckpointToken => Volatile.Read(ref _checkpointToken);
+
+    /// <summary>
+    /// Queues <paramref name="update"/> for flushing. The returned Task completes
+    /// when the batch containing this update has been successfully flushed to the
+    /// service. If the worker has already encountered a terminal error, the
+    /// exception is rethrown immediately.
+    /// </summary>
+    public async Task EnqueueAsync(SdkOperationUpdate update, CancellationToken cancellationToken = default)
+    {
+        var terminal = Volatile.Read(ref _terminalError);
+        if (terminal != null) ExceptionDispatchInfo.Throw(terminal);
+
+        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
+        var item = new BatchItem(update, tcs);
+
+        if (!_channel.Writer.TryWrite(item))
+        {
+            // Writer is completed (terminal error or disposed) — surface the cause.
+            terminal = Volatile.Read(ref _terminalError);
+            if (terminal != null) ExceptionDispatchInfo.Throw(terminal);
+            throw new ObjectDisposedException(nameof(CheckpointBatcher));
+        }
+
+        await tcs.Task.WaitAsync(cancellationToken).ConfigureAwait(false);
+    }
+
+    /// <summary>
+    /// Closes the channel and awaits the worker. Any items already enqueued are
+    /// flushed; any subsequent call throws.
+    /// </summary>
+    public async Task DrainAsync()
+    {
+        _channel.Writer.TryComplete();
+        try
+        {
+            await _worker.ConfigureAwait(false);
+        }
+        catch
+        {
+            // Surfaced via _terminalError below.
+        }
+
+        var terminal = Volatile.Read(ref _terminalError);
+        if (terminal != null) ExceptionDispatchInfo.Throw(terminal);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
+
+        _channel.Writer.TryComplete();
+        _shutdownCts.Cancel();
+        try { await _worker.ConfigureAwait(false); }
+        catch { /* swallow on dispose */ }
+        _shutdownCts.Dispose();
+    }
+
+    private async Task RunWorkerAsync(CancellationToken shutdownToken)
+    {
+        // TODO: also enforce _config.MaxBatchBytes here. Today we only cap by
+        // operation count; an item whose serialized size pushes the batch over
+        // ~750 KB will be sent and rejected service-side. See CheckpointBatcherConfig.
+        var batch = new List<BatchItem>(_config.MaxBatchOperations);
+
+        try
+        {
+            while (await _channel.Reader.WaitToReadAsync(shutdownToken).ConfigureAwait(false))
+            {
+                // Drain everything currently queued.
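The drain-and-batch loop used here can be sketched in isolation: read everything currently queued, cutting a batch whenever the operation cap is hit. This is an illustrative sketch, not SDK code; it uses `int` items in place of `BatchItem`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BatchDrainSketch
{
    public static async Task<List<List<int>>> DrainAsync(int maxBatch, params int[] items)
    {
        var channel = Channel.CreateUnbounded<int>();
        foreach (var i in items) channel.Writer.TryWrite(i);
        channel.Writer.Complete();

        var batches = new List<List<int>>();
        var batch = new List<int>(maxBatch);
        while (await channel.Reader.WaitToReadAsync())
        {
            // Drain everything currently queued, flushing full batches as we go.
            while (channel.Reader.TryRead(out var item))
            {
                batch.Add(item);
                if (batch.Count >= maxBatch)
                {
                    batches.Add(batch);
                    batch = new List<int>(maxBatch);
                }
            }
        }
        // Flush whatever is left once the channel completes.
        if (batch.Count > 0) batches.Add(batch);
        return batches;
    }
}
```

`DrainAsync(2, 1, 2, 3, 4, 5)` yields three batches: `[1,2]`, `[3,4]`, `[5]` — the same count-capped cutting the real worker performs, minus the flush-interval coalescing window.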
+ while (_channel.Reader.TryRead(out var item)) + { + batch.Add(item); + if (batch.Count >= _config.MaxBatchOperations) + { + await FlushBatchAsync(batch, shutdownToken).ConfigureAwait(false); + batch.Clear(); + } + } + + // Optionally wait for late arrivals to coalesce into one batch. + if (_config.FlushInterval > TimeSpan.Zero && batch.Count > 0) + { + using var windowCts = CancellationTokenSource.CreateLinkedTokenSource(shutdownToken); + windowCts.CancelAfter(_config.FlushInterval); + try + { + while (await _channel.Reader.WaitToReadAsync(windowCts.Token).ConfigureAwait(false)) + { + while (_channel.Reader.TryRead(out var item)) + { + batch.Add(item); + if (batch.Count >= _config.MaxBatchOperations) + { + await FlushBatchAsync(batch, shutdownToken).ConfigureAwait(false); + batch.Clear(); + } + } + } + } + catch (OperationCanceledException) when (!shutdownToken.IsCancellationRequested) + { + // Window elapsed; fall through to flush. + } + } + + if (batch.Count > 0) + { + await FlushBatchAsync(batch, shutdownToken).ConfigureAwait(false); + batch.Clear(); + } + } + } + catch (OperationCanceledException) when (shutdownToken.IsCancellationRequested) + { + // Disposed mid-wait; fall through to drain. + } + catch (Exception ex) + { + // FlushBatchAsync's exception path already records _terminalError and + // signals batch members. This catch covers anything else (channel, + // logic). Make sure we still propagate. + Volatile.Write(ref _terminalError, ex); + } + finally + { + // Anything left in the channel after the worker exits — fail it. + var failure = Volatile.Read(ref _terminalError) ?? 
new ObjectDisposedException(nameof(CheckpointBatcher));
            foreach (var leftover in batch)
                leftover.Completion.TrySetException(failure);
            while (_channel.Reader.TryRead(out var item))
                item.Completion.TrySetException(failure);

            _channel.Writer.TryComplete();
        }
    }

    private async Task FlushBatchAsync(IReadOnlyList<BatchItem> batch, CancellationToken cancellationToken)
    {
        var updates = new SdkOperationUpdate[batch.Count];
        for (int i = 0; i < batch.Count; i++)
            updates[i] = batch[i].Update;

        try
        {
            var newToken = await _flushAsync(_checkpointToken, updates, cancellationToken).ConfigureAwait(false);
            Volatile.Write(ref _checkpointToken, newToken);
            foreach (var item in batch)
                item.Completion.TrySetResult(true);
        }
        catch (Exception ex)
        {
            Volatile.Write(ref _terminalError, ex);
            foreach (var item in batch)
                item.Completion.TrySetException(ex);
            _channel.Writer.TryComplete();
            // No rethrow: the worker loop exits via the completed channel and
            // RunWorkerAsync's finally handles any leftovers.
        }
    }

    private readonly record struct BatchItem(SdkOperationUpdate Update, TaskCompletionSource<bool> Completion);
}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcherConfig.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcherConfig.cs
new file mode 100644
index 000000000..a5e60b98e
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/CheckpointBatcherConfig.cs
@@ -0,0 +1,35 @@
namespace Amazon.Lambda.DurableExecution.Internal;

///
/// Tunables for CheckpointBatcher.
///
internal sealed class CheckpointBatcherConfig
{
    ///
    /// How long the worker waits for additional items to coalesce into a single
    /// batch before flushing. Default is TimeSpan.Zero: flush as soon
    /// as the queue drains. Increase to reduce API calls when many checkpoints
    /// are emitted concurrently (e.g. parallel branches, future Map operation).
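    ///
    /// Example (hypothetical values, illustrating the coalescing window
    /// described above):
    /// <code>
    /// // Coalesce checkpoints arriving within 5 ms into a single batch.
    /// var config = new CheckpointBatcherConfig
    /// {
    ///     FlushInterval = TimeSpan.FromMilliseconds(5)
    /// };
    /// </code>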
    ///
    public TimeSpan FlushInterval { get; init; } = TimeSpan.Zero;

    ///
    /// Maximum operations per batch. Service-side limit is 200.
    ///
    public int MaxBatchOperations { get; init; } = 200;

    ///
    /// Maximum batch size in bytes. Service-side limit is ~750 KB.
    ///
    ///
    /// TODO: not enforced today. The worker only checks MaxBatchOperations;
    /// a single oversized item (or a batch whose serialized size exceeds 750 KB)
    /// will be sent to the service and rejected there. Java/JS/Python all
    /// pre-flight this on the in-flight batch and split before the next add.
    /// Wire this in alongside the async-flush operations (Map / Parallel /
    /// child-context) since those are the scenarios that can actually fill a
    /// batch — today every batch is 1 item with FlushInterval
    /// = Zero, so the gap is latent.
    ///
    internal int MaxBatchBytes { get; init; } = 750 * 1024;
}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/DurableOperation.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/DurableOperation.cs
new file mode 100644
index 000000000..907d6e128
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/DurableOperation.cs
@@ -0,0 +1,73 @@
using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate;

namespace Amazon.Lambda.DurableExecution.Internal;

///
/// Abstract base for durable operations (Step, Wait, ...). Subclasses implement
/// StartAsync (no prior checkpoint) and ReplayAsync (some checkpoint exists);
/// the base handles lookup and dispatch.
///
/// <typeparam name="T">The operation's result type.</typeparam>
internal abstract class DurableOperation<T>
{
    protected readonly ExecutionState State;
    protected readonly TerminationManager Termination;
    protected readonly string OperationId;
    protected readonly string? Name;
    protected readonly string DurableExecutionArn;
    protected readonly CheckpointBatcher? Batcher;

    protected DurableOperation(
        string operationId,
        string?
name,
        ExecutionState state,
        TerminationManager termination,
        string durableExecutionArn,
        CheckpointBatcher? batcher = null)
    {
        OperationId = operationId;
        Name = name;
        State = state;
        Termination = termination;
        DurableExecutionArn = durableExecutionArn;
        Batcher = batcher;
    }

    /// The wire-format operation type (e.g. "STEP", "WAIT").
    protected abstract string OperationType { get; }

    ///
    /// Looks up any prior checkpoint for this op and dispatches to
    /// StartAsync (none) or ReplayAsync (some).
    ///
    public Task<T> ExecuteAsync(CancellationToken cancellationToken)
    {
        State.ValidateReplayConsistency(OperationId, OperationType, Name);

        // Record that the workflow has reached this op. If every completed
        // checkpointed op has now been visited, the state flips out of replay.
        State.TrackReplay(OperationId);

        var existing = State.GetOperation(OperationId);
        return existing == null
            ? StartAsync(cancellationToken)
            : ReplayAsync(existing, cancellationToken);
    }

    /// First-time execution path: no prior checkpoint exists.
    protected abstract Task<T> StartAsync(CancellationToken cancellationToken);

    ///
    /// Replay path: a checkpoint from a prior invocation exists. Subclasses
    /// switch on Operation.Status against OperationStatuses constants.
    ///
    protected abstract Task<T> ReplayAsync(Operation existing, CancellationToken cancellationToken);

    ///
    /// Enqueues an outbound checkpoint and awaits its batch flush. No-op when
    /// no batcher is wired (e.g. unit tests that don't exercise flushing).
    ///
    protected Task EnqueueAsync(SdkOperationUpdate update, CancellationToken cancellationToken = default)
        => Batcher?.EnqueueAsync(update, cancellationToken) ??
Task.CompletedTask;
}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ExecutionState.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ExecutionState.cs
new file mode 100644
index 000000000..606614621
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ExecutionState.cs
@@ -0,0 +1,134 @@
namespace Amazon.Lambda.DurableExecution.Internal;

///
/// In-memory store of the operations replayed from InitialExecutionState,
/// plus replay-mode tracking. Outbound checkpoints are owned by
/// CheckpointBatcher; this type is the inbound side only.
///
///
/// Replay tracking mirrors the Python / Java / JavaScript reference SDKs:
///
/// At construction the workflow is "replaying" iff any user-replayable
/// op is present. The service always sends one EXECUTION-type op
/// carrying the input payload — that's bookkeeping, not user history,
/// so it doesn't count.
/// TrackReplay is called by every DurableOperation.ExecuteAsync
/// at the top of the call. Once every checkpointed completed
/// non-EXECUTION op has been visited, the workflow has caught up
/// to the replay frontier and IsReplaying flips to false
/// for the rest of the invocation.
///
///
internal sealed class ExecutionState
{
    private readonly Dictionary<string, Operation> _operations = new();
    private readonly HashSet<string> _visitedOperations = new();
    private bool _isReplaying;

    public int CheckpointedOperationCount => _operations.Count;

    ///
    /// True when the workflow is re-deriving prior operations from checkpointed
    /// state. False when running fresh (not-yet-checkpointed) code.
    ///
    public bool IsReplaying => _isReplaying;

    public void LoadFromCheckpoint(InitialExecutionState? initialState)
    {
        if (initialState?.Operations != null)
        {
            AddOperations(initialState.Operations);
        }

        // Only user-replayable ops put us into replay mode.
The service-side
        // EXECUTION op (input payload bookkeeping) is always present and must
        // not count — see Python execution.py:258 / Java ExecutionManager:81 /
        // JS execution-context.ts:62 for the same rule.
        _isReplaying = HasReplayableOperations();
    }

    public void AddOperations(IEnumerable<Operation> operations)
    {
        foreach (var op in operations)
        {
            if (op.Id == null) continue;
            _operations[op.Id] = op;
        }
    }

    ///
    /// Returns the checkpointed record for the given operation ID, or null
    /// if none. Callers should switch on Operation.Status against
    /// OperationStatuses constants to decide replay behavior.
    ///
    public Operation? GetOperation(string operationId)
    {
        _operations.TryGetValue(operationId, out var op);
        return op;
    }

    public bool HasOperation(string operationId) => _operations.ContainsKey(operationId);

    ///
    /// Records that the workflow has reached the given operation.
    /// Once every checkpointed completed non-EXECUTION op has been
    /// visited the workflow has caught up to the replay frontier and
    /// IsReplaying flips to false. Idempotent: calling more than
    /// once with the same id has no additional effect.
    ///
    public void TrackReplay(string operationId)
    {
        if (!_isReplaying) return;

        _visitedOperations.Add(operationId);

        // Have we visited every completed non-EXECUTION op? If so, anything
        // emitted from here on is fresh execution.
        foreach (var op in _operations.Values)
        {
            if (op.Type == OperationTypes.Execution) continue;
            if (!IsTerminalStatus(op.Status)) continue;
            if (!_visitedOperations.Contains(op.Id!)) return;
        }

        _isReplaying = false;
    }

    public void ValidateReplayConsistency(string operationId, string expectedType, string?
expectedName) + { + if (!_isReplaying) return; + + if (!_operations.TryGetValue(operationId, out var op)) return; + + if (op.Type != null && op.Type != expectedType) + { + throw new NonDeterministicExecutionException( + $"Non-deterministic execution detected for operation '{operationId}': " + + $"expected type '{expectedType}' but found '{op.Type}' from a previous invocation. " + + $"Code must not change the order or type of durable operations between deployments."); + } + + if (expectedName != null && op.Name != null && op.Name != expectedName) + { + throw new NonDeterministicExecutionException( + $"Non-deterministic execution detected for operation '{operationId}': " + + $"expected name '{expectedName}' but found '{op.Name}' from a previous invocation. " + + $"Code must not change the order or type of durable operations between deployments."); + } + } + + private bool HasReplayableOperations() + { + foreach (var op in _operations.Values) + { + if (op.Type != OperationTypes.Execution) return true; + } + return false; + } + + private static bool IsTerminalStatus(string? status) => + status == OperationStatuses.Succeeded + || status == OperationStatuses.Failed + || status == OperationStatuses.Cancelled + || status == OperationStatuses.Stopped; +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/Operation.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/Operation.cs new file mode 100644 index 000000000..473c7a3b2 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/Operation.cs @@ -0,0 +1,140 @@ +using System.Text.Json.Serialization; + +namespace Amazon.Lambda.DurableExecution.Internal; + +/// +/// One operation in the durable execution service's invocation envelope. +/// Property names mirror the wire format exactly so System.Text.Json can +/// populate this type declaratively. Internal — consumed by ExecutionState +/// and DurableContext during replay; never exposed on a public surface. 
+/// +internal sealed class Operation +{ + [JsonPropertyName("Id")] + public string? Id { get; set; } + + [JsonPropertyName("Type")] + public string? Type { get; set; } + + [JsonPropertyName("Status")] + public string? Status { get; set; } + + [JsonPropertyName("Name")] + public string? Name { get; set; } + + [JsonPropertyName("ParentId")] + public string? ParentId { get; set; } + + [JsonPropertyName("SubType")] + public string? SubType { get; set; } + + [JsonPropertyName("StartTimestamp")] + public long? StartTimestamp { get; set; } + + [JsonPropertyName("EndTimestamp")] + public long? EndTimestamp { get; set; } + + [JsonPropertyName("StepDetails")] + public StepDetails? StepDetails { get; set; } + + [JsonPropertyName("WaitDetails")] + public WaitDetails? WaitDetails { get; set; } + + [JsonPropertyName("ExecutionDetails")] + public ExecutionDetails? ExecutionDetails { get; set; } + + [JsonPropertyName("CallbackDetails")] + public CallbackDetails? CallbackDetails { get; set; } + + [JsonPropertyName("ChainedInvokeDetails")] + public ChainedInvokeDetails? ChainedInvokeDetails { get; set; } + + [JsonPropertyName("ContextDetails")] + public ContextDetails? ContextDetails { get; set; } +} + +internal sealed class StepDetails +{ + [JsonPropertyName("Result")] + public string? Result { get; set; } + + [JsonPropertyName("Error")] + public ErrorObject? Error { get; set; } + + [JsonPropertyName("Attempt")] + public int? Attempt { get; set; } + + [JsonPropertyName("NextAttemptTimestamp")] + public long? NextAttemptTimestamp { get; set; } +} + +internal sealed class WaitDetails +{ + [JsonPropertyName("ScheduledEndTimestamp")] + public long? ScheduledEndTimestamp { get; set; } +} + +internal sealed class ExecutionDetails +{ + [JsonPropertyName("InputPayload")] + public string? InputPayload { get; set; } +} + +internal sealed class CallbackDetails +{ + [JsonPropertyName("CallbackId")] + public string? CallbackId { get; set; } + + [JsonPropertyName("Result")] + public string? 
Result { get; set; } + + [JsonPropertyName("Error")] + public ErrorObject? Error { get; set; } +} + +internal sealed class ChainedInvokeDetails +{ + [JsonPropertyName("Result")] + public string? Result { get; set; } + + [JsonPropertyName("Error")] + public ErrorObject? Error { get; set; } +} + +internal sealed class ContextDetails +{ + [JsonPropertyName("Result")] + public string? Result { get; set; } + + [JsonPropertyName("Error")] + public ErrorObject? Error { get; set; } +} + +/// +/// Wire-format string constants. +/// Plural name avoids collision with Amazon.Lambda.OperationType. +/// +internal static class OperationTypes +{ + public const string Step = "STEP"; + public const string Wait = "WAIT"; + public const string Callback = "CALLBACK"; + public const string ChainedInvoke = "CHAINED_INVOKE"; + public const string Context = "CONTEXT"; + public const string Execution = "EXECUTION"; +} + +/// +/// Wire-format string constants. +/// Plural name avoids collision with Amazon.Lambda.OperationStatus. +/// +internal static class OperationStatuses +{ + public const string Started = "STARTED"; + public const string Succeeded = "SUCCEEDED"; + public const string Failed = "FAILED"; + public const string Pending = "PENDING"; + public const string Cancelled = "CANCELLED"; + public const string Ready = "READY"; + public const string Stopped = "STOPPED"; +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/OperationIdGenerator.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/OperationIdGenerator.cs new file mode 100644 index 000000000..fef9cab19 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/OperationIdGenerator.cs @@ -0,0 +1,101 @@ +using System.Security.Cryptography; +using System.Text; + +namespace Amazon.Lambda.DurableExecution.Internal; + +/// +/// Generates deterministic operation IDs for durable operations. 
Each call
/// increments an internal counter and SHA-256 hashes "<parentId>-<counter>"
/// (or just "<counter>" at the root). Hashing matches the wire format
/// used by the Java/JS/Python SDKs so the same workflow position produces a
/// stable, opaque ID across replays — and the human-readable step name is
/// carried separately on OperationUpdate.Name, so renaming a step does
/// not break replay correlation.
///
internal sealed class OperationIdGenerator
{
    private int _counter;
    private readonly string _prefix;

    ///
    /// Creates a root-level generator.
    ///
    public OperationIdGenerator()
        : this(parentId: null)
    {
    }

    ///
    /// Creates a child generator scoped under a parent operation. The parent
    /// ID (already hashed) becomes part of the prefix, so child IDs are
    /// hash("<parentHash>-1"), hash("<parentHash>-2"), etc.
    ///
    public OperationIdGenerator(string? parentId)
    {
        _counter = 0;
        ParentId = parentId;
        _prefix = parentId != null ? parentId + "-" : string.Empty;
    }

    ///
    /// Gets the parent operation ID, if any.
    ///
    public string? ParentId { get; }

    ///
    /// Generates the next operation ID. The counter is pre-incremented so the
    /// first ID is hash("1"), matching the reference SDKs.
    ///
    public string NextId()
    {
        var counter = ++_counter;
        return HashOperationId(_prefix + counter.ToString(System.Globalization.CultureInfo.InvariantCulture));
    }

    ///
    /// SHA-256 hashes the raw ID and returns a 64-char lowercase
    /// hex digest. Public so tests and child-context construction can reproduce
    /// the same hashing logic.
    ///
    public static string HashOperationId(string rawId)
    {
        var bytes = Encoding.UTF8.GetBytes(rawId);
        Span<byte> hash = stackalloc byte[32];
#if NET8_0_OR_GREATER
        SHA256.HashData(bytes, hash);
#else
        using var sha = SHA256.Create();
        var computed = sha.ComputeHash(bytes);
        computed.CopyTo(hash);
#endif
        return ToHex(hash);
    }

    private static string ToHex(ReadOnlySpan<byte> bytes)
    {
        const string Hex = "0123456789abcdef";
        var chars = new char[bytes.Length * 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[i * 2] = Hex[bytes[i] >> 4];
            chars[i * 2 + 1] = Hex[bytes[i] & 0xF];
        }
        return new string(chars);
    }

    ///
    /// Creates a child generator scoped under an operation ID from this generator.
    ///
    public OperationIdGenerator CreateChild(string operationId)
    {
        return new OperationIdGenerator(operationId);
    }

    ///
    /// Resets the counter (used for testing only).
    ///
    internal void Reset()
    {
        _counter = 0;
    }
}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ReflectionJsonCheckpointSerializer.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ReflectionJsonCheckpointSerializer.cs
new file mode 100644
index 000000000..f7a3d0572
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/ReflectionJsonCheckpointSerializer.cs
@@ -0,0 +1,36 @@
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;

namespace Amazon.Lambda.DurableExecution.Internal;

///
/// Default ICheckpointSerializer backed by reflection-based
/// JsonSerializer. Constructed only by the reflection-overload
/// path of DurableContext.StepAsync; the constructor carries
/// RequiresUnreferencedCode/RequiresDynamicCode so AOT/trimmed deployments
/// see the warning at the call site that picks this overload.
///
internal sealed class ReflectionJsonCheckpointSerializer : ICheckpointSerializer
{
    [RequiresUnreferencedCode("Uses reflection-based JsonSerializer; not AOT-safe.")]
    [RequiresDynamicCode("Uses reflection-based JsonSerializer; not AOT-safe.")]
    public ReflectionJsonCheckpointSerializer() { }

    [UnconditionalSuppressMessage("Trimming", "IL2026",
        Justification = "Reflection-based JsonSerializer call is acknowledged on the constructor.")]
    [UnconditionalSuppressMessage("AOT", "IL3050",
        Justification = "Reflection-based JsonSerializer call is acknowledged on the constructor.")]
    public string Serialize<T>(T value, SerializationContext context)
    {
        return JsonSerializer.Serialize(value);
    }

    [UnconditionalSuppressMessage("Trimming", "IL2026",
        Justification = "Reflection-based JsonSerializer call is acknowledged on the constructor.")]
    [UnconditionalSuppressMessage("AOT", "IL3050",
        Justification = "Reflection-based JsonSerializer call is acknowledged on the constructor.")]
    public T Deserialize<T>(string data, SerializationContext context)
    {
        return JsonSerializer.Deserialize<T>(data)!;
    }
}
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/StepOperation.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/StepOperation.cs
new file mode 100644
index 000000000..54e52005d
--- /dev/null
+++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/StepOperation.cs
@@ -0,0 +1,304 @@
using Microsoft.Extensions.Logging;
using SdkErrorObject = Amazon.Lambda.Model.ErrorObject;
using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate;
using SdkStepOptions = Amazon.Lambda.Model.StepOptions;

namespace Amazon.Lambda.DurableExecution.Internal;

///
/// Durable step operation. Runs the user's function (with retry support),
/// persisting its result so subsequent invocations replay the cached value
/// without re-executing.
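///
/// Example (hypothetical handler usage, assuming the IDurableContext surface
/// described in the design doc — ChargeCard is an illustrative user function):
/// <code>
/// // First invocation runs ChargeCard and checkpoints its result;
/// // on replay the cached receipt is returned and ChargeCard is NOT re-run.
/// var receipt = await ctx.StepAsync(ChargeCard, "charge");
/// </code>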
///
///
/// Replay branches — example: await ctx.StepAsync(ChargeCard, "charge")
///
/// Fresh: no prior state → run func → emit SUCCEED → return.
/// SUCCEEDED: return cached result; func is NOT re-executed.
/// FAILED: re-throw the recorded exception.
/// PENDING (retry timer not yet fired): re-suspend without
/// running func; service re-invokes once NextAttemptTimestamp elapses.
/// STARTED + AtMostOncePerRetry: crash recovery — treat as a
/// failed attempt, route through retry strategy.
/// READY: service has post-PENDING re-invoked us; the retry
/// timer fired and the next attempt is up. Run it.
///
/// Serialization is delegated to the supplied ICheckpointSerializer;
/// the AOT-safe overloads of IDurableContext.StepAsync wire in a
/// user-supplied serializer, while the reflection overloads inject
/// ReflectionJsonCheckpointSerializer.
///
internal sealed class StepOperation<T> : DurableOperation<T>
{
    private readonly Func<StepContext, Task<T>> _func;
    private readonly StepConfig? _config;
    private readonly ICheckpointSerializer _serializer;
    private readonly ILogger _logger;

    public StepOperation(
        string operationId,
        string? name,
        Func<StepContext, Task<T>> func,
        StepConfig? config,
        ICheckpointSerializer serializer,
        ILogger logger,
        ExecutionState state,
        TerminationManager termination,
        string durableExecutionArn,
        CheckpointBatcher? batcher = null)
        : base(operationId, name, state, termination, durableExecutionArn, batcher)
    {
        _func = func;
        _config = config;
        _serializer = serializer;
        _logger = logger;
    }

    protected override string OperationType => OperationTypes.Step;

    protected override Task<T> StartAsync(CancellationToken cancellationToken)
        => ExecuteFunc(attemptNumber: 1, cancellationToken);

    protected override Task<T> ReplayAsync(Operation existing, CancellationToken cancellationToken)
    {
        switch (existing.Status)
        {
            case OperationStatuses.Succeeded:
                // Side-effecting code runs at most once: replay returns the
                // cached result without invoking func.
                return Task.FromResult(DeserializeResult(existing.StepDetails?.Result));

            case OperationStatuses.Failed:
                // Retries were exhausted or never configured — re-throw so the
                // user's catch-block flow matches the original execution.
                throw CreateStepException(existing);

            case OperationStatuses.Pending:
                return ReplayPending(existing, cancellationToken);

            case OperationStatuses.Started:
                return ReplayStarted(existing, cancellationToken);

            case OperationStatuses.Ready:
                return ReplayReady(existing, cancellationToken);

            default:
                // Unknown status — treat as fresh.
                return ExecuteFunc(attemptNumber: 1, cancellationToken);
        }
    }

    ///
    /// READY means the service has post-PENDING re-invoked us — the retry
    /// timer fired and the step is eligible to run its next attempt. No
    /// timer check is needed (the service has already decided we're up);
    /// just advance the attempt counter and execute. Matches Java's
    /// case READY -> executeStepLogic(attempt).
    ///
    private Task<T> ReplayReady(Operation ready, CancellationToken cancellationToken)
    {
        var attemptNumber = (ready.StepDetails?.Attempt ?? 0) + 1;
        return ExecuteFunc(attemptNumber, cancellationToken);
    }

    ///
    /// PENDING means a retry was scheduled (RETRY checkpoint). If
    /// NextAttemptTimestamp is in the future, re-suspend; otherwise the timer
    /// has fired and we run the next attempt.
    ///
    private Task<T> ReplayPending(Operation pending, CancellationToken cancellationToken)
    {
        var nextAttemptTs = pending.StepDetails?.NextAttemptTimestamp;
        var attemptNumber = (pending.StepDetails?.Attempt ?? 0) + 1;

        if (nextAttemptTs is { } scheduledMs &&
            DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() < scheduledMs)
        {
            // Retry timer hasn't fired yet — re-suspend so we don't bill compute
            // while the timer ticks. Service re-invokes once the timer elapses.
            return Termination.SuspendAndAwait<T>(
                TerminationReason.RetryScheduled, $"retry:{Name ??
OperationId}");
        }

        return ExecuteFunc(attemptNumber, cancellationToken);
    }

    ///
    /// STARTED means a START checkpoint was written but no SUCCEED/FAIL exists.
    /// For AtMostOncePerRetry this signals a crash mid-step — treat as failure
    /// and route through retry. For AtLeastOncePerRetry just re-execute.
    ///
    private Task<T> ReplayStarted(Operation started, CancellationToken cancellationToken)
    {
        var attemptNumber = (started.StepDetails?.Attempt ?? 0) + 1;

        if (_config?.Semantics == StepSemantics.AtMostOncePerRetry)
        {
            // Re-running func would risk a duplicate side effect (e.g. double
            // charge). Treat the lost result as a failure; let the retry
            // strategy decide whether to try again or give up.
            var error = started.StepDetails?.Error;
            var ex = error != null
                ? new StepException(error.ErrorMessage ?? "Step failed on previous attempt") { ErrorType = error.ErrorType }
                : new StepException("Step result lost during AtMostOncePerRetry replay");
            return HandleStepFailureAsync(ex, attemptNumber, cancellationToken);
        }

        return ExecuteFunc(attemptNumber, cancellationToken);
    }

    private async Task<T> ExecuteFunc(int attemptNumber, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // Emit a START checkpoint before running user code, unless we're already
        // resuming a STARTED record (which means an earlier attempt wrote it).
        //
        // AtMostOncePerRetry: SYNC flush. If Lambda crashes before SUCCEED is
        // flushed, ReplayStarted routes through retry instead of re-executing.
        // A queued-but-unflushed START is indistinguishable from "never ran" if
        // we die, so the sync flush is correctness-load-bearing here.
        //
        // AtLeastOncePerRetry (default): FIRE-AND-FORGET. Replay correctness
        // doesn't depend on the START — SUCCEED alone is sufficient — so this
        // is purely telemetry (attempt timing, retry count visible in history).
+ // Java/Python/JS SDKs all use the same pattern: one enqueue API, sync + // for AtMostOnce, async for AtLeastOnce. + if (State.GetOperation(OperationId)?.Status != OperationStatuses.Started) + { + var startUpdate = new SdkOperationUpdate + { + Id = OperationId, + Type = OperationTypes.Step, + Action = "START", + SubType = "Step", + Name = Name + }; + + if (_config?.Semantics == StepSemantics.AtMostOncePerRetry) + { + await EnqueueAsync(startUpdate, cancellationToken); + } + else + { + FireAndForget(EnqueueAsync(startUpdate, cancellationToken)); + } + } + + + try + { + var stepContext = new StepContext(OperationId, attemptNumber, _logger); + var result = await _func(stepContext); + + await EnqueueAsync(new SdkOperationUpdate + { + Id = OperationId, + Type = OperationTypes.Step, + Action = "SUCCEED", + SubType = "Step", + Name = Name, + Payload = SerializeResult(result) + }, cancellationToken); + + return result; + } + catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) + { + throw; + } + catch (Exception ex) + { + // Funnel into the retry/fail decision tree. May checkpoint RETRY and + // suspend (Pending), or checkpoint FAIL and rethrow to user. + return await HandleStepFailureAsync(ex, attemptNumber, cancellationToken); + } + } + + /// + /// Funnels a step failure into the retry/fail decision. May checkpoint + /// RETRY and suspend (Pending), or checkpoint FAIL and rethrow. 
    ///
    private async Task<T> HandleStepFailureAsync(Exception ex, int attemptNumber, CancellationToken cancellationToken)
    {
        var retryStrategy = _config?.RetryStrategy;
        if (retryStrategy != null)
        {
            var decision = retryStrategy.ShouldRetry(ex, attemptNumber);
            if (decision.ShouldRetry)
            {
                var delaySeconds = (int)Math.Max(1, Math.Ceiling(decision.Delay.TotalSeconds));
                await EnqueueAsync(new SdkOperationUpdate
                {
                    Id = OperationId,
                    Type = OperationTypes.Step,
                    Action = "RETRY",
                    SubType = "Step",
                    Name = Name,
                    Error = ToSdkError(ex),
                    StepOptions = new SdkStepOptions { NextAttemptDelaySeconds = delaySeconds }
                }, cancellationToken);
                return await Termination.SuspendAndAwait<T>(
                    TerminationReason.RetryScheduled, $"retry:{Name ?? OperationId}");
            }
        }

        await EnqueueAsync(new SdkOperationUpdate
        {
            Id = OperationId,
            Type = OperationTypes.Step,
            Action = "FAIL",
            SubType = "Step",
            Name = Name,
            Error = ToSdkError(ex)
        }, cancellationToken);

        throw new StepException(ex.Message, ex)
        {
            ErrorType = ex.GetType().FullName
        };
    }

    private T DeserializeResult(string? serialized)
    {
        if (serialized == null) return default!;
        return _serializer.Deserialize<T>(serialized, new SerializationContext(OperationId, DurableExecutionArn));
    }

    private string SerializeResult(T value)
        => _serializer.Serialize(value, new SerializationContext(OperationId, DurableExecutionArn));

    private static StepException CreateStepException(Operation failedOp)
    {
        var err = failedOp.StepDetails?.Error;
        return new StepException(err?.ErrorMessage ??
"Step failed") + { + ErrorType = err?.ErrorType, + ErrorData = err?.ErrorData, + OriginalStackTrace = err?.StackTrace + }; + } + + private static SdkErrorObject ToSdkError(Exception ex) => new() + { + ErrorType = ex.GetType().FullName, + ErrorMessage = ex.Message, + StackTrace = ex.StackTrace?.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).ToList() + }; + + /// + /// Discards a Task but observes any exception so it doesn't surface as an + /// UnobservedTaskException. Used for fire-and-forget START checkpoints + /// under AtLeastOncePerRetry semantics. The actual error still propagates + /// via CheckpointBatcher._terminalError: the next sync EnqueueAsync + /// or DrainAsync will rethrow with the original cause. + /// + private static void FireAndForget(Task task) + { + _ = task.ContinueWith( + static t => _ = t.Exception, + CancellationToken.None, + TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously, + TaskScheduler.Default); + } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/TerminationManager.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/TerminationManager.cs new file mode 100644 index 000000000..5d61e611b --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/TerminationManager.cs @@ -0,0 +1,78 @@ +namespace Amazon.Lambda.DurableExecution.Internal; + +/// +/// The reason the execution was terminated. +/// +internal enum TerminationReason +{ + WaitScheduled, + RetryScheduled, + CallbackPending, + InvokePending, + CheckpointFailed +} + +/// +/// The result of a termination signal. +/// +internal sealed class TerminationResult +{ + public required TerminationReason Reason { get; init; } + public string? Message { get; init; } + public Exception? Exception { get; init; } +} + +/// +/// Manages the suspension signal for durable execution. +/// Uses a TaskCompletionSource that resolves when the function should suspend. 
+/// Only the first Terminate() call wins; subsequent calls are ignored. +/// +internal sealed class TerminationManager +{ + private readonly TaskCompletionSource<TerminationResult> _tcs = new(TaskCreationOptions.RunContinuationsAsynchronously); + private int _terminated; + + /// + /// A Task that resolves when Terminate() is called. Used in Task.WhenAny + /// to race against user code. + /// + public Task<TerminationResult> TerminationTask => _tcs.Task; + + /// + /// Whether Terminate() has been called. + /// + public bool IsTerminated => Volatile.Read(ref _terminated) == 1; + + /// + /// Signals that the execution should suspend. Thread-safe; only the first + /// call has effect. + /// + /// true if this call triggered termination, false if already terminated. + public bool Terminate(TerminationReason reason, string? message = null, Exception? exception = null) + { + if (Interlocked.CompareExchange(ref _terminated, 1, 0) != 0) + return false; + + _tcs.TrySetResult(new TerminationResult + { + Reason = reason, + Message = message, + Exception = exception + }); + + return true; + } + + /// + /// Trips the termination signal and returns a Task that never completes. + /// This is the standard suspension idiom: the caller awaits the returned + /// Task, and the runner's Task.WhenAny race picks up TerminationTask + /// instead, returning Pending + /// to the service. The returned Task is abandoned and GC'd. + /// + public Task<T> SuspendAndAwait<T>(TerminationReason reason, string? message = null, Exception? exception = null) + { + Terminate(reason, message, exception); + return new TaskCompletionSource<T>().Task; + } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/UpperSnakeCaseEnumConverter.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/UpperSnakeCaseEnumConverter.cs new file mode 100644 index 000000000..9610ca5f4 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/UpperSnakeCaseEnumConverter.cs @@ -0,0 +1,64 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Amazon.Lambda.DurableExecution; + +/// +/// Converts between UPPER_SNAKE_CASE wire format (e.g., CHAINED_INVOKE) +/// and PascalCase enum values (e.g., ChainedInvoke). +/// +/// +public sealed class UpperSnakeCaseEnumConverter<T> : JsonConverter<T> where T : struct, Enum +{ + /// + public override T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) + { + if (reader.TokenType == JsonTokenType.Null) + return default; + + var value = reader.GetString(); + if (value == null) + return default; + + // Convert UPPER_SNAKE_CASE to PascalCase for enum lookup + var pascalCase = SnakeToPascal(value); + + if (Enum.TryParse<T>(pascalCase, ignoreCase: true, out var result)) + return result; + + // Fallback: try direct case-insensitive parse of the raw value + if (Enum.TryParse<T>(value, ignoreCase: true, out result)) + return result; + + throw new JsonException($"Unable to parse '{value}' as {typeof(T).Name}."); + } + + /// + public override void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options) + { + writer.WriteStringValue(PascalToSnake(value.ToString())); + } + + private static string SnakeToPascal(string snake) + { + var parts = snake.Split('_'); + for (int i = 0; i < parts.Length; i++) + { + if (parts[i].Length > 0) + parts[i] = char.ToUpper(parts[i][0]) + parts[i][1..].ToLower(); + } + return string.Join("", parts); + } + + private static string PascalToSnake(string pascal) + { + var result = new 
System.Text.StringBuilder(); + for (int i = 0; i < pascal.Length; i++) + { + if (i > 0 && char.IsUpper(pascal[i])) + result.Append('_'); + result.Append(char.ToUpper(pascal[i])); + } + return result.ToString(); + } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Internal/WaitOperation.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/WaitOperation.cs new file mode 100644 index 000000000..59254827d --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Internal/WaitOperation.cs @@ -0,0 +1,91 @@ +using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate; +using SdkWaitOptions = Amazon.Lambda.Model.WaitOptions; + +namespace Amazon.Lambda.DurableExecution.Internal; + +/// +/// Durable wait operation. Suspends the workflow for a given duration without +/// consuming compute time; the service schedules a timer and re-invokes Lambda +/// when it fires. +/// +/// +/// Replay semantics — example: await ctx.WaitAsync(TimeSpan.FromHours(1)) +/// +/// Fresh: emit WAIT START → flush → suspend → service schedules timer. +/// Replay (SUCCEEDED): timer fired, return CompletedTask. +/// Replay (STARTED/PENDING): timer still ticking → re-suspend (or +/// short-circuit if the deadline already elapsed but SUCCEEDED hasn't +/// been stamped yet). +/// +/// See TerminationManager for the +/// suspension mechanics (the Task.WhenAny race against user code). +/// +internal sealed class WaitOperation : DurableOperation<object?> +{ + private readonly int _waitSeconds; + + public WaitOperation( + string operationId, + string? name, + int waitSeconds, + ExecutionState state, + TerminationManager termination, + string durableExecutionArn, + CheckpointBatcher? 
batcher = null) + : base(operationId, name, state, termination, durableExecutionArn, batcher) + { + _waitSeconds = waitSeconds; + } + + protected override string OperationType => OperationTypes.Wait; + + protected override async Task<object?> StartAsync(CancellationToken cancellationToken) + { + // Sync-flush WAIT START before suspending — the service can't schedule + // a timer for a checkpoint it hasn't received. + await EnqueueAsync(new SdkOperationUpdate + { + Id = OperationId, + Type = OperationTypes.Wait, + Action = "START", + SubType = "Wait", + Name = Name, + WaitOptions = new SdkWaitOptions { WaitSeconds = _waitSeconds } + }, cancellationToken); + + return await Termination.SuspendAndAwait<object?>( + TerminationReason.WaitScheduled, $"wait:{Name ?? OperationId}"); + } + + protected override Task<object?> ReplayAsync(Operation existing, CancellationToken cancellationToken) + { + switch (existing.Status) + { + case OperationStatuses.Succeeded: + // Common post-timer case: service stamped the wait as SUCCEEDED + // and re-invoked Lambda. Workflow proceeds to the next step. + return Task.FromResult<object?>(null); + + case OperationStatuses.Started: + case OperationStatuses.Pending: + // Service hasn't marked the wait complete yet. Either the timer + // is still ticking, or the deadline elapsed but SUCCEEDED hasn't + // been stamped yet — treat elapsed deadlines as "done" to avoid + // a pointless extra round-trip. + var expiresAtMs = existing.WaitDetails?.ScheduledEndTimestamp; + if (expiresAtMs is { } ts && DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() >= ts) + { + return Task.FromResult<object?>(null); + } + + // Timer still ticking — re-suspend without re-checkpointing. + // The original WAIT START is still authoritative. + return Termination.SuspendAndAwait<object?>( + TerminationReason.WaitScheduled, $"wait:{Name ?? OperationId}"); + + default: + throw new NonDeterministicExecutionException( + $"Wait operation '{Name ?? 
OperationId}' has unexpected status '{existing.Status}' on replay."); + } + } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationInput.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationInput.cs new file mode 100644 index 000000000..35bc32ecd --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationInput.cs @@ -0,0 +1,53 @@ +using System.Text.Json.Serialization; +using Amazon.Lambda.DurableExecution.Internal; + +namespace Amazon.Lambda.DurableExecution; + +/// +/// The service envelope input for a durable execution invocation. +/// This is what Lambda receives from the durable execution service. +/// +public sealed class DurableExecutionInvocationInput +{ + /// + /// The unique ARN identifying this durable execution. + /// + [JsonPropertyName("DurableExecutionArn")] + public required string DurableExecutionArn { get; set; } + + /// + /// Token for optimistic concurrency on checkpoint operations. + /// + [JsonPropertyName("CheckpointToken")] + public string? CheckpointToken { get; set; } + + /// + /// Previously checkpointed operation state for replay. Internal — consumed + /// only by DurableFunction.WrapAsync for replay correlation; user code + /// should never read or modify this. Marked with [JsonInclude] + /// so System.Text.Json populates it during deserialization despite being internal + /// (framework needs it, but it's not part of the public API contract). + /// + [JsonPropertyName("InitialExecutionState")] + [JsonInclude] + internal InitialExecutionState? InitialExecutionState { get; set; } +} + +/// +/// The previously checkpointed execution state provided on replay invocations. +/// +internal sealed class InitialExecutionState +{ + /// + /// The list of operations from prior invocations. + /// + [JsonPropertyName("Operations")] + public IReadOnlyList<Operation>? Operations { get; set; } + + /// + /// If present, indicates that more operations are available. 
Use this value + /// with GetDurableExecutionState to fetch the next page. + /// + [JsonPropertyName("NextMarker")] + public string? NextMarker { get; set; } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationOutput.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationOutput.cs new file mode 100644 index 000000000..602f0b245 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Models/DurableExecutionInvocationOutput.cs @@ -0,0 +1,29 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Amazon.Lambda.DurableExecution; + +/// +/// The service envelope output returned by a durable execution invocation. +/// +public sealed class DurableExecutionInvocationOutput +{ + /// + /// The terminal status of this invocation. + /// + [JsonPropertyName("Status")] + [JsonConverter(typeof(UpperSnakeCaseEnumConverter<InvocationStatus>))] + public required InvocationStatus Status { get; set; } + + /// + /// The serialized result (only present when Status is Succeeded). + /// + [JsonPropertyName("Result")] + public string? Result { get; set; } + + /// + /// Error details (only present when Status is Failed). + /// + [JsonPropertyName("Error")] + public ErrorObject? Error { get; set; } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Models/ErrorObject.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Models/ErrorObject.cs new file mode 100644 index 000000000..20acac47f --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Models/ErrorObject.cs @@ -0,0 +1,46 @@ +using System.Text.Json.Serialization; + +namespace Amazon.Lambda.DurableExecution; + +/// +/// Serializable error representation stored in checkpoint state. +/// +public sealed class ErrorObject +{ + /// + /// The fully-qualified exception type name. + /// + [JsonPropertyName("ErrorType")] + public string? ErrorType { get; set; } + + /// + /// The exception message. 
+ /// + [JsonPropertyName("ErrorMessage")] + public string? ErrorMessage { get; set; } + + /// + /// Stack trace frames. + /// + [JsonPropertyName("StackTrace")] + public IReadOnlyList<string>? StackTrace { get; set; } + + /// + /// Additional serialized error data. + /// + [JsonPropertyName("ErrorData")] + public string? ErrorData { get; set; } + + /// + /// Creates an ErrorObject from an exception. + /// + public static ErrorObject FromException(Exception exception) + { + return new ErrorObject + { + ErrorType = exception.GetType().FullName, + ErrorMessage = exception.Message, + StackTrace = exception.StackTrace?.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) + }; + } +} diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs b/Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs new file mode 100644 index 000000000..709341760 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs @@ -0,0 +1,108 @@ +using Amazon.Lambda.DurableExecution.Internal; +using Amazon.Lambda.Model; +using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate; +using SdkOperation = Amazon.Lambda.Model.Operation; + +namespace Amazon.Lambda.DurableExecution.Services; + +/// +/// Calls the real AWS Lambda Durable Execution APIs via the AWSSDK.Lambda client. +/// +internal sealed class LambdaDurableServiceClient +{ + private readonly IAmazonLambda _lambdaClient; + + public LambdaDurableServiceClient(IAmazonLambda lambdaClient) + { + _lambdaClient = lambdaClient; + } + + /// + /// Flushes pending checkpoint operations to the durable execution service. + /// + public async Task<string?> CheckpointAsync( + string durableExecutionArn, + string? 
checkpointToken, + IReadOnlyList<SdkOperationUpdate> pendingOperations, + CancellationToken cancellationToken = default) + { + if (pendingOperations.Count == 0) + return checkpointToken; + + var request = new CheckpointDurableExecutionRequest + { + DurableExecutionArn = durableExecutionArn, + CheckpointToken = checkpointToken ?? "", + Updates = pendingOperations is List<SdkOperationUpdate> list ? list : pendingOperations.ToList() + }; + + var response = await _lambdaClient.CheckpointDurableExecutionAsync(request, cancellationToken); + return response.CheckpointToken; + } + + /// + /// Fetches additional pages of execution state when the initial state is paginated. + /// + public async Task<(List<Internal.Operation> Operations, string? NextMarker)> GetExecutionStateAsync( + string durableExecutionArn, + string? checkpointToken, + string marker, + CancellationToken cancellationToken = default) + { + var request = new GetDurableExecutionStateRequest + { + DurableExecutionArn = durableExecutionArn, + CheckpointToken = checkpointToken ?? "", + Marker = marker + }; + + var response = await _lambdaClient.GetDurableExecutionStateAsync(request, cancellationToken); + + var operations = new List<Internal.Operation>(); + if (response.Operations != null) + { + foreach (var sdkOp in response.Operations) + { + operations.Add(MapFromSdkOperation(sdkOp)); + } + } + + return (operations, response.NextMarker); + } + + private static Internal.Operation MapFromSdkOperation(SdkOperation sdkOp) + { + return new Internal.Operation + { + Id = sdkOp.Id, + Type = sdkOp.Type, + Status = sdkOp.Status, + Name = sdkOp.Name, + ParentId = sdkOp.ParentId, + SubType = sdkOp.SubType, + StepDetails = sdkOp.StepDetails != null ? new Internal.StepDetails + { + Result = sdkOp.StepDetails.Result, + Error = sdkOp.StepDetails.Error != null ? 
new ErrorObject + { + ErrorType = sdkOp.StepDetails.Error.ErrorType, + ErrorMessage = sdkOp.StepDetails.Error.ErrorMessage + } : null, + Attempt = sdkOp.StepDetails.Attempt, + NextAttemptTimestamp = sdkOp.StepDetails.NextAttemptTimestamp.HasValue + ? new DateTimeOffset(sdkOp.StepDetails.NextAttemptTimestamp.Value, TimeSpan.Zero).ToUnixTimeMilliseconds() + : null + } : null, + WaitDetails = sdkOp.WaitDetails != null ? new Internal.WaitDetails + { + ScheduledEndTimestamp = sdkOp.WaitDetails.ScheduledEndTimestamp.HasValue + ? new DateTimeOffset(sdkOp.WaitDetails.ScheduledEndTimestamp.Value, TimeSpan.Zero).ToUnixTimeMilliseconds() + : null + } : null, + ExecutionDetails = sdkOp.ExecutionDetails != null ? new Internal.ExecutionDetails + { + InputPayload = sdkOp.ExecutionDetails.InputPayload + } : null + }; + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Amazon.Lambda.DurableExecution.AotPublishTest.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Amazon.Lambda.DurableExecution.AotPublishTest.csproj new file mode 100644 index 000000000..ec4d0ffd0 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Amazon.Lambda.DurableExecution.AotPublishTest.csproj @@ -0,0 +1,24 @@ + + + + Exe + net8.0 + enable + enable + true + true + full + false + true + IL2026,IL2067,IL2075,IL3050 + false + + + + + + + + + + diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Program.cs b/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Program.cs new file mode 100644 index 000000000..af84aca8c --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.AotPublishTest/Program.cs @@ -0,0 +1,81 @@ +using System.Text.Json.Serialization; +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace Amazon.Lambda.DurableExecution.AotPublishTest; + +/// +/// AOT publish smoke 
check. This program must publish under NativeAOT with +/// zero IL2026/IL3050 warnings (promoted to errors by the csproj). It uses +/// the JsonSerializerContext overload of WrapAsync. +/// +public class Program +{ + public static async Task Main() + { + var serializer = new SourceGeneratorLambdaJsonSerializer<AotJsonContext>(); + Func<DurableExecutionInvocationInput, ILambdaContext, Task<DurableExecutionInvocationOutput>> handler = HandlerAsync; + await LambdaBootstrapBuilder + .Create(handler, serializer) + .Build() + .RunAsync(); + } + + public static Task<DurableExecutionInvocationOutput> HandlerAsync( + DurableExecutionInvocationInput input, ILambdaContext context) => + DurableFunction.WrapAsync( + WorkflowAsync, input, context, AotJsonContext.Default); + + private static async Task<OrderResult> WorkflowAsync(OrderEvent input, IDurableContext context) + { + var validation = await context.StepAsync( + async (_) => + { + await Task.CompletedTask; + return new ValidationResult { IsValid = true }; + }, + new ValidationResultSerializer(), + name: "validate"); + + await context.WaitAsync(TimeSpan.FromSeconds(30), name: "delay"); + + return new OrderResult { Status = validation.IsValid ? "approved" : "rejected", OrderId = input.OrderId }; + } + + private sealed class ValidationResultSerializer : ICheckpointSerializer<ValidationResult> + { + public string Serialize(ValidationResult value, SerializationContext ctx) => + System.Text.Json.JsonSerializer.Serialize(value, AotJsonContext.Default.ValidationResult); + + public ValidationResult Deserialize(string data, SerializationContext ctx) => + System.Text.Json.JsonSerializer.Deserialize(data, AotJsonContext.Default.ValidationResult) + ?? new ValidationResult(); + } + + public class OrderEvent + { + public string? OrderId { get; set; } + } + + public class OrderResult + { + public string? Status { get; set; } + public string? 
OrderId { get; set; } + } + + public class ValidationResult + { + public bool IsValid { get; set; } + } +} + +[JsonSerializable(typeof(DurableExecutionInvocationInput))] +[JsonSerializable(typeof(DurableExecutionInvocationOutput))] +[JsonSerializable(typeof(Program.OrderEvent))] +[JsonSerializable(typeof(Program.OrderResult))] +[JsonSerializable(typeof(Program.ValidationResult))] +public partial class AotJsonContext : JsonSerializerContext +{ +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/Amazon.Lambda.DurableExecution.IntegrationTests.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/Amazon.Lambda.DurableExecution.IntegrationTests.csproj new file mode 100644 index 000000000..0ef2e561d --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/Amazon.Lambda.DurableExecution.IntegrationTests.csproj @@ -0,0 +1,43 @@ + + + + + + + $(DefaultPackageTargets) + enable + enable + false + true + $(NoWarn);NU1903;CS1591 + + + + + + + + + + + + + PreserveNewest + + + + + + + + + + + + + + + + + + diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/DurableFunctionDeployment.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/DurableFunctionDeployment.cs new file mode 100644 index 000000000..b2ba4bb1a --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/DurableFunctionDeployment.cs @@ -0,0 +1,492 @@ +using System.Text; +using System.Text.Json; +using Amazon; +using Amazon.ECR; +using Amazon.ECR.Model; +using Amazon.IdentityManagement; +using Amazon.IdentityManagement.Model; +using Amazon.Lambda; +using Amazon.Lambda.Model; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +/// +/// Builds, deploys, and invokes a single durable Lambda function for an integration test. +/// Manages the full lifecycle: IAM role, ECR repo, Docker image, Lambda function. 
+/// All resources are torn down on DisposeAsync. +/// +internal sealed class DurableFunctionDeployment : IAsyncDisposable +{ + private readonly ITestOutputHelper _output; + private readonly IAmazonLambda _lambdaClient; + private readonly IAmazonECR _ecrClient; + private readonly IAmazonIdentityManagementService _iamClient; + + private readonly string _functionName; + private readonly string _repoName; + private readonly string _roleName; + private string? _roleArn; + private string? _imageUri; + private bool _functionCreated; + private bool _ecrRepoCreated; + + public string FunctionName => _functionName; + public IAmazonLambda LambdaClient => _lambdaClient; + + private DurableFunctionDeployment(ITestOutputHelper output, string suffix) + { + _output = output; + _lambdaClient = new AmazonLambdaClient(RegionEndpoint.USEast1); + _ecrClient = new AmazonECRClient(RegionEndpoint.USEast1); + _iamClient = new AmazonIdentityManagementServiceClient(RegionEndpoint.USEast1); + + // Truncate the GUID (not the suffix) so CloudTrail entries stay readable. + // Keep the GUID short enough that the total stays well under 40 chars even for long suffixes. + static string ShortId() => Guid.NewGuid().ToString("N")[..8]; + _functionName = $"durable-integ-{suffix}-{ShortId()}"; + _repoName = $"durable-integ-{suffix}-{ShortId()}"; + _roleName = $"durable-integ-{suffix}-{ShortId()}"; + } + + public static async Task<DurableFunctionDeployment> CreateAsync( + string testFunctionDir, + string scenarioSuffix, + ITestOutputHelper output) + { + var deployment = new DurableFunctionDeployment(output, scenarioSuffix); + try + { + await deployment.InitializeAsync(testFunctionDir); + } + catch + { + // Tear down anything that did get created (IAM role, ECR repo) so we + // don't leak resources when init fails part-way through. + await deployment.DisposeAsync(); + throw; + } + return deployment; + } + + private async Task InitializeAsync(string testFunctionDir) + { + // 1. 
Create IAM role + _output.WriteLine($"Creating IAM role: {_roleName}"); + var assumeRolePolicy = """ + { + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": {"Service": "lambda.amazonaws.com"}, + "Action": "sts:AssumeRole" + }] + } + """; + + var createRoleResponse = await _iamClient.CreateRoleAsync(new CreateRoleRequest + { + RoleName = _roleName, + AssumeRolePolicyDocument = assumeRolePolicy + }); + _roleArn = createRoleResponse.Role.Arn; + + await _iamClient.AttachRolePolicyAsync(new AttachRolePolicyRequest + { + RoleName = _roleName, + PolicyArn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" + }); + + await _iamClient.AttachRolePolicyAsync(new AttachRolePolicyRequest + { + RoleName = _roleName, + PolicyArn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicDurableExecutionRolePolicy" + }); + + // Wait for IAM propagation + await Task.Delay(TimeSpan.FromSeconds(10)); + + // 2. Create ECR repository + _output.WriteLine($"Creating ECR repository: {_repoName}"); + var createRepoResponse = await _ecrClient.CreateRepositoryAsync(new CreateRepositoryRequest + { + RepositoryName = _repoName + }); + _ecrRepoCreated = true; + var repositoryUri = createRepoResponse.Repository.RepositoryUri; + + // 3. Build and push Docker image + _output.WriteLine($"Building and pushing Docker image from {testFunctionDir}..."); + _imageUri = await BuildAndPushImage(testFunctionDir, repositoryUri); + _output.WriteLine($"Image pushed: {_imageUri}"); + + // 4. 
Create Lambda function + _output.WriteLine($"Creating Lambda function: {_functionName}"); + await _lambdaClient.CreateFunctionAsync(new CreateFunctionRequest + { + FunctionName = _functionName, + PackageType = PackageType.Image, + Role = _roleArn, + Code = new FunctionCode { ImageUri = _imageUri }, + Timeout = 30, + MemorySize = 256, + DurableConfig = new DurableConfig { ExecutionTimeout = 60 } + }); + _functionCreated = true; + + _output.WriteLine("Waiting for function to become Active..."); + await WaitForFunctionActive(); + } + + public async Task<(InvokeResponse Response, string ExecutionName)> InvokeAsync(string payload, string? executionName = null) + { + var name = executionName ?? $"integ-test-{Guid.NewGuid():N}"; + var response = await _lambdaClient.InvokeAsync(new InvokeRequest + { + FunctionName = _functionName, + Qualifier = "$LATEST", + Payload = payload, + DurableExecutionName = name + }); + return (response, name); + } + + /// + /// Polls ListDurableExecutionsByFunction until an execution with the given name appears. + /// Useful when the synchronous Invoke response gives no ARN (e.g., failed workflows return null). + /// + public async Task<string?> FindDurableExecutionArnByNameAsync(string executionName, TimeSpan timeout) + { + var deadline = DateTime.UtcNow + timeout; + var attempt = 0; + _output.WriteLine($"[FindArn] Starting search for execution name '{executionName}' on function '{_functionName}' (timeout: {timeout.TotalSeconds}s)"); + + while (DateTime.UtcNow < deadline) + { + attempt++; + try + { + var resp = await _lambdaClient.ListDurableExecutionsByFunctionAsync( + new ListDurableExecutionsByFunctionRequest + { + FunctionName = _functionName, + DurableExecutionName = executionName // server-side exact match + }); + + var count = resp.DurableExecutions?.Count ?? 0; + _output.WriteLine($"[FindArn] attempt {attempt}: List returned {count} executions"); + + if (count > 0) + { + foreach (var e in resp.DurableExecutions!) 
+ { + _output.WriteLine($"[FindArn] - name='{e.DurableExecutionName}' status={e.Status} arn={e.DurableExecutionArn}"); + } + var match = resp.DurableExecutions.FirstOrDefault(e => e.DurableExecutionName == executionName); + if (match != null) + { + _output.WriteLine($"[FindArn] matched on attempt {attempt}"); + return match.DurableExecutionArn; + } + } + } + catch (Exception ex) + { + _output.WriteLine($"[FindArn] attempt {attempt} error (will retry): {ex.Message}"); + } + await Task.Delay(TimeSpan.FromSeconds(2)); + } + _output.WriteLine($"[FindArn] gave up after {attempt} attempts ({timeout.TotalSeconds}s)"); + return null; + } + + public async Task<string> PollForCompletionAsync(string durableExecutionArn, TimeSpan timeout) + { + var deadline = DateTime.UtcNow + timeout; + + while (DateTime.UtcNow < deadline) + { + try + { + var resp = await _lambdaClient.GetDurableExecutionAsync( + new GetDurableExecutionRequest { DurableExecutionArn = durableExecutionArn }); + + var status = resp.Status?.ToString(); + if (status == "SUCCEEDED" || status == "FAILED" || + status == "TIMED_OUT" || status == "STOPPED") + { + return status; + } + } + catch (Exception ex) + { + _output.WriteLine($"Poll error (will retry): {ex.Message}"); + } + + await Task.Delay(TimeSpan.FromSeconds(2)); + } + + return "TIMEOUT"; + } + + public async Task<GetDurableExecutionResponse> GetExecutionAsync(string durableExecutionArn) + => await _lambdaClient.GetDurableExecutionAsync( + new GetDurableExecutionRequest { DurableExecutionArn = durableExecutionArn }); + + public async Task<GetDurableExecutionHistoryResponse> GetHistoryAsync(string durableExecutionArn, bool includeExecutionData = true) + => await _lambdaClient.GetDurableExecutionHistoryAsync( + new GetDurableExecutionHistoryRequest + { + DurableExecutionArn = durableExecutionArn, + IncludeExecutionData = includeExecutionData + }); + + /// + /// Repeatedly fetches history until the supplied predicate is satisfied or the + /// timeout elapses. 
Needed because the history endpoint is eventually consistent — + /// the execution status can flip to SUCCEEDED before all events are indexed. + /// + public async Task<GetDurableExecutionHistoryResponse> WaitForHistoryAsync( + string durableExecutionArn, + Func<GetDurableExecutionHistoryResponse, bool> predicate, + TimeSpan timeout, + bool includeExecutionData = true) + { + var deadline = DateTime.UtcNow + timeout; + GetDurableExecutionHistoryResponse? last = null; + var attempt = 0; + + while (DateTime.UtcNow < deadline) + { + attempt++; + try + { + last = await GetHistoryAsync(durableExecutionArn, includeExecutionData); + var eventCount = last.Events?.Count ?? 0; + var typeCounts = last.Events? + .GroupBy(e => e.EventType?.Value ?? "") + .Select(g => $"{g.Key}:{g.Count()}") + .OrderBy(s => s); + _output.WriteLine($"[WaitForHistory] attempt {attempt}: {eventCount} events [{string.Join(",", typeCounts ?? Enumerable.Empty<string>())}]"); + if (predicate(last)) + { + DumpEvents(last); + return last; + } + } + catch (Exception ex) + { + _output.WriteLine($"[WaitForHistory] attempt {attempt} error (will retry): {ex.Message}"); + } + await Task.Delay(TimeSpan.FromSeconds(2)); + } + + _output.WriteLine($"[WaitForHistory] gave up after {attempt} attempts; returning last response with {last?.Events?.Count ?? 0} events"); + if (last != null) DumpEvents(last); + return last ?? throw new TimeoutException($"GetDurableExecutionHistory never succeeded within {timeout.TotalSeconds}s"); + } + + private void DumpEvents(GetDurableExecutionHistoryResponse history) + { + var events = history.Events ?? new List(); + _output.WriteLine($"[WaitForHistory] event dump ({events.Count} total):"); + for (int i = 0; i < events.Count; i++) + { + var e = events[i]; + _output.WriteLine($" [{i}] type={e.EventType?.Value ?? ""} name={e.Name ?? ""} ts={e.EventTimestamp:O}"); + } + } + + public string? 
ExtractDurableExecutionArn(string responsePayload) + { + try + { + var doc = JsonDocument.Parse(responsePayload); + if (doc.RootElement.TryGetProperty("durableExecutionArn", out var arnProp)) + return arnProp.GetString(); + } + catch { } + return null; + } + + private async Task WaitForFunctionActive() + { + for (int i = 0; i < 60; i++) + { + try + { + var config = await _lambdaClient.GetFunctionConfigurationAsync( + new GetFunctionConfigurationRequest { FunctionName = _functionName }); + if (config.State == State.Active) return; + if (config.State == State.Failed) + throw new Exception($"Function creation failed: {config.StateReasonCode} - {config.StateReason}"); + } + catch (ResourceNotFoundException) { } + await Task.Delay(TimeSpan.FromSeconds(2)); + } + throw new TimeoutException("Function did not become Active within 120 seconds"); + } + + private async Task<string> BuildAndPushImage(string testFunctionDir, string repositoryUri) + { + var publishDir = Path.Combine(testFunctionDir, "bin", "publish"); + if (Directory.Exists(publishDir)) Directory.Delete(publishDir, true); + + await RunProcess("dotnet", + $"publish -c Release -r linux-x64 --self-contained true -o \"{publishDir}\"", + testFunctionDir); + + var imageTag = $"{repositoryUri}:latest"; + await RunProcess("docker", + $"build --platform linux/amd64 --provenance=false -t {imageTag} .", + testFunctionDir); + + var authResponse = await _ecrClient.GetAuthorizationTokenAsync(new GetAuthorizationTokenRequest()); + var authData = authResponse.AuthorizationData[0]; + var token = Encoding.UTF8.GetString(Convert.FromBase64String(authData.AuthorizationToken)); + var parts = token.Split(':'); + var registryUrl = authData.ProxyEndpoint; + + await RunProcess("docker", + $"login --username {parts[0]} --password-stdin {registryUrl}", + testFunctionDir, + stdin: parts[1]); + + await RunProcess("docker", $"push {imageTag}", testFunctionDir); + + return imageTag; + } + + private async Task RunProcess(string fileName, string 
arguments, string workingDir, string? stdin = null) + { + _output.WriteLine($"Running: {fileName} {arguments}"); + var psi = new System.Diagnostics.ProcessStartInfo + { + FileName = fileName, + Arguments = arguments, + WorkingDirectory = workingDir, + RedirectStandardOutput = true, + RedirectStandardError = true, + RedirectStandardInput = stdin != null, + UseShellExecute = false + }; + + var process = System.Diagnostics.Process.Start(psi)!; + + if (stdin != null) + { + await process.StandardInput.WriteAsync(stdin); + process.StandardInput.Close(); + } + + var stdoutTask = process.StandardOutput.ReadToEndAsync(); + var stderrTask = process.StandardError.ReadToEndAsync(); + + await Task.WhenAny( + process.WaitForExitAsync(), + Task.Delay(TimeSpan.FromMinutes(5))); + + if (!process.HasExited) + { + process.Kill(); + throw new TimeoutException($"{fileName} timed out after 5 minutes"); + } + + var stdout = await stdoutTask; + var stderr = await stderrTask; + + if (process.ExitCode != 0) + { + // Dump the FULL streams on failure — diagnosing build errors with + // truncated output is painful, and these only fire on test failure. + _output.WriteLine($"stdout: {stdout}"); + _output.WriteLine($"stderr: {stderr}"); + var detail = !string.IsNullOrWhiteSpace(stderr) ? 
stderr : stdout; + throw new Exception($"{fileName} failed (exit {process.ExitCode}): {detail}"); + } + + if (!string.IsNullOrWhiteSpace(stdout)) + _output.WriteLine($"stdout: {stdout[..Math.Min(stdout.Length, 1000)]}"); + } + + public async ValueTask DisposeAsync() + { + if (_functionCreated) + { + try + { + _output.WriteLine($"Deleting function: {_functionName}"); + await _lambdaClient.DeleteFunctionAsync(new DeleteFunctionRequest { FunctionName = _functionName }); + } + catch (Exception ex) { _output.WriteLine($"Cleanup error (function): {ex.Message}"); } + } + + if (_ecrRepoCreated) + { + try + { + _output.WriteLine($"Deleting ECR repository: {_repoName}"); + await _ecrClient.DeleteRepositoryAsync(new DeleteRepositoryRequest + { + RepositoryName = _repoName, + Force = true + }); + } + catch (Exception ex) { _output.WriteLine($"Cleanup error (ECR): {ex.Message}"); } + } + + if (_roleArn != null) + { + // Detach each policy independently — if one detach fails (e.g., the + // policy was never attached because init bailed out early) we still + // want to attempt the others and the final DeleteRole. 
+ await TryDetachPolicy("arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"); + await TryDetachPolicy("arn:aws:iam::aws:policy/service-role/AWSLambdaBasicDurableExecutionRolePolicy"); + try + { + await _iamClient.DeleteRoleAsync(new DeleteRoleRequest { RoleName = _roleName }); + } + catch (Exception ex) { _output.WriteLine($"Cleanup error (IAM DeleteRole): {ex.Message}"); } + } + + async Task TryDetachPolicy(string policyArn) + { + try + { + await _iamClient.DetachRolePolicyAsync(new DetachRolePolicyRequest + { + RoleName = _roleName, + PolicyArn = policyArn + }); + } + catch (Exception ex) { _output.WriteLine($"Cleanup error (IAM Detach {policyArn}): {ex.Message}"); } + } + } + + public static string FindTestFunctionDir(string functionDirName) + { + var dir = AppContext.BaseDirectory; + while (dir != null) + { + var candidate = Path.Combine(dir, "TestFunctions", functionDirName); + if (Directory.Exists(candidate)) + return candidate; + + // Also check legacy "TestFunction" location for backwards compat + var legacy = Path.Combine(dir, functionDirName); + if (Directory.Exists(legacy) && File.Exists(Path.Combine(legacy, $"{functionDirName}.csproj"))) + return legacy; + + dir = Path.GetDirectoryName(dir); + } + + // Fallback: relative from test source directory + var fallback = Path.GetFullPath( + Path.Combine(AppContext.BaseDirectory, "..", "..", "..", "TestFunctions", functionDirName)); + if (Directory.Exists(fallback)) + return fallback; + + throw new DirectoryNotFoundException( + $"Could not find TestFunctions/{functionDirName}/ directory. 
Looked up from: {AppContext.BaseDirectory}"); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/LongerWaitTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/LongerWaitTest.cs new file mode 100644 index 000000000..bfc2913ed --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/LongerWaitTest.cs @@ -0,0 +1,65 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class LongerWaitTest +{ + private readonly ITestOutputHelper _output; + public LongerWaitTest(ITestOutputHelper output) => _output = output; + + [Fact] + public async Task LongerWait_ExpiresAndCompletes() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("LongerWaitFunction"), + "longwait", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "long-wait-test"}"""); + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(90)); + Assert.Equal("SUCCEEDED", status, ignoreCase: true); + + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Count(e => e.EventType == EventType.StepStarted) ?? 0) >= 2 + && (h.Events?.Count(e => e.StepSucceededDetails != null) ?? 0) >= 2 + && (h.Events?.Any(e => e.WaitSucceededDetails != null) ?? false), + TimeSpan.FromSeconds(60)); + var events = history.Events ?? 
[]; + + Assert.Equal(2, events.Count(e => e.EventType == EventType.StepStarted)); + + // Steps before and after the wait both ran, with the post-wait step seeing + // the pre-wait step's value via replay. + var stepResults = events + .Where(e => e.StepSucceededDetails != null) + .Select(e => (Name: e.Name, Payload: e.StepSucceededDetails.Result?.Payload?.Trim('"'))) + .ToList(); + Assert.Equal(2, stepResults.Count); + Assert.Equal("before_wait", stepResults[0].Name); + Assert.Equal("started-long-wait-test", stepResults[0].Payload); + Assert.Equal("after_wait", stepResults[1].Name); + Assert.Equal("after_wait-started-long-wait-test", stepResults[1].Payload); + + // The wait was checkpointed for the configured 15-second duration. + var waitStarted = events.FirstOrDefault(e => e.WaitStartedDetails != null && e.Name == "long_wait"); + Assert.NotNull(waitStarted); + Assert.Equal(15, waitStarted!.WaitStartedDetails.Duration); + + // The wait spanned at least two invocations: one to schedule it and at + // least one to resume after the timer fires. 
+ var invocations = events.Where(e => e.InvocationCompletedDetails != null).ToList(); + Assert.True( + invocations.Count >= 2, + $"Expected at least 2 InvocationCompleted events (suspend + resume), got {invocations.Count}"); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/MultipleStepsTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/MultipleStepsTest.cs new file mode 100644 index 000000000..6b0ae0bc7 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/MultipleStepsTest.cs @@ -0,0 +1,59 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class MultipleStepsTest +{ + private readonly ITestOutputHelper _output; + public MultipleStepsTest(ITestOutputHelper output) => _output = output; + + [Fact] + public async Task MultipleSteps_AllCheckpointed() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("MultipleStepsFunction"), + "multi", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "chain"}"""); + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(60)); + Assert.Equal("SUCCEEDED", status, ignoreCase: true); + + // History is eventually consistent — the execution can be SUCCEEDED before + // all events are indexed. Wait until we see all 5 step-succeeded events. + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Count(e => e.EventType == EventType.StepStarted) ?? 
0) >= 5 + && (h.Events?.Count(e => e.StepSucceededDetails != null) ?? 0) >= 5, + TimeSpan.FromSeconds(60)); + var events = history.Events ?? []; + + Assert.Equal(5, events.Count(e => e.EventType == EventType.StepStarted)); + + // Each step ran exactly once (no replay-induced duplicates) in declaration order, + // and each step's output chained from the previous one. + var stepResults = events + .Where(e => e.StepSucceededDetails != null) + .Select(e => $"{e.Name}={e.StepSucceededDetails.Result?.Payload?.Trim('"')}") + .ToList(); + Assert.Equal( + new[] + { + "step_1=a-chain", + "step_2=a-chain-b", + "step_3=a-chain-b-c", + "step_4=a-chain-b-c-d", + "step_5=a-chain-b-c-d-e", + }, + stepResults); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ReplayDeterminismTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ReplayDeterminismTest.cs new file mode 100644 index 000000000..137bb28b8 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ReplayDeterminismTest.cs @@ -0,0 +1,70 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class ReplayDeterminismTest +{ + private readonly ITestOutputHelper _output; + public ReplayDeterminismTest(ITestOutputHelper output) => _output = output; + + [Fact] + public async Task ReplayDeterminism_SameGuidAcrossInvocations() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("ReplayDeterminismFunction"), + "replay", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "replay-test"}"""); + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, 
TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(60)); + Assert.Equal("SUCCEEDED", status, ignoreCase: true); + + // History is eventually consistent — wait until both step-succeeded events are visible. + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Count(e => e.EventType == EventType.StepStarted) ?? 0) >= 2 + && (h.Events?.Count(e => e.StepSucceededDetails != null) ?? 0) >= 2, + TimeSpan.FromSeconds(60)); + var events = history.Events ?? []; + + Assert.Equal(2, events.Count(e => e.EventType == EventType.StepStarted)); + + // Each step succeeded exactly once — generate_id was NOT re-executed on replay + // (a duplicate would show up as two succeeded events for the same name). + var stepSucceededEvents = events.Where(e => e.StepSucceededDetails != null).ToList(); + Assert.Equal(2, stepSucceededEvents.Count); + Assert.Single(stepSucceededEvents.Where(e => e.Name == "generate_id")); + Assert.Single(stepSucceededEvents.Where(e => e.Name == "echo_id")); + + var generateEvent = stepSucceededEvents.First(e => e.Name == "generate_id"); + var echoEvent = stepSucceededEvents.First(e => e.Name == "echo_id"); + + var generatedGuid = generateEvent.StepSucceededDetails.Result?.Payload?.Trim('"'); + var echoedResult = echoEvent.StepSucceededDetails.Result?.Payload?.Trim('"'); + Assert.NotNull(generatedGuid); + Assert.NotNull(echoedResult); + Assert.True(Guid.TryParse(generatedGuid, out _), + $"generate_id should produce a valid GUID, got: {generatedGuid}"); + + // The echoed value matches the cached GUID — proves replay returned the + // checkpointed value rather than running generate_id again. + Assert.Equal($"echo:{generatedGuid}", echoedResult); + + // The boundary wait actually caused a suspend/resume cycle. 
+ var waitStarted = events.FirstOrDefault(e => e.WaitStartedDetails != null && e.Name == "boundary_wait"); + Assert.NotNull(waitStarted); + var invocations = events.Where(e => e.InvocationCompletedDetails != null).ToList(); + Assert.True( + invocations.Count >= 2, + $"Expected at least 2 InvocationCompleted events (proves replay actually happened), got {invocations.Count}"); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/RetryTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/RetryTest.cs new file mode 100644 index 000000000..82be3d105 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/RetryTest.cs @@ -0,0 +1,78 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class RetryTest +{ + private readonly ITestOutputHelper _output; + public RetryTest(ITestOutputHelper output) => _output = output; + + /// + /// End-to-end retry: step throws on attempts 1 and 2, succeeds on attempt 3. + /// Validates that the service honors the RETRY checkpoint, schedules the + /// requested delay, and re-invokes the Lambda — none of which the unit + /// tests can prove (they fake state transitions in-memory). + /// + [Fact] + public async Task FlakyStep_RetriesAndSucceedsOnThirdAttempt() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("RetryFunction"), + "retry", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "x"}"""); + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + // Initial invoke returns when the SDK suspends after the first failure. + // The execution continues asynchronously via service-driven re-invokes. 
+ var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + // Total expected wall time: 2s + 4s of retry delay + execution overhead. + // Allow generous headroom for service scheduling latency. + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(120)); + Assert.Equal("SUCCEEDED", status, ignoreCase: true); + + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Count(e => e.EventType == EventType.StepStarted) ?? 0) >= 3 + && (h.Events?.Any(e => e.StepSucceededDetails != null) ?? false), + TimeSpan.FromSeconds(60)); + var events = history.Events ?? []; + + // Three attempts ran (attempts 1, 2, 3). + Assert.Equal(3, events.Count(e => e.EventType == EventType.StepStarted)); + + // Two failed attempts recorded retry metadata; the final attempt succeeded. + Assert.Equal(2, events.Count(e => e.StepFailedDetails != null && e.Name == "flaky_step")); + var succeeded = events.SingleOrDefault(e => e.StepSucceededDetails != null && e.Name == "flaky_step"); + Assert.NotNull(succeeded); + Assert.Equal("\"ok on attempt 3\"", succeeded!.StepSucceededDetails.Result?.Payload); + + // The two recorded failure messages reflect the per-attempt exception. + var failures = events + .Where(e => e.StepFailedDetails != null && e.Name == "flaky_step") + .Select(e => e.StepFailedDetails.Error?.Payload?.ErrorMessage ?? string.Empty) + .ToList(); + Assert.Contains(failures, m => m.Contains("attempt 1")); + Assert.Contains(failures, m => m.Contains("attempt 2")); + + // Timing check: the service must have actually waited between attempts. + // With initialDelay=2s, backoffRate=2.0, no jitter: delays are 2s and 4s. + // The gap between the first and last StepStarted should be >= 6s. 
+ var startedTimestamps = events + .Where(e => e.EventType == EventType.StepStarted && e.EventTimestamp.HasValue) + .OrderBy(e => e.EventTimestamp!.Value) + .Select(e => e.EventTimestamp!.Value) + .ToList(); + var totalGap = startedTimestamps[^1] - startedTimestamps[0]; + _output.WriteLine($"Time between first and last attempt: {totalGap.TotalSeconds:F1}s"); + Assert.True(totalGap >= TimeSpan.FromSeconds(6), + $"Service did not honor retry delays: {totalGap.TotalSeconds:F1}s gap (expected >= 6s)"); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepFailsTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepFailsTest.cs new file mode 100644 index 000000000..b51e26b2d --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepFailsTest.cs @@ -0,0 +1,54 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class StepFailsTest +{ + private readonly ITestOutputHelper _output; + public StepFailsTest(ITestOutputHelper output) => _output = output; + + [Fact] + public async Task StepFails_PropagatesAsFailedStatus() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("StepFailsFunction"), + "stepfail", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "x"}"""); + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + // Failed workflows return null payload to the Invoke caller. Locate the execution + // by name and verify the service marked it FAILED. 
+ var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(60)); + Assert.Equal("FAILED", status, ignoreCase: true); + + var execution = await deployment.GetExecutionAsync(arn!); + Assert.NotNull(execution.Error); + Assert.Contains("intentional failure", execution.Error.ErrorMessage); + + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Any(e => e.EventType == EventType.StepStarted) ?? false) + && (h.Events?.Any(e => e.StepFailedDetails != null) ?? false), + TimeSpan.FromSeconds(60)); + var events = history.Events ?? []; + + Assert.Equal(1, events.Count(e => e.EventType == EventType.StepStarted)); + + // The failing step recorded a StepFailed event with the exception message. + var stepFailed = events.FirstOrDefault(e => e.StepFailedDetails != null && e.Name == "fail_step"); + Assert.NotNull(stepFailed); + Assert.Contains("intentional failure", stepFailed!.StepFailedDetails.Error?.Payload?.ErrorMessage ?? string.Empty); + + // No step ever succeeded — the workflow body was unreachable past the throw. 
+ Assert.Empty(events.Where(e => e.StepSucceededDetails != null)); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepWaitStepTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepWaitStepTest.cs new file mode 100644 index 000000000..05e2bfc72 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/StepWaitStepTest.cs @@ -0,0 +1,61 @@ +using System.Linq; +using System.Text; +using Amazon.Lambda.Model; +using Xunit; +using Xunit.Abstractions; + +namespace Amazon.Lambda.DurableExecution.IntegrationTests; + +public class StepWaitStepTest +{ + private readonly ITestOutputHelper _output; + public StepWaitStepTest(ITestOutputHelper output) => _output = output; + + [Fact] + public async Task StepWaitStep_CompletesViaService() + { + await using var deployment = await DurableFunctionDeployment.CreateAsync( + DurableFunctionDeployment.FindTestFunctionDir("StepWaitStepFunction"), + "stepwait", _output); + + var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "integ-test-123"}"""); + Assert.Equal(200, invokeResponse.StatusCode); + + var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray()); + _output.WriteLine($"Response: {responsePayload}"); + + var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60)); + Assert.NotNull(arn); + + var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(60)); + Assert.Equal("SUCCEEDED", status, ignoreCase: true); + + var history = await deployment.WaitForHistoryAsync( + arn!, + h => (h.Events?.Count(e => e.EventType == EventType.StepStarted) ?? 0) >= 2 + && (h.Events?.Count(e => e.StepSucceededDetails != null) ?? 0) >= 2 + && (h.Events?.Any(e => e.WaitSucceededDetails != null) ?? false), + TimeSpan.FromSeconds(60)); + var events = history.Events ?? 
[]; + + Assert.Equal(2, events.Count(e => e.EventType == EventType.StepStarted)); + + // Both steps ran in order and produced the expected chained outputs. + var stepResults = events + .Where(e => e.StepSucceededDetails != null) + .Select(e => (Name: e.Name, Payload: e.StepSucceededDetails.Result?.Payload?.Trim('"'))) + .ToList(); + Assert.Equal(2, stepResults.Count); + Assert.Equal("validate", stepResults[0].Name); + Assert.Equal("validated-integ-test-123", stepResults[0].Payload); + Assert.Equal("process", stepResults[1].Name); + Assert.Equal("processed-validated-integ-test-123", stepResults[1].Payload); + + // The wait was actually scheduled with the expected duration. + var waitStarted = events.FirstOrDefault(e => e.WaitStartedDetails != null && e.Name == "short_wait"); + Assert.NotNull(waitStarted); + Assert.Equal(3, waitStarted!.WaitStartedDetails.Duration); + var waitSucceeded = events.FirstOrDefault(e => e.WaitSucceededDetails != null && e.Name == "short_wait"); + Assert.NotNull(waitSucceeded); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Dockerfile new file mode 100644 index 000000000..c1913d56a --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Dockerfile @@ -0,0 +1,7 @@ +FROM public.ecr.aws/lambda/provided:al2023 + +RUN dnf install -y libicu + +COPY bin/publish/ ${LAMBDA_TASK_ROOT} + +ENTRYPOINT ["/var/task/bootstrap"] diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Function.cs new file mode 100644 index 000000000..e73a6da7e --- /dev/null +++ 
b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/Function.cs @@ -0,0 +1,40 @@ +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace DurableExecutionTestFunction; + +public class Function +{ + public static async Task Main(string[] args) + { + var handler = new Function(); + var serializer = new DefaultLambdaJsonSerializer(); + using var handlerWrapper = HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(handlerWrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task<TestResult> Workflow(TestEvent input, IDurableContext context) + { + var step1 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"started-{input.OrderId}"; }, + name: "before_wait"); + + await context.WaitAsync(TimeSpan.FromSeconds(15), name: "long_wait"); + + var step2 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"after_wait-{step1}"; }, + name: "after_wait"); + + return new TestResult { Status = "completed", Data = step2 }; + } +} + +public class TestEvent { public string? OrderId { get; set; } } +public class TestResult { public string? Status { get; set; } public string? 
Data { get; set; } } diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/LongerWaitFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/LongerWaitFunction.csproj new file mode 100644 index 000000000..6f5f657e4 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/LongerWaitFunction/LongerWaitFunction.csproj @@ -0,0 +1,18 @@ + + + + net8.0 + Exe + true + bootstrap + enable + enable + + + + + + + + + diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Dockerfile new file mode 100644 index 000000000..c1913d56a --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Dockerfile @@ -0,0 +1,7 @@ +FROM public.ecr.aws/lambda/provided:al2023 + +RUN dnf install -y libicu + +COPY bin/publish/ ${LAMBDA_TASK_ROOT} + +ENTRYPOINT ["/var/task/bootstrap"] diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Function.cs new file mode 100644 index 000000000..cc80e6afa --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/Function.cs @@ -0,0 +1,50 @@ +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace DurableExecutionTestFunction; + +public class Function +{ + public static async Task Main(string[] args) + { + var handler = new Function(); + var serializer = new DefaultLambdaJsonSerializer(); + using var handlerWrapper = 
HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(handlerWrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task<TestResult> Workflow(TestEvent input, IDurableContext context) + { + var step1 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"a-{input.OrderId}"; }, + name: "step_1"); + + var step2 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"{step1}-b"; }, + name: "step_2"); + + var step3 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"{step2}-c"; }, + name: "step_3"); + + var step4 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"{step3}-d"; }, + name: "step_4"); + + var step5 = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"{step4}-e"; }, + name: "step_5"); + + return new TestResult { Status = "completed", Data = step5 }; + } +} + +public class TestEvent { public string? OrderId { get; set; } } +public class TestResult { public string? Status { get; set; } public string? 
Data { get; set; } } diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/MultipleStepsFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/MultipleStepsFunction.csproj new file mode 100644 index 000000000..6f5f657e4 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MultipleStepsFunction/MultipleStepsFunction.csproj @@ -0,0 +1,18 @@ + + + + net8.0 + Exe + true + bootstrap + enable + enable + + + + + + + + + diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Dockerfile new file mode 100644 index 000000000..c1913d56a --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Dockerfile @@ -0,0 +1,7 @@ +FROM public.ecr.aws/lambda/provided:al2023 + +RUN dnf install -y libicu + +COPY bin/publish/ ${LAMBDA_TASK_ROOT} + +ENTRYPOINT ["/var/task/bootstrap"] diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Function.cs new file mode 100644 index 000000000..ce2a333b1 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/Function.cs @@ -0,0 +1,43 @@ +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace DurableExecutionTestFunction; + +public class Function +{ + public static async Task Main(string[] args) + { + var handler = new Function(); + var serializer = new DefaultLambdaJsonSerializer(); + using var 
handlerWrapper = HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(handlerWrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task<TestResult> Workflow(TestEvent input, IDurableContext context) + { + // Step 1 generates a fresh GUID. On replay, this MUST return the cached value. + var generatedId = await context.StepAsync( + async (_) => { await Task.CompletedTask; return Guid.NewGuid().ToString(); }, + name: "generate_id"); + + // Force a suspend/resume cycle to trigger replay + await context.WaitAsync(TimeSpan.FromSeconds(3), name: "boundary_wait"); + + // Step 2 echoes the GUID. After replay, it should see the SAME GUID from step 1. + var echoed = await context.StepAsync( + async (_) => { await Task.CompletedTask; return $"echo:{generatedId}"; }, + name: "echo_id"); + + return new TestResult { Status = "completed", Data = echoed }; + } +} + +public class TestEvent { public string? OrderId { get; set; } } +public class TestResult { public string? Status { get; set; } public string? 
Data { get; set; } } diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/ReplayDeterminismFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/ReplayDeterminismFunction.csproj new file mode 100644 index 000000000..6f5f657e4 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayDeterminismFunction/ReplayDeterminismFunction.csproj @@ -0,0 +1,18 @@ + + + + net8.0 + Exe + true + bootstrap + enable + enable + + + + + + + + + diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Dockerfile new file mode 100644 index 000000000..c1913d56a --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Dockerfile @@ -0,0 +1,7 @@ +FROM public.ecr.aws/lambda/provided:al2023 + +RUN dnf install -y libicu + +COPY bin/publish/ ${LAMBDA_TASK_ROOT} + +ENTRYPOINT ["/var/task/bootstrap"] diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Function.cs new file mode 100644 index 000000000..9ebffdf11 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/Function.cs @@ -0,0 +1,49 @@ +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace DurableExecutionTestFunction; + +public class Function +{ + public static async Task Main(string[] args) + { + var handler = new Function(); + var serializer = new DefaultLambdaJsonSerializer(); + using var handlerWrapper = 
HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(handlerWrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task<TestResult> Workflow(TestEvent input, IDurableContext context) + { + var result = await context.StepAsync( + async (ctx) => + { + await Task.CompletedTask; + if (ctx.AttemptNumber < 3) + throw new InvalidOperationException($"flake on attempt {ctx.AttemptNumber}"); + return $"ok on attempt {ctx.AttemptNumber}"; + }, + name: "flaky_step", + config: new StepConfig + { + RetryStrategy = RetryStrategy.Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromSeconds(2), + maxDelay: TimeSpan.FromSeconds(10), + backoffRate: 2.0, + jitter: JitterStrategy.None) + }); + + return new TestResult { Status = "completed", Data = result }; + } +} + +public class TestEvent { public string? OrderId { get; set; } } +public class TestResult { public string? Status { get; set; } public string? 
Data { get; set; } }
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/RetryFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/RetryFunction.csproj
new file mode 100644
index 000000000..6f5f657e4
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/RetryFunction/RetryFunction.csproj
@@ -0,0 +1,18 @@
+
+
+
+    net8.0
+    Exe
+    true
+    bootstrap
+    enable
+    enable
+
+
+
+
+
+
+
+
+
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Dockerfile
new file mode 100644
index 000000000..c1913d56a
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Dockerfile
@@ -0,0 +1,7 @@
+FROM public.ecr.aws/lambda/provided:al2023
+
+RUN dnf install -y libicu
+
+COPY bin/publish/ ${LAMBDA_TASK_ROOT}
+
+ENTRYPOINT ["/var/task/bootstrap"]
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Function.cs
new file mode 100644
index 000000000..9aeeed2a2
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/Function.cs
@@ -0,0 +1,38 @@
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.RuntimeSupport;
+using Amazon.Lambda.Serialization.SystemTextJson;
+
+namespace DurableExecutionTestFunction;
+
+public class Function
+{
+    public static async Task Main(string[] args)
+    {
+        var handler = new Function();
+        var serializer = new DefaultLambdaJsonSerializer();
+        using var handlerWrapper = HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer);
+        using var bootstrap
+            = new LambdaBootstrap(handlerWrapper);
+        await bootstrap.RunAsync();
+    }
+
+    public Task Handler(
+        DurableExecutionInvocationInput input, ILambdaContext context)
+        => DurableFunction.WrapAsync(Workflow, input, context);
+
+    private async Task<TestResult> Workflow(TestEvent input, IDurableContext context)
+    {
+        await context.StepAsync(
+            async (_) =>
+            {
+                await Task.CompletedTask;
+                throw new InvalidOperationException("intentional failure for integration test");
+            },
+            name: "fail_step");
+
+        return new TestResult { Status = "should_not_reach" };
+    }
+}
+
+public class TestEvent { public string? OrderId { get; set; } }
+public class TestResult { public string? Status { get; set; } public string? Data { get; set; } }
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/StepFailsFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/StepFailsFunction.csproj
new file mode 100644
index 000000000..6f5f657e4
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepFailsFunction/StepFailsFunction.csproj
@@ -0,0 +1,18 @@
+
+
+
+    net8.0
+    Exe
+    true
+    bootstrap
+    enable
+    enable
+
+
+
+
+
+
+
+
+
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Dockerfile
new file mode 100644
index 000000000..c1913d56a
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Dockerfile
@@ -0,0 +1,7 @@
+FROM public.ecr.aws/lambda/provided:al2023
+
+RUN dnf install -y libicu
+
+COPY bin/publish/ ${LAMBDA_TASK_ROOT}
+
+ENTRYPOINT ["/var/task/bootstrap"]
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Function.cs
b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Function.cs
new file mode 100644
index 000000000..5b6c291df
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/Function.cs
@@ -0,0 +1,40 @@
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.RuntimeSupport;
+using Amazon.Lambda.Serialization.SystemTextJson;
+
+namespace DurableExecutionTestFunction;
+
+public class Function
+{
+    public static async Task Main(string[] args)
+    {
+        var handler = new Function();
+        var serializer = new DefaultLambdaJsonSerializer();
+        using var handlerWrapper = HandlerWrapper.GetHandlerWrapper(handler.Handler, serializer);
+        using var bootstrap = new LambdaBootstrap(handlerWrapper);
+        await bootstrap.RunAsync();
+    }
+
+    public Task Handler(
+        DurableExecutionInvocationInput input, ILambdaContext context)
+        => DurableFunction.WrapAsync(Workflow, input, context);
+
+    private async Task<TestResult> Workflow(TestEvent input, IDurableContext context)
+    {
+        var step1 = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return $"validated-{input.OrderId}"; },
+            name: "validate");
+
+        await context.WaitAsync(TimeSpan.FromSeconds(3), name: "short_wait");
+
+        var step2 = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return $"processed-{step1}"; },
+            name: "process");
+
+        return new TestResult { Status = "completed", Data = step2 };
+    }
+}
+
+public class TestEvent { public string? OrderId { get; set; } }
+public class TestResult { public string? Status { get; set; } public string?
Data { get; set; } }
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/StepWaitStepFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/StepWaitStepFunction.csproj
new file mode 100644
index 000000000..6f5f657e4
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/StepWaitStepFunction/StepWaitStepFunction.csproj
@@ -0,0 +1,18 @@
+
+
+
+    net8.0
+    Exe
+    true
+    bootstrap
+    enable
+    enable
+
+
+
+
+
+
+
+
+
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Dockerfile b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Dockerfile
new file mode 100644
index 000000000..c1913d56a
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Dockerfile
@@ -0,0 +1,7 @@
+FROM public.ecr.aws/lambda/provided:al2023
+
+RUN dnf install -y libicu
+
+COPY bin/publish/ ${LAMBDA_TASK_ROOT}
+
+ENTRYPOINT ["/var/task/bootstrap"]
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Function.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Function.cs
new file mode 100644
index 000000000..54e4ab737
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/Function.cs
@@ -0,0 +1,31 @@
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.RuntimeSupport;
+using Amazon.Lambda.Serialization.SystemTextJson;
+
+namespace DurableExecutionTestFunction;
+
+public class Function
+{
+    public static async Task Main(string[] args)
+    {
+        var handler = new Function();
+        var serializer = new DefaultLambdaJsonSerializer();
+        using var handlerWrapper = HandlerWrapper.GetHandlerWrapper(handler.Handler,
+            serializer);
+        using var bootstrap = new LambdaBootstrap(handlerWrapper);
+        await bootstrap.RunAsync();
+    }
+
+    public Task Handler(
+        DurableExecutionInvocationInput input, ILambdaContext context)
+        => DurableFunction.WrapAsync(Workflow, input, context);
+
+    private async Task<TestResult> Workflow(TestEvent input, IDurableContext context)
+    {
+        await context.WaitAsync(TimeSpan.FromSeconds(5), name: "only_wait");
+        return new TestResult { Status = "completed", Data = "wait_only" };
+    }
+}
+
+public class TestEvent { public string? OrderId { get; set; } }
+public class TestResult { public string? Status { get; set; } public string? Data { get; set; } }
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/WaitOnlyFunction.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/WaitOnlyFunction.csproj
new file mode 100644
index 000000000..6f5f657e4
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitOnlyFunction/WaitOnlyFunction.csproj
@@ -0,0 +1,18 @@
+
+
+
+    net8.0
+    Exe
+    true
+    bootstrap
+    enable
+    enable
+
+
+
+
+
+
+
+
+
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitOnlyTest.cs b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitOnlyTest.cs
new file mode 100644
index 000000000..213ce0186
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitOnlyTest.cs
@@ -0,0 +1,55 @@
+using System.Linq;
+using System.Text;
+using Amazon.Lambda.Model;
+using Xunit;
+using Xunit.Abstractions;
+
+namespace Amazon.Lambda.DurableExecution.IntegrationTests;
+
+public class WaitOnlyTest
+{
+    private readonly ITestOutputHelper _output;
+    public WaitOnlyTest(ITestOutputHelper output) => _output = output;
+
+    [Fact]
+    public async Task WaitOnly_NoSteps()
+    {
+        await using var deployment = await DurableFunctionDeployment.CreateAsync(
+            DurableFunctionDeployment.FindTestFunctionDir("WaitOnlyFunction"),
+            "waitonly", _output);
+
+        var (invokeResponse, executionName) = await deployment.InvokeAsync("""{"orderId": "wait-only"}""");
+        var responsePayload = Encoding.UTF8.GetString(invokeResponse.Payload.ToArray());
+        _output.WriteLine($"Response: {responsePayload}");
+
+        var arn = await deployment.FindDurableExecutionArnByNameAsync(executionName, TimeSpan.FromSeconds(60));
+        Assert.NotNull(arn);
+
+        var status = await deployment.PollForCompletionAsync(arn!, TimeSpan.FromSeconds(60));
+        Assert.Equal("SUCCEEDED", status, ignoreCase: true);
+
+        var history = await deployment.WaitForHistoryAsync(
+            arn!,
+            h => (h.Events?.Any(e => e.WaitSucceededDetails != null) ?? false),
+            TimeSpan.FromSeconds(60));
+        var events = history.Events ?? new();
+
+        // The wait was checkpointed and ran for the configured duration.
+        var waitStarted = events.FirstOrDefault(e => e.WaitStartedDetails != null && e.Name == "only_wait");
+        Assert.NotNull(waitStarted);
+        Assert.Equal(5, waitStarted!.WaitStartedDetails.Duration);
+
+        var waitSucceeded = events.FirstOrDefault(e => e.WaitSucceededDetails != null && e.Name == "only_wait");
+        Assert.NotNull(waitSucceeded);
+
+        // No step events: this workflow body contains only a wait.
+        Assert.Empty(events.Where(e => e.StepStartedDetails != null));
+
+        // The wait genuinely caused a suspend/resume, not an in-process delay:
+        // expect at least 2 invocations recorded (initial + resume after timer fires).
+        var invocations = events.Where(e => e.InvocationCompletedDetails != null).ToList();
+        Assert.True(
+            invocations.Count >= 2,
+            $"Expected at least 2 InvocationCompleted events (initial + post-wait resume), got {invocations.Count}");
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/xunit.runner.json b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/xunit.runner.json
new file mode 100644
index 000000000..b6de9b357
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/xunit.runner.json
@@ -0,0 +1,6 @@
+{
+    "$schema": "https://xunit.net/schema/current/xunit.runner.schema.json",
+    "parallelizeTestCollections": false,
+    "parallelizeAssembly": false,
+    "maxParallelThreads": 1
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/Amazon.Lambda.DurableExecution.Tests.csproj b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/Amazon.Lambda.DurableExecution.Tests.csproj
new file mode 100644
index 000000000..6fa422e0a
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/Amazon.Lambda.DurableExecution.Tests.csproj
@@ -0,0 +1,34 @@
+
+
+
+
+    $(DefaultPackageTargets)
+    Amazon.Lambda.DurableExecution.Tests
+    Amazon.Lambda.DurableExecution.Tests
+    true
+    ..\..\..\buildtools\public.snk
+    true
+    enable
+    enable
+    $(NoWarn);CS1591
+    true
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/CheckpointBatcherTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/CheckpointBatcherTests.cs
new file mode 100644
index 000000000..c81998eaa
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/CheckpointBatcherTests.cs
@@ -0,0 +1,213 @@
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class CheckpointBatcherTests
+{
+    private static SdkOperationUpdate Update(string
+        id) => new()
+        {
+            Id = id,
+            Type = "STEP",
+            Action = "SUCCEED"
+        };
+
+    [Fact]
+    public async Task EnqueueAsync_AwaitsUntilBatchFlushes()
+    {
+        var flushedTokens = new List<string?>();
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                flushedTokens.Add(token);
+                return Task.FromResult("token-1");
+            });
+
+        await batcher.EnqueueAsync(Update("0-step"));
+
+        Assert.Equal(new string?[] { "token-0" }, flushedTokens);
+        Assert.Equal("token-1", batcher.CheckpointToken);
+
+        await batcher.DrainAsync();
+    }
+
+    [Fact]
+    public async Task MultipleEnqueueAsync_BatchedWithinWindow()
+    {
+        var batches = new List<int>();
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                batches.Add(ops.Count);
+                return Task.FromResult(token);
+            },
+            new CheckpointBatcherConfig { FlushInterval = TimeSpan.FromMilliseconds(50) });
+
+        // Fire several enqueues concurrently and await all — they should
+        // coalesce into a single batch since FlushInterval > 0.
+        var tasks = Enumerable.Range(0, 5)
+            .Select(i => batcher.EnqueueAsync(Update($"{i}-step")))
+            .ToArray();
+
+        await Task.WhenAll(tasks);
+        await batcher.DrainAsync();
+
+        Assert.Single(batches);
+        Assert.Equal(5, batches[0]);
+    }
+
+    [Fact]
+    public async Task EnqueueAsync_OverflowOps_SplitsBatches()
+    {
+        var batches = new List<int>();
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                batches.Add(ops.Count);
+                return Task.FromResult(token);
+            },
+            new CheckpointBatcherConfig
+            {
+                MaxBatchOperations = 3,
+                FlushInterval = TimeSpan.FromMilliseconds(100)
+            });
+
+        var tasks = Enumerable.Range(0, 7)
+            .Select(i => batcher.EnqueueAsync(Update($"{i}-step")))
+            .ToArray();
+
+        await Task.WhenAll(tasks);
+        await batcher.DrainAsync();
+
+        // 7 items, max 3 per batch → 3, 3, 1 (or some permutation summing to 7
+        // with no batch over 3).
+        Assert.Equal(7, batches.Sum());
+        Assert.All(batches, count => Assert.True(count <= 3));
+        Assert.True(batches.Count >= 3);
+    }
+
+    [Fact]
+    public async Task FlushAsync_Throws_PropagatesToAllAwaiters()
+    {
+        var failure = new InvalidOperationException("service unavailable");
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) => Task.FromException<string>(failure),
+            new CheckpointBatcherConfig { FlushInterval = TimeSpan.FromMilliseconds(50) });
+
+        var tasks = Enumerable.Range(0, 3)
+            .Select(i => batcher.EnqueueAsync(Update($"{i}-step")))
+            .ToArray();
+
+        // Each awaiter should see the same exception.
+        foreach (var t in tasks)
+        {
+            var ex = await Assert.ThrowsAsync<InvalidOperationException>(() => t);
+            Assert.Equal("service unavailable", ex.Message);
+        }
+    }
+
+    [Fact]
+    public async Task EnqueueAsync_AfterTerminalError_FailsFast()
+    {
+        var failure = new InvalidOperationException("kaboom");
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) => Task.FromException<string>(failure));
+
+        // First enqueue trips the terminal error.
+        await Assert.ThrowsAsync<InvalidOperationException>(() => batcher.EnqueueAsync(Update("0-step")));
+
+        // Subsequent enqueue should fail fast with the same exception.
+        var second = await Assert.ThrowsAsync<InvalidOperationException>(() => batcher.EnqueueAsync(Update("1-step")));
+        Assert.Equal("kaboom", second.Message);
+    }
+
+    [Fact]
+    public async Task DrainAsync_FlushesRemainingItems()
+    {
+        var totalFlushed = 0;
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                Interlocked.Add(ref totalFlushed, ops.Count);
+                return Task.FromResult(token);
+            });
+
+        // Fire enqueues without awaiting them individually.
+        var tasks = Enumerable.Range(0, 4)
+            .Select(i => batcher.EnqueueAsync(Update($"{i}-step")))
+            .ToArray();
+
+        await batcher.DrainAsync();
+        await Task.WhenAll(tasks);
+
+        Assert.Equal(4, totalFlushed);
+    }
+
+    [Fact]
+    public async Task DrainAsync_AfterTerminalError_Throws()
+    {
+        var failure = new InvalidOperationException("nope");
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) => Task.FromException<string>(failure));
+
+        // Trip the terminal error.
+        await Assert.ThrowsAsync<InvalidOperationException>(() => batcher.EnqueueAsync(Update("0-step")));
+
+        // Drain should rethrow.
+        await Assert.ThrowsAsync<InvalidOperationException>(() => batcher.DrainAsync());
+    }
+
+    [Fact]
+    public async Task EnqueueAsync_AfterDispose_Throws()
+    {
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) => Task.FromResult(token));
+
+        await batcher.DisposeAsync();
+
+        await Assert.ThrowsAnyAsync<Exception>(() => batcher.EnqueueAsync(Update("0-step")));
+    }
+
+    [Fact]
+    public async Task CheckpointToken_UpdatesAfterEachFlush()
+    {
+        var counter = 0;
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                var next = $"token-{Interlocked.Increment(ref counter)}";
+                return Task.FromResult(next);
+            });
+
+        await batcher.EnqueueAsync(Update("0-step"));
+        Assert.Equal("token-1", batcher.CheckpointToken);
+
+        await batcher.EnqueueAsync(Update("1-step"));
+        Assert.Equal("token-2", batcher.CheckpointToken);
+
+        await batcher.DrainAsync();
+    }
+
+    [Fact]
+    public async Task ConcurrentEnqueueAsync_AllComplete()
+    {
+        var totalFlushed = 0;
+        var batcher = new CheckpointBatcher("token-0",
+            (token, ops, ct) =>
+            {
+                Interlocked.Add(ref totalFlushed, ops.Count);
+                return Task.FromResult(token);
+            },
+            new CheckpointBatcherConfig { FlushInterval = TimeSpan.FromMilliseconds(20) });
+
+        var tasks = Enumerable.Range(0, 100)
+            .Select(i => Task.Run(() => batcher.EnqueueAsync(Update($"{i}-step"))))
+            .ToArray();
+
+        await Task.WhenAll(tasks);
+        await batcher.DrainAsync();
+
+        Assert.Equal(100, totalFlushed);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ConfigTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ConfigTests.cs
new file mode 100644
index 000000000..f31586ea0
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ConfigTests.cs
@@ -0,0 +1,15 @@
+using Amazon.Lambda.DurableExecution;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class ConfigTests
+{
+    [Fact]
+    public void SerializationContext_RecordEquality()
+    {
+        var ctx1 = new SerializationContext("op-1", "arn:aws:lambda:us-east-1:123:function:my-func");
+        var ctx2 = new SerializationContext("op-1", "arn:aws:lambda:us-east-1:123:function:my-func");
+        Assert.Equal(ctx1, ctx2);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableContextTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableContextTests.cs
new file mode 100644
index 000000000..cc3b9a460
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableContextTests.cs
@@ -0,0 +1,971 @@
+using Amazon.Lambda.Core;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Amazon.Lambda.TestUtilities;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class DurableContextTests
+{
+    /// <summary>Reproduces the Id that <see cref="OperationIdGenerator"/> emits for the n-th root-level operation.</summary>
+    private static string IdAt(int position) => OperationIdGenerator.HashOperationId(position.ToString());
+
+    private static DurableContext CreateContext(
+        InitialExecutionState? initialState = null,
+        TerminationManager? terminationManager = null)
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(initialState);
+        var tm = terminationManager ??
+            new TerminationManager();
+        var idGen = new OperationIdGenerator();
+        var lambdaContext = new TestLambdaContext();
+
+        return new DurableContext(state, tm, idGen, "arn:aws:lambda:us-east-1:123:durable-execution:test", lambdaContext);
+    }
+
+    #region StepAsync Tests
+
+    [Fact]
+    public async Task StepAsync_NewExecution_RunsFunction()
+    {
+        var context = CreateContext();
+        var executed = false;
+
+        var result = await context.StepAsync(async (_) =>
+        {
+            executed = true;
+            await Task.CompletedTask;
+            return 42;
+        }, name: "my_step");
+
+        Assert.True(executed);
+        Assert.Equal(42, result);
+    }
+
+    [Fact]
+    public async Task StepAsync_Replay_ReturnsCachedResult()
+    {
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = "\"cached_value\"" }
+                }
+            }
+        });
+
+        var executed = false;
+        var result = await context.StepAsync(async (_) =>
+        {
+            executed = true;
+            await Task.CompletedTask;
+            return "fresh_value";
+        }, name: "cached_step");
+
+        Assert.False(executed);
+        Assert.Equal("cached_value", result);
+    }
+
+    [Fact]
+    public async Task StepAsync_ReplayFailed_ThrowsStepException()
+    {
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Failed,
+                    StepDetails = new StepDetails
+                    {
+                        Error = new ErrorObject
+                        {
+                            ErrorType = "System.TimeoutException",
+                            ErrorMessage = "timed out"
+                        }
+                    }
+                }
+            }
+        });
+
+        var ex = await Assert.ThrowsAsync<StepException>(() =>
+            context.StepAsync(async (_) => { await Task.CompletedTask; return "x"; }, name: "bad_step"));
+
+        Assert.Equal("System.TimeoutException", ex.ErrorType);
+        Assert.Equal("timed out", ex.Message);
+    }
+
+    [Fact]
+    public async Task StepAsync_Throws_FailsWithStepException()
+    {
+        var context = CreateContext();
+        var attempts = 0;
+
+        await
+            Assert.ThrowsAsync<StepException>(() =>
+            context.StepAsync(async (_) =>
+            {
+                attempts++;
+                await Task.CompletedTask;
+                throw new InvalidOperationException("boom");
+            }, name: "fail_step"));
+
+        // No retry support yet — the step runs once.
+        Assert.Equal(1, attempts);
+    }
+
+    [Fact]
+    public async Task StepAsync_WithStepContext_ReceivesMetadata()
+    {
+        var context = CreateContext();
+        string? receivedOpId = null;
+        int receivedAttempt = 0;
+        Microsoft.Extensions.Logging.ILogger? receivedLogger = null;
+
+        await context.StepAsync(async (step) =>
+        {
+            receivedOpId = step.OperationId;
+            receivedAttempt = step.AttemptNumber;
+            receivedLogger = step.Logger;
+            await Task.CompletedTask;
+            return "done";
+        }, name: "meta_step");
+
+        Assert.Equal(IdAt(1), receivedOpId);
+        Assert.Equal(1, receivedAttempt);
+        Assert.NotNull(receivedLogger);
+    }
+
+    [Fact]
+    public async Task StepAsync_VoidOverload_Works()
+    {
+        var context = CreateContext();
+        var executed = false;
+
+        await context.StepAsync(async (_) =>
+        {
+            executed = true;
+            await Task.CompletedTask;
+        }, name: "void_step");
+
+        Assert.True(executed);
+    }
+
+    [Fact]
+    public async Task StepAsync_MultipleSteps_DeterministicIds()
+    {
+        var context = CreateContext();
+
+        var r1 = await context.StepAsync(async (_) => { await Task.CompletedTask; return "a"; }, name: "first");
+        var r2 = await context.StepAsync(async (_) => { await Task.CompletedTask; return "b"; }, name: "second");
+        var r3 = await context.StepAsync(async (_) => { await Task.CompletedTask; return "c"; });
+
+        Assert.Equal("a", r1);
+        Assert.Equal("b", r2);
+        Assert.Equal("c", r3);
+    }
+
+    [Fact]
+    public async Task StepAsync_ComplexType_SerializesCorrectly()
+    {
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = "{\"Name\":\"Alice\",\"Age\":30}" }
+                }
+            }
+        });
+
+        var result = await
+            context.StepAsync(
+            async (_) => { await Task.CompletedTask; return new TestPerson { Name = "Bob", Age = 25 }; },
+            name: "fetch");
+
+        Assert.Equal("Alice", result.Name);
+        Assert.Equal(30, result.Age);
+    }
+
+    [Fact]
+    public async Task StepAsync_CustomSerializer_UsedForSerialization()
+    {
+        var serializer = new RecordingSerializer();
+        var context = CreateContext();
+
+        var result = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return new TestPerson { Name = "Charlie", Age = 40 }; },
+            serializer,
+            name: "with_custom");
+
+        Assert.Equal("Charlie", result.Name);
+        Assert.True(serializer.SerializeCalled);
+        Assert.False(serializer.DeserializeCalled);
+    }
+
+    [Fact]
+    public void Logger_Defaults_ToNullLogger()
+    {
+        var context = CreateContext();
+        Assert.NotNull(context.Logger);
+    }
+
+    [Fact]
+    public void ExecutionContext_ExposesArn()
+    {
+        var context = CreateContext();
+        Assert.Equal("arn:aws:lambda:us-east-1:123:durable-execution:test", context.ExecutionContext.DurableExecutionArn);
+    }
+
+    [Fact]
+    public void LambdaContext_IsExposed()
+    {
+        var context = CreateContext();
+        Assert.NotNull(context.LambdaContext);
+    }
+
+    [Fact]
+    public async Task StepAsync_Replay_NullResult_ReturnsDefault()
+    {
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = null }
+                }
+            }
+        });
+
+        var result = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return "fresh"; },
+            name: "no_result");
+
+        Assert.Null(result);
+    }
+
+    [Fact]
+    public async Task StepAsync_CancelledToken_ThrowsOperationCanceled()
+    {
+        var context = CreateContext();
+        using var cts = new CancellationTokenSource();
+        cts.Cancel();
+
+        await Assert.ThrowsAnyAsync<OperationCanceledException>(() =>
+            context.StepAsync(
+                async (_) =>
+                {
+                    cts.Token.ThrowIfCancellationRequested();
+                    await Task.CompletedTask;
+                    return
+                        "unreachable";
+                },
+                name: "cancelled_step",
+                cancellationToken: cts.Token));
+    }
+
+    [Fact]
+    public async Task StepAsync_CustomSerializer_UsedForReplayDeserialization()
+    {
+        var serializer = new RecordingSerializer();
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = "Dana,55" }
+                }
+            }
+        });
+
+        var result = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return new TestPerson { Name = "ignored", Age = 0 }; },
+            serializer,
+            name: "replay_step");
+
+        Assert.True(serializer.DeserializeCalled);
+        Assert.Equal("Dana", result.Name);
+        Assert.Equal(55, result.Age);
+    }
+
+    #endregion
+
+    #region WaitAsync Tests
+
+    [Fact]
+    public async Task WaitAsync_SubSecond_ThrowsArgumentOutOfRange()
+    {
+        var context = CreateContext();
+
+        await Assert.ThrowsAsync<ArgumentOutOfRangeException>(() =>
+            context.WaitAsync(TimeSpan.FromMilliseconds(500)));
+    }
+
+    [Fact]
+    public async Task WaitAsync_AboveOneYear_ThrowsArgumentOutOfRange()
+    {
+        var context = CreateContext();
+
+        await Assert.ThrowsAsync<ArgumentOutOfRangeException>(() =>
+            context.WaitAsync(TimeSpan.FromSeconds(31_622_401)));
+    }
+
+    [Fact]
+    public async Task WaitAsync_NewExecution_SignalsTermination()
+    {
+        var tm = new TerminationManager();
+        var context = CreateContext(terminationManager: tm);
+
+        // WaitAsync should signal termination and return a never-completing task
+        var waitTask = context.WaitAsync(TimeSpan.FromSeconds(30), name: "my_wait");
+
+        // Give it a moment to execute
+        await Task.Delay(10);
+
+        Assert.True(tm.IsTerminated);
+        Assert.False(waitTask.IsCompleted);
+    }
+
+    [Fact]
+    public async Task WaitAsync_Elapsed_ContinuesImmediately()
+    {
+        var pastExpirationMs = DateTimeOffset.UtcNow.AddSeconds(-10).ToUnixTimeMilliseconds();
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type =
+                        OperationTypes.Wait,
+                    Status = OperationStatuses.Pending,
+                    WaitDetails = new WaitDetails { ScheduledEndTimestamp = pastExpirationMs }
+                }
+            }
+        });
+
+        await context.WaitAsync(TimeSpan.FromSeconds(30), name: "cooldown");
+        // If we got here, the wait was correctly skipped
+    }
+
+    [Fact]
+    public async Task WaitAsync_StartedButNotExpired_ResuspendsWithoutNewCheckpoint()
+    {
+        var futureExpirationMs = DateTimeOffset.UtcNow.AddSeconds(300).ToUnixTimeMilliseconds();
+        var tm = new TerminationManager();
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Wait,
+                    Status = OperationStatuses.Pending,
+                    WaitDetails = new WaitDetails { ScheduledEndTimestamp = futureExpirationMs }
+                }
+            }
+        });
+        var idGen = new OperationIdGenerator();
+        var lambdaContext = new TestLambdaContext();
+        var recorder = new RecordingBatcher();
+        var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher);
+
+        var waitTask = context.WaitAsync(TimeSpan.FromSeconds(30), name: "pending_wait");
+
+        await Task.Delay(10);
+
+        Assert.True(tm.IsTerminated);
+        Assert.False(waitTask.IsCompleted);
+        Assert.Empty(recorder.Flushed);
+    }
+
+    [Fact]
+    public async Task WaitAsync_AlreadySucceeded_ContinuesImmediately()
+    {
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Wait,
+                    Status = OperationStatuses.Succeeded
+                }
+            }
+        });
+
+        await context.WaitAsync(TimeSpan.FromSeconds(30), name: "done_wait");
+        // Completed without blocking
+    }
+
+    [Fact]
+    public async Task WaitAsync_UnknownStatus_ThrowsNonDeterministicException()
+    {
+        // Unrecognized status on a replayed wait checkpoint must surface as
+        // NonDeterministicExecutionException — silently re-emitting WAIT START
+        // would either fail at the service or duplicate work.
+        var context = CreateContext(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Wait,
+                    Status = "TOTALLY_BOGUS_STATUS"
+                }
+            }
+        });
+
+        await Assert.ThrowsAsync<NonDeterministicExecutionException>(() =>
+            context.WaitAsync(TimeSpan.FromSeconds(30), name: "mystery_wait"));
+    }
+
+    #endregion
+
+    #region End-to-end: Step + Wait + Step
+
+    [Fact]
+    public async Task EndToEnd_StepWaitStep_FirstInvocation_SuspendsOnWait()
+    {
+        var tm = new TerminationManager();
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(null);
+        var idGen = new OperationIdGenerator();
+        var lambdaContext = new TestLambdaContext();
+        var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext);
+
+        var result = await DurableExecutionHandler.RunAsync(
+            state, tm,
+            async () =>
+            {
+                await context.StepAsync(async (_) => { await Task.CompletedTask; return "fetched"; }, name: "fetch");
+                await context.WaitAsync(TimeSpan.FromSeconds(30), name: "delay");
+                var final = await context.StepAsync(async (_) => { await Task.CompletedTask; return "processed"; }, name: "process");
+                return final;
+            });
+
+        Assert.Equal(InvocationStatus.Pending, result.Status);
+    }
+
+    [Fact]
+    public async Task EndToEnd_StepWaitStep_SecondInvocation_Completes()
+    {
+        var pastExpirationMs = DateTimeOffset.UtcNow.AddSeconds(-5).ToUnixTimeMilliseconds();
+        var tm = new TerminationManager();
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new()
+            {
+                new()
+                {
+                    Id = IdAt(1),
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = "\"fetched\"" }
+                },
+                new()
+                {
+                    Id = IdAt(2),
+                    Type = OperationTypes.Wait,
+                    Status = OperationStatuses.Pending,
+                    WaitDetails = new WaitDetails { ScheduledEndTimestamp = pastExpirationMs }
+                }
+            }
+        });
+
+        var idGen = new OperationIdGenerator();
+        var lambdaContext = new TestLambdaContext();
+        var context = new
DurableContext(state, tm, idGen, "arn:test", lambdaContext); + var processExecuted = false; + + var result = await DurableExecutionHandler.RunAsync( + state, tm, + async () => + { + var fetched = await context.StepAsync(async (_) => { await Task.CompletedTask; return "fresh_fetch"; }, name: "fetch"); + Assert.Equal("fetched", fetched); // cached from replay + + await context.WaitAsync(TimeSpan.FromSeconds(30), name: "delay"); + // wait is elapsed, continues + + var final = await context.StepAsync(async (_) => + { + processExecuted = true; + await Task.CompletedTask; + return "processed"; + }, name: "process"); + return final; + }); + + Assert.Equal(InvocationStatus.Succeeded, result.Status); + Assert.Equal("processed", result.Result); + Assert.True(processExecuted); + } + + #endregion + + #region Non-Determinism Detection Tests + + [Fact] + public async Task StepAsync_ReplayTypeMismatch_ThrowsNonDeterministicException() + { + var state = new ExecutionState(); + state.LoadFromCheckpoint(new InitialExecutionState + { + Operations = new List + { + new() + { + Id = IdAt(1), + Type = OperationTypes.Wait, + Status = OperationStatuses.Succeeded + } + } + }); + var tm = new TerminationManager(); + var idGen = new OperationIdGenerator(); + var lambdaContext = new TestLambdaContext(); + var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext); + + var ex = await Assert.ThrowsAsync(async () => + await context.StepAsync( + async (_) => { await Task.CompletedTask; return "should not run"; }, + name: "my_op")); + + Assert.Contains("expected type 'STEP'", ex.Message); + Assert.Contains("found 'WAIT'", ex.Message); + } + + [Fact] + public async Task WaitAsync_ReplayTypeMismatch_ThrowsNonDeterministicException() + { + var state = new ExecutionState(); + state.LoadFromCheckpoint(new InitialExecutionState + { + Operations = new List + { + new() + { + Id = IdAt(1), + Type = OperationTypes.Step, + Status = OperationStatuses.Succeeded, + StepDetails = new 
StepDetails { Result = "\"hello\"" } + } + } + }); + var tm = new TerminationManager(); + var idGen = new OperationIdGenerator(); + var lambdaContext = new TestLambdaContext(); + var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext); + + var ex = await Assert.ThrowsAsync(async () => + await context.WaitAsync(TimeSpan.FromSeconds(10), name: "my_op")); + + Assert.Contains("expected type 'WAIT'", ex.Message); + Assert.Contains("found 'STEP'", ex.Message); + } + + [Fact] + public async Task StepAsync_ReplayNameMismatch_ThrowsNonDeterministicException() + { + // Simulate a scenario where the operation was stored with a different name + // than what the current code passes (e.g., service returned stale data). + var state = new ExecutionState(); + state.LoadFromCheckpoint(new InitialExecutionState + { + Operations = new List + { + new() + { + Id = IdAt(1), + Type = OperationTypes.Step, + Status = OperationStatuses.Succeeded, + Name = "old_name", + StepDetails = new StepDetails { Result = "\"old_result\"" } + } + } + }); + var tm = new TerminationManager(); + var idGen = new OperationIdGenerator(); + var lambdaContext = new TestLambdaContext(); + var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext); + + var ex = await Assert.ThrowsAsync(async () => + await context.StepAsync( + async (_) => { await Task.CompletedTask; return "new"; }, + name: "my_step")); + + Assert.Contains("expected name 'my_step'", ex.Message); + Assert.Contains("found 'old_name'", ex.Message); + } + + [Fact] + public async Task StepAsync_NoReplay_SkipsValidation() + { + var context = CreateContext(); + + var result = await context.StepAsync( + async (_) => { await Task.CompletedTask; return "ok"; }, + name: "anything"); + + Assert.Equal("ok", result); + } + + #endregion + + private class TestPerson + { + public string? Name { get; set; } + public int Age { get; set; } + } + + /// + /// AOT-friendly test serializer using a trivial format. 
Demonstrates that + /// passing an to the AOT-safe + /// StepAsync overload fully replaces the reflection-based + /// System.Text.Json path. + /// + private class RecordingSerializer : ICheckpointSerializer + { + public bool SerializeCalled { get; private set; } + public bool DeserializeCalled { get; private set; } + + public string Serialize(TestPerson value, SerializationContext context) + { + SerializeCalled = true; + return $"{value.Name},{value.Age}"; + } + + public TestPerson Deserialize(string data, SerializationContext context) + { + DeserializeCalled = true; + var inner = data.Replace("", "").Replace("", ""); + var parts = inner.Split(','); + return new TestPerson { Name = parts[0], Age = int.Parse(parts[1]) }; + } + } + + #region StepAsync Retry Tests + + [Fact] + public async Task StepAsync_FailsWithRetryStrategy_CheckpointsRetryAndSuspends() + { + var tm = new TerminationManager(); + var state = new ExecutionState(); + state.LoadFromCheckpoint(null); + var idGen = new OperationIdGenerator(); + var lambdaContext = new TestLambdaContext(); + var recorder = new RecordingBatcher(); + var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher); + + var stepTask = context.StepAsync( + async (_) => { await Task.CompletedTask; throw new InvalidOperationException("transient"); }, + name: "flaky_step", + config: new StepConfig + { + RetryStrategy = RetryStrategy.Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromSeconds(5), + jitter: JitterStrategy.None) + }); + + await Task.Delay(50); + + Assert.True(tm.IsTerminated); + Assert.False(stepTask.IsCompleted); + + // Fresh attempt 1 emits a fire-and-forget START (telemetry under + // AtLeastOncePerRetry), then a RETRY when the user code throws and + // the retry strategy decides to retry. 
+ var checkpoints = recorder.Flushed;
+ Assert.Equal(2, checkpoints.Count);
+ Assert.Equal("START", checkpoints[0].Action);
+ Assert.Equal("RETRY", checkpoints[1].Action);
+ Assert.Equal(IdAt(1), checkpoints[1].Id);
+ Assert.Equal(5, checkpoints[1].StepOptions.NextAttemptDelaySeconds);
+ }
+
+ [Fact]
+ public async Task StepAsync_FailsNoRetryStrategy_CheckpointsFail()
+ {
+ var context = CreateContext();
+
+ var ex = await Assert.ThrowsAsync<InvalidOperationException>(() =>
+ context.StepAsync(
+ async (_) => { await Task.CompletedTask; throw new InvalidOperationException("permanent"); },
+ name: "fail_step"));
+
+ Assert.Equal("permanent", ex.Message);
+ }
+
+ [Fact]
+ public async Task StepAsync_RetryExhausted_CheckpointsFail()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Pending,
+ StepDetails = new StepDetails
+ {
+ Attempt = 2,
+ NextAttemptTimestamp = DateTimeOffset.UtcNow.AddSeconds(-10).ToUnixTimeMilliseconds()
+ }
+ }
+ }
+ });
+ var tm = new TerminationManager();
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var recorder = new RecordingBatcher();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher);
+
+ // Attempt 3 (last one) — should fail after this
+ var ex = await Assert.ThrowsAsync<InvalidOperationException>(() =>
+ context.StepAsync(
+ async (_) => { await Task.CompletedTask; throw new InvalidOperationException("still failing"); },
+ name: "exhaust_step",
+ config: new StepConfig
+ {
+ RetryStrategy = RetryStrategy.Exponential(maxAttempts: 3, jitter: JitterStrategy.None)
+ }));
+
+ Assert.Equal("still failing", ex.Message);
+
+ // Fresh attempt 3 emits a fire-and-forget START (telemetry under
+ // AtLeastOncePerRetry), then a FAIL after the retry strategy gives up.
+ var checkpoints = recorder.Flushed;
+ Assert.Equal(2, checkpoints.Count);
+ Assert.Equal("START", checkpoints[0].Action);
+ Assert.Equal("FAIL", checkpoints[1].Action);
+ }
+
+ [Fact]
+ public async Task StepAsync_PendingWithFutureTimestamp_Suspends()
+ {
+ var futureMs = DateTimeOffset.UtcNow.AddSeconds(300).ToUnixTimeMilliseconds();
+ var tm = new TerminationManager();
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Pending,
+ StepDetails = new StepDetails
+ {
+ Attempt = 1,
+ NextAttemptTimestamp = futureMs
+ }
+ }
+ }
+ });
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var recorder = new RecordingBatcher();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher);
+
+ var stepTask = context.StepAsync(
+ async (_) => { await Task.CompletedTask; return "should not run"; },
+ name: "pending_step",
+ config: new StepConfig { RetryStrategy = RetryStrategy.Default });
+
+ await Task.Delay(50);
+
+ Assert.True(tm.IsTerminated);
+ Assert.False(stepTask.IsCompleted);
+ Assert.Empty(recorder.Flushed);
+ }
+
+ [Fact]
+ public async Task StepAsync_PendingWithPastTimestamp_ReExecutes()
+ {
+ var pastMs = DateTimeOffset.UtcNow.AddSeconds(-10).ToUnixTimeMilliseconds();
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Pending,
+ StepDetails = new StepDetails
+ {
+ Attempt = 1,
+ NextAttemptTimestamp = pastMs
+ }
+ }
+ }
+ });
+ var tm = new TerminationManager();
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext);
+
+ var result = await context.StepAsync(
+ async
(ctx) =>
+ {
+ await Task.CompletedTask;
+ Assert.Equal(2, ctx.AttemptNumber);
+ return "retry success";
+ },
+ name: "retry_step",
+ config: new StepConfig { RetryStrategy = RetryStrategy.Default });
+
+ Assert.Equal("retry success", result);
+ }
+
+ [Fact]
+ public async Task StepAsync_ReadyReplay_AdvancesAttemptAndExecutes()
+ {
+ // READY = service has post-PENDING re-invoked us; the retry timer
+ // already fired so no timestamp check is needed. Just advance the
+ // attempt counter and run. Matches Java's case READY -> executeStepLogic.
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Ready,
+ StepDetails = new StepDetails { Attempt = 2 }
+ }
+ }
+ });
+ var tm = new TerminationManager();
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext);
+
+ var executed = false;
+ var result = await context.StepAsync(
+ async (ctx) =>
+ {
+ executed = true;
+ Assert.Equal(3, ctx.AttemptNumber);
+ await Task.CompletedTask;
+ return "ok";
+ },
+ name: "ready_step",
+ config: new StepConfig { RetryStrategy = RetryStrategy.Default });
+
+ Assert.True(executed);
+ Assert.Equal("ok", result);
+ Assert.False(tm.IsTerminated);
+ Assert.Equal(ExecutionMode.Execution, state.Mode);
+ }
+
+ [Fact]
+ public async Task StepAsync_AtMostOnce_FlushesStartBeforeExecution()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var tm = new TerminationManager();
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var recorder = new RecordingBatcher();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher);
+
+ IReadOnlyList<string>? flushedAtFuncEntry = null;
+
+ var result = await context.StepAsync(
+ async (_) =>
+ {
+ flushedAtFuncEntry = recorder.Flushed.Select(o => o.Action.ToString()).ToArray();
+ await Task.CompletedTask;
+ return "done";
+ },
+ name: "amo_step",
+ config: new StepConfig { Semantics = StepSemantics.AtMostOncePerRetry });
+
+ Assert.Equal("done", result);
+
+ // START must be flushed before user func runs (AtMostOnce invariant).
+ Assert.NotNull(flushedAtFuncEntry);
+ Assert.Equal(new[] { "START" }, flushedAtFuncEntry);
+
+ // After step returns, SUCCEED has also been flushed.
+ var actions = recorder.Flushed.Select(o => o.Action.ToString()).ToArray();
+ Assert.Equal(new[] { "START", "SUCCEED" }, actions);
+ }
+
+ [Fact]
+ public async Task StepAsync_AtMostOnce_StartedReplay_TriggersRetryHandler()
+ {
+ var tm = new TerminationManager();
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Started
+ }
+ }
+ });
+ var idGen = new OperationIdGenerator();
+ var lambdaContext = new TestLambdaContext();
+ var recorder = new RecordingBatcher();
+ var context = new DurableContext(state, tm, idGen, "arn:test", lambdaContext, recorder.Batcher);
+
+ var executed = false;
+ var stepTask = context.StepAsync(
+ async (_) => { executed = true; await Task.CompletedTask; return "should not run"; },
+ name: "amo_replay",
+ config: new StepConfig
+ {
+ Semantics = StepSemantics.AtMostOncePerRetry,
+ RetryStrategy = RetryStrategy.Exponential(maxAttempts: 3, jitter: JitterStrategy.None)
+ });
+
+ await Task.Delay(50);
+
+ Assert.False(executed);
+ Assert.True(tm.IsTerminated);
+ Assert.False(stepTask.IsCompleted);
+
+ var checkpoints = recorder.Flushed;
+ Assert.Single(checkpoints);
+ Assert.Equal("RETRY", checkpoints[0].Action);
+ }
+
+ #endregion
+}
diff --git
a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableExecutionHandlerTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableExecutionHandlerTests.cs
new file mode 100644
index 000000000..b5abc5882
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableExecutionHandlerTests.cs
@@ -0,0 +1,137 @@
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class DurableExecutionHandlerTests
+{
+ [Fact]
+ public async Task RunAsync_UserCodeCompletes_ReturnsSucceeded()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ await Task.Delay(1);
+ return "hello";
+ });
+
+ Assert.Equal(InvocationStatus.Succeeded, result.Status);
+ Assert.Equal("hello", result.Result);
+ Assert.Null(result.Exception);
+ }
+
+ [Fact]
+ public async Task RunAsync_UserCodeThrows_ReturnsFailed()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ await Task.Delay(1);
+ throw new InvalidOperationException("something broke");
+ });
+
+ Assert.Equal(InvocationStatus.Failed, result.Status);
+ Assert.Equal("something broke", result.Message);
+ Assert.IsType<InvalidOperationException>(result.Exception);
+ }
+
+ [Fact]
+ public async Task RunAsync_TerminationWins_ReturnsPending()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ // Simulate: user code hits a wait, signals termination, then blocks forever
+ termination.Terminate(TerminationReason.WaitScheduled, "waiting 30s");
+ await new TaskCompletionSource().Task; // blocks forever
+ return "unreachable";
+ });
+
+ Assert.Equal(InvocationStatus.Pending, result.Status);
+ Assert.Equal("waiting 30s", result.Message);
+ Assert.Null(result.Exception);
+ }
+
+ [Fact]
+ public async Task RunAsync_TerminationWithException_ReturnsFailed()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ termination.Terminate(
+ TerminationReason.CheckpointFailed,
+ "checkpoint error",
+ new InvalidOperationException("service unavailable"));
+ await new TaskCompletionSource().Task;
+ return "unreachable";
+ });
+
+ Assert.Equal(InvocationStatus.Failed, result.Status);
+ Assert.IsType<InvalidOperationException>(result.Exception);
+ }
+
+ [Fact]
+ public async Task RunAsync_FastUserCode_BeatsTermination()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ // User code completes before termination is called
+ return 42;
+ });
+
+ Assert.Equal(InvocationStatus.Succeeded, result.Status);
+ Assert.Equal(42, result.Result);
+ }
+
+ [Fact]
+ public async Task RunAsync_IntResult_WorksWithValueTypes()
+ {
+ var state = new ExecutionState();
+ state.LoadFromCheckpoint(null);
+ var termination = new TerminationManager();
+
+ var result = await DurableExecutionHandler.RunAsync(
+ state,
+ termination,
+ async () =>
+ {
+ await Task.CompletedTask;
+ return 100;
+ });
+
+ Assert.Equal(InvocationStatus.Succeeded, result.Status);
+ Assert.Equal(100, result.Result);
+ }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableFunctionTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableFunctionTests.cs
new file mode 100644
index 000000000..b624766eb
--- /dev/null
+++
b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableFunctionTests.cs
@@ -0,0 +1,590 @@
+using System.Net;
+using System.Text.Json;
+using Amazon.Lambda;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Amazon.Lambda.TestUtilities;
+using Amazon.Runtime;
+using Xunit;
+using Operation = Amazon.Lambda.DurableExecution.Internal.Operation;
+using StepDetails = Amazon.Lambda.DurableExecution.Internal.StepDetails;
+using WaitDetails = Amazon.Lambda.DurableExecution.Internal.WaitDetails;
+using ExecutionDetails = Amazon.Lambda.DurableExecution.Internal.ExecutionDetails;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class DurableFunctionTests
+{
+ /// <summary>Reproduces the Id that <see cref="OperationIdGenerator"/> emits for the n-th root-level operation.</summary>
+ private static string IdAt(int position) => OperationIdGenerator.HashOperationId(position.ToString());
+
+ private readonly IAmazonLambda _mockClient = new MockLambdaClient();
+
+ [Fact]
+ public async Task WrapAsync_FreshExecution_StepThenWait_ReturnsPending()
+ {
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:order-123",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-123\"}" }
+ }
+ }
+ }
+ };
+
+ var output = await DurableFunction.WrapAsync(
+ MyWorkflow,
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Pending, output.Status);
+ }
+
+ [Fact]
+ public async Task WrapAsync_ReplayWithElapsedWait_ReturnsSucceeded()
+ {
+ var pastExpirationMs = DateTimeOffset.UtcNow.AddSeconds(-5).ToUnixTimeMilliseconds();
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:order-123",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-123\"}" }
+ },
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Succeeded,
+ StepDetails = new StepDetails { Result = "{\"IsValid\":true}" }
+ },
+ new()
+ {
+ Id = IdAt(2),
+ Type = OperationTypes.Wait,
+ Status = OperationStatuses.Pending,
+ WaitDetails = new WaitDetails { ScheduledEndTimestamp = pastExpirationMs }
+ }
+ }
+ }
+ };
+
+ var output = await DurableFunction.WrapAsync(
+ MyWorkflow,
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Succeeded, output.Status);
+ Assert.NotNull(output.Result);
+ var result = JsonSerializer.Deserialize<OrderResult>(output.Result!);
+ Assert.Equal("approved", result!.Status);
+ }
+
+ [Fact]
+ public async Task WrapAsync_WorkflowThrows_ReturnsFailed()
+ {
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:fail-test",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"bad-order\"}" }
+ }
+ }
+ }
+ };
+
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) => throw new InvalidOperationException("workflow error"),
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Failed, output.Status);
+ Assert.NotNull(output.Error);
+ Assert.Equal("workflow error", output.Error!.ErrorMessage);
+ Assert.Contains("InvalidOperationException", output.Error.ErrorType!);
+ }
+
+ [Fact]
+ public async Task WrapAsync_VoidWorkflow_ReturnsSucceeded()
+ {
+ var input = new DurableExecutionInvocationInput
+ {
+
DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:void-test",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-1\"}" }
+ }
+ }
+ }
+ };
+
+ var executed = false;
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) =>
+ {
+ await ctx.StepAsync(async (_) => { await Task.CompletedTask; executed = true; }, name: "do_work");
+ },
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Succeeded, output.Status);
+ Assert.True(executed);
+ }
+
+ [Fact]
+ public async Task WrapAsync_CheckpointsAreSentToService()
+ {
+ var mockClient = new MockLambdaClient();
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:checkpoint-test",
+ CheckpointToken = "initial-token",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-1\"}" }
+ }
+ }
+ }
+ };
+
+ var output = await DurableFunction.WrapAsync(
+ MyWorkflow,
+ input,
+ new TestLambdaContext(),
+ mockClient);
+
+ Assert.Equal(InvocationStatus.Pending, output.Status);
+
+ // Each StepAsync emits a fire-and-forget START before user code runs
+ // (telemetry under AtLeastOncePerRetry). With FlushInterval = 0 the
+ // worker may flush the START on its own before SUCCEED arrives, so the
+ // exact batching of START vs SUCCEED is timing-dependent. Assert on
+ // the flat sequence of updates instead.
+ var allUpdates = mockClient.CheckpointCalls
+ .SelectMany(c => c.Updates)
+ .ToList();
+
+ // Expect: step START, step SUCCEED, wait START (in that order).
+ Assert.Equal(3, allUpdates.Count);
+
+ Assert.Equal("STEP", allUpdates[0].Type);
+ Assert.Equal("START", allUpdates[0].Action);
+ Assert.Equal("validate", allUpdates[0].Name);
+
+ Assert.Equal("STEP", allUpdates[1].Type);
+ Assert.Equal("SUCCEED", allUpdates[1].Action);
+ Assert.Equal("validate", allUpdates[1].Name);
+ Assert.NotNull(allUpdates[1].Payload);
+
+ Assert.Equal("WAIT", allUpdates[2].Type);
+ Assert.Equal("START", allUpdates[2].Action);
+ Assert.Equal("delay", allUpdates[2].Name);
+ Assert.NotNull(allUpdates[2].WaitOptions);
+ Assert.Equal(30, allUpdates[2].WaitOptions.WaitSeconds);
+
+ // The first call sends the initial checkpoint token.
+ Assert.Equal("arn:aws:lambda:us-east-1:123:durable-execution:checkpoint-test", mockClient.CheckpointCalls[0].DurableExecutionArn);
+ Assert.Equal("initial-token", mockClient.CheckpointCalls[0].CheckpointToken);
+ }
+
+ [Fact]
+ public async Task WrapAsync_UserPayload_BindsCamelCaseToPascalCaseProperty()
+ {
+ // The wire payload uses camelCase ("orderId"), the user POCO uses PascalCase (OrderId).
+ // ExtractUserPayload must do case-insensitive binding so workflows can read input.OrderId.
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:case-test",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"abc-123\"}" }
+ }
+ }
+ }
+ };
+
+ string? observedOrderId = null;
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) =>
+ {
+ observedOrderId = evt.OrderId;
+ await Task.CompletedTask;
+ return new OrderResult { Status = "ok", OrderId = evt.OrderId };
+ },
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Succeeded, output.Status);
+ Assert.Equal("abc-123", observedOrderId);
+ }
+
+ [Fact]
+ public async Task WrapAsync_NoExecutionOp_ReceivesDefaultPayload()
+ {
+ // No EXECUTION operation in the envelope — ExtractUserPayload returns default(TInput).
+ // Exercises the "loop falls through without finding EXECUTION" branch in DurableFunction.ExtractUserPayload.
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:no-exec",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>()
+ }
+ };
+
+ OrderEvent? observed = null;
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) =>
+ {
+ observed = evt;
+ await Task.CompletedTask;
+ return new OrderResult { Status = "ok" };
+ },
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Succeeded, output.Status);
+ Assert.Null(observed); // default(OrderEvent) for a reference type is null
+ }
+
+ [Fact]
+ public async Task WrapAsync_PaginatedInitialState_HydratesAllPages()
+ {
+ // The service can return execution state across multiple pages — the first
+ // page comes inline on the invocation envelope (InitialExecutionState) and
+ // subsequent pages must be fetched via GetDurableExecutionState. Verify the
+ // pagination loop in WrapAsyncCore (DurableFunction.cs:160-167) walks every
+ // page so the workflow sees the full operation history on replay.
+ var arn = "arn:aws:lambda:us-east-1:123:durable-execution:paginated";
+
+ // Page 0 (in InitialExecutionState): EXECUTION op + step1 SUCCEEDED.
+ // Page 1 (fetched with marker "marker-1"): step2 SUCCEEDED, points to marker-2.
+ // Page 2 (fetched with marker "marker-2"): step3 SUCCEEDED, no NextMarker — loop exits.
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = arn,
+ CheckpointToken = "ckpt-0",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-1\"}" }
+ },
+ new()
+ {
+ Id = IdAt(1),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Succeeded,
+ StepDetails = new StepDetails { Result = "\"page-0-result\"" }
+ }
+ },
+ NextMarker = "marker-1"
+ }
+ };
+
+ var mockClient = new MockLambdaClient
+ {
+ GetExecutionStateHandler = req => req.Marker switch
+ {
+ "marker-1" => new Amazon.Lambda.Model.GetDurableExecutionStateResponse
+ {
+ Operations = new List<Amazon.Lambda.Model.Operation>
+ {
+ new()
+ {
+ Id = IdAt(2),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Succeeded,
+ StepDetails = new Amazon.Lambda.Model.StepDetails { Result = "\"page-1-result\"" }
+ }
+ },
+ NextMarker = "marker-2"
+ },
+ "marker-2" => new Amazon.Lambda.Model.GetDurableExecutionStateResponse
+ {
+ Operations = new List<Amazon.Lambda.Model.Operation>
+ {
+ new()
+ {
+ Id = IdAt(3),
+ Type = OperationTypes.Step,
+ Status = OperationStatuses.Succeeded,
+ StepDetails = new Amazon.Lambda.Model.StepDetails { Result = "\"page-2-result\"" }
+ }
+ }
+ // NextMarker omitted -> loop terminates.
+ },
+ _ => throw new InvalidOperationException($"Unexpected marker: {req.Marker}")
+ }
+ };
+
+ var observed = new List<string>();
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) =>
+ {
+ // All three steps must replay the cached results from the paginated state
+ // without re-executing — if the loop missed a page, the corresponding step
+ // would run fresh and append a different value to `observed`.
+ observed.Add(await ctx.StepAsync( + async (_) => { await Task.CompletedTask; return "fresh"; }, name: "step1")); + observed.Add(await ctx.StepAsync( + async (_) => { await Task.CompletedTask; return "fresh"; }, name: "step2")); + observed.Add(await ctx.StepAsync( + async (_) => { await Task.CompletedTask; return "fresh"; }, name: "step3")); + return new OrderResult { Status = "ok", OrderId = evt.OrderId }; + }, + input, + new TestLambdaContext(), + mockClient); + + Assert.Equal(InvocationStatus.Succeeded, output.Status); + + // Two GetDurableExecutionState calls — one per fetched page (page 0 was inline). + Assert.Equal(2, mockClient.GetExecutionStateCalls.Count); + Assert.Equal("marker-1", mockClient.GetExecutionStateCalls[0].Marker); + Assert.Equal(arn, mockClient.GetExecutionStateCalls[0].DurableExecutionArn); + Assert.Equal("ckpt-0", mockClient.GetExecutionStateCalls[0].CheckpointToken); + Assert.Equal("marker-2", mockClient.GetExecutionStateCalls[1].Marker); + + // The workflow saw replayed results from ALL three pages — none re-executed. + Assert.Equal(new[] { "page-0-result", "page-1-result", "page-2-result" }, observed); + + // No checkpoints were written: every step replayed from cache. + Assert.Empty(mockClient.CheckpointCalls); + } + + [Fact] + public async Task WrapAsync_NullInitialExecutionState_ReceivesDefaultPayload() + { + // No initial execution state at all. Same default-return branch in ExtractUserPayload. + var input = new DurableExecutionInvocationInput + { + DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:null-state" + }; + + OrderEvent? 
observed = null;
+ var output = await DurableFunction.WrapAsync(
+ async (evt, ctx) =>
+ {
+ observed = evt;
+ await Task.CompletedTask;
+ return new OrderResult { Status = "ok" };
+ },
+ input,
+ new TestLambdaContext(),
+ _mockClient);
+
+ Assert.Equal(InvocationStatus.Succeeded, output.Status);
+ Assert.Null(observed);
+ }
+
+ // ──────────────────────────────────────────────────────────────────────
+ // IsTerminalCheckpointError classification (mirrors CheckpointError in
+ // aws-durable-execution-sdk-python):
+ //   4xx (except 429) → terminal (Failed envelope)
+ //   429 / 5xx / no status → transient (escapes to host for Lambda retry)
+ //   Carve-out: InvalidParameterValueException "Invalid Checkpoint Token" → transient
+ //
+ // Driven through CheckpointDurableExecution: a workflow that succeeds a single Step
+ // forces the batcher to flush, which is wrapped by the try/catch in WrapAsyncCore.
+ // ──────────────────────────────────────────────────────────────────────
+
+ public static IEnumerable<object[]> TerminalCheckpointErrorCases() => new[]
+ {
+ new object[] { MakeServiceException("ResourceNotFoundException", HttpStatusCode.NotFound, "ARN not found") },
+ new object[] { MakeServiceException("AccessDeniedException", HttpStatusCode.Forbidden, "denied") },
+ new object[] { MakeServiceException("KMSAccessDeniedException", HttpStatusCode.BadRequest, "kms denied") },
+ new object[] { MakeServiceException("ValidationException", HttpStatusCode.BadRequest, "bad input") },
+ new object[] { MakeServiceException("InvalidParameterValueException", HttpStatusCode.BadRequest, "Some other parameter") },
+ };
+
+ [Theory]
+ [MemberData(nameof(TerminalCheckpointErrorCases))]
+ public async Task WrapAsync_CheckpointThrowsTerminal_ReturnsFailed(AmazonServiceException ex)
+ {
+ var input = MakeCheckpointInput();
+ var mockClient = new MockLambdaClient { CheckpointThrows = ex };
+
+ var output = await DurableFunction.WrapAsync(
+ SingleStepWorkflow, input, new TestLambdaContext(), mockClient);
+
+ Assert.Equal(InvocationStatus.Failed, output.Status);
+ Assert.NotNull(output.Error);
+ Assert.Equal(ex.Message, output.Error!.ErrorMessage);
+ }
+
+ public static IEnumerable<object[]> TransientCheckpointErrorCases() => new[]
+ {
+ // 5xx
+ new object[] { MakeServiceException("InternalServerError", HttpStatusCode.InternalServerError, "boom") },
+ new object[] { MakeServiceException("ServiceUnavailable", HttpStatusCode.ServiceUnavailable, "down") },
+ // 429
+ new object[] { MakeServiceException("TooManyRequestsException", (HttpStatusCode)429, "throttled") },
+ // No status (network / SDK-internal). HttpStatusCode default (0) → classifier treats < 400 as transient.
+ new object[] { MakeServiceException("RequestTimeout", 0, "timeout") },
+ // Carve-out: stale checkpoint token is transient.
+ new object[] { MakeServiceException("InvalidParameterValueException", HttpStatusCode.BadRequest, "Invalid Checkpoint Token: stale") },
+ };
+
+ [Theory]
+ [MemberData(nameof(TransientCheckpointErrorCases))]
+ public async Task WrapAsync_CheckpointThrowsTransient_PropagatesToHost(AmazonServiceException ex)
+ {
+ var input = MakeCheckpointInput();
+ var mockClient = new MockLambdaClient { CheckpointThrows = ex };
+
+ var thrown = await Assert.ThrowsAsync(ex.GetType(), () =>
+ DurableFunction.WrapAsync(
+ SingleStepWorkflow, input, new TestLambdaContext(), mockClient));
+
+ Assert.Same(ex, thrown);
+ }
+
+ [Fact]
+ public async Task WrapAsync_HydrationThrows_AlwaysPropagatesToHost()
+ {
+ // State hydration is OUTSIDE the IsTerminalCheckpointError try/catch — every
+ // GetExecutionStateAsync failure escapes for Lambda retry, matching Python's
+ // GetExecutionStateError (an InvocationError). Use a 4xx that *would* be terminal
+ // if it came from a checkpoint flush to prove the path isn't classified.
+ var input = new DurableExecutionInvocationInput
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:hydrate-fail",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-1\"}" }
+ }
+ },
+ NextMarker = "page-1" // force the hydration loop to run
+ }
+ };
+ var ex = MakeServiceException("ResourceNotFoundException", HttpStatusCode.NotFound, "ARN gone");
+ var mockClient = new MockLambdaClient { GetExecutionStateThrows = ex };
+
+ var thrown = await Assert.ThrowsAsync<AmazonServiceException>(() =>
+ DurableFunction.WrapAsync(
+ MyWorkflow, input, new TestLambdaContext(), mockClient));
+
+ Assert.Same(ex, thrown);
+ }
+
+ private static AmazonServiceException MakeServiceException(string code, HttpStatusCode status, string message)
+ {
+ return new AmazonServiceException(message, innerException: null, ErrorType.Unknown, code, requestId: "req-1", statusCode: status);
+ }
+
+ private static DurableExecutionInvocationInput MakeCheckpointInput() => new()
+ {
+ DurableExecutionArn = "arn:aws:lambda:us-east-1:123:durable-execution:checkpoint-fail",
+ InitialExecutionState = new InitialExecutionState
+ {
+ Operations = new List<Operation>
+ {
+ new()
+ {
+ Id = "exec-0",
+ Type = OperationTypes.Execution,
+ Status = OperationStatuses.Started,
+ ExecutionDetails = new ExecutionDetails { InputPayload = "{\"orderId\":\"order-1\"}" }
+ }
+ }
+ }
+ };
+
+ private static async Task SingleStepWorkflow(OrderEvent input, IDurableContext context)
+ {
+ // One step succeed → forces a checkpoint flush, which the mock fails.
+        await context.StepAsync(async (_) => { await Task.CompletedTask; return "ok"; }, name: "s1");
+        return new OrderResult { Status = "done" };
+    }
+
+    private static async Task<OrderResult> MyWorkflow(OrderEvent input, IDurableContext context)
+    {
+        var validation = await context.StepAsync(
+            async (_) => { await Task.CompletedTask; return new ValidationResult { IsValid = true }; },
+            name: "validate");
+
+        await context.WaitAsync(TimeSpan.FromSeconds(30), name: "delay");
+
+        return new OrderResult { Status = "approved", OrderId = input.OrderId };
+    }
+
+    private class OrderEvent
+    {
+        public string? OrderId { get; set; }
+    }
+
+    private class OrderResult
+    {
+        public string? Status { get; set; }
+        public string? OrderId { get; set; }
+    }
+
+    private class ValidationResult
+    {
+        public bool IsValid { get; set; }
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/EnumsTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/EnumsTests.cs
new file mode 100644
index 000000000..1626f118a
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/EnumsTests.cs
@@ -0,0 +1,39 @@
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class EnumsTests
+{
+    [Fact]
+    public void InvocationStatus_HasExpectedValues()
+    {
+        Assert.Equal(0, (int)InvocationStatus.Succeeded);
+        Assert.Equal(1, (int)InvocationStatus.Failed);
+        Assert.Equal(2, (int)InvocationStatus.Pending);
+    }
+
+    [Fact]
+    public void OperationTypes_HasExpectedConstants()
+    {
+        Assert.Equal("STEP", OperationTypes.Step);
+        Assert.Equal("WAIT", OperationTypes.Wait);
+        Assert.Equal("CALLBACK", OperationTypes.Callback);
+        Assert.Equal("CHAINED_INVOKE", OperationTypes.ChainedInvoke);
+        Assert.Equal("CONTEXT", OperationTypes.Context);
+        Assert.Equal("EXECUTION", OperationTypes.Execution);
+    }
+
+    [Fact]
+    public void OperationStatuses_HasExpectedConstants()
+    {
+        Assert.Equal("STARTED", OperationStatuses.Started);
+        Assert.Equal("SUCCEEDED", OperationStatuses.Succeeded);
+        Assert.Equal("FAILED", OperationStatuses.Failed);
+        Assert.Equal("PENDING", OperationStatuses.Pending);
+        Assert.Equal("CANCELLED", OperationStatuses.Cancelled);
+        Assert.Equal("READY", OperationStatuses.Ready);
+        Assert.Equal("STOPPED", OperationStatuses.Stopped);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExceptionsTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExceptionsTests.cs
new file mode 100644
index 000000000..7105849bb
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExceptionsTests.cs
@@ -0,0 +1,68 @@
+using Amazon.Lambda.DurableExecution;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class ExceptionsTests
+{
+    [Fact]
+    public void DurableExecutionException_IsBaseException()
+    {
+        var ex = new DurableExecutionException("test error");
+        Assert.IsAssignableFrom<Exception>(ex);
+        Assert.Equal("test error", ex.Message);
+    }
+
+    [Fact]
+    public void DurableExecutionException_WrapsInnerException()
+    {
+        var inner = new InvalidOperationException("inner");
+        var ex = new DurableExecutionException("outer", inner);
+        Assert.Same(inner, ex.InnerException);
+    }
+
+    [Fact]
+    public void DurableExecutionException_ParameterlessCtor()
+    {
+        var ex = new DurableExecutionException();
+        Assert.IsAssignableFrom<Exception>(ex);
+    }
+
+    [Fact]
+    public void StepException_ParameterlessCtor()
+    {
+        var ex = new StepException();
+        Assert.IsAssignableFrom<DurableExecutionException>(ex);
+    }
+
+    [Fact]
+    public void StepException_MessageOnlyCtor()
+    {
+        var ex = new StepException("step blew up");
+        Assert.Equal("step blew up", ex.Message);
+    }
+
+    [Fact]
+    public void StepException_WithInnerException()
+    {
+        var inner = new InvalidOperationException("inner");
+        var ex = new StepException("wrapped", inner);
+        Assert.Same(inner, ex.InnerException);
+    }
+
+    [Fact]
+    public void StepException_HasErrorProperties()
+    {
+        var ex = new StepException("step failed")
+        {
+            ErrorType = "System.TimeoutException",
+            ErrorData = "operation timed out",
+            OriginalStackTrace = new[] { "at Foo.Bar()", "at Baz.Qux()" }
+        };
+
+        Assert.IsAssignableFrom<DurableExecutionException>(ex);
+        Assert.Equal("System.TimeoutException", ex.ErrorType);
+        Assert.Equal("operation timed out", ex.ErrorData);
+        Assert.Equal(2, ex.OriginalStackTrace!.Count);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExecutionStateTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExecutionStateTests.cs
new file mode 100644
index 000000000..6500879c1
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExecutionStateTests.cs
@@ -0,0 +1,231 @@
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+using Operation = Amazon.Lambda.DurableExecution.Internal.Operation;
+using StepDetails = Amazon.Lambda.DurableExecution.Internal.StepDetails;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class ExecutionStateTests
+{
+    private const string ExecutionInputId = "exec-input";
+
+    private static Operation ExecutionInputOp(string id = ExecutionInputId) => new()
+    {
+        Id = id,
+        Type = OperationTypes.Execution,
+        Status = OperationStatuses.Started
+    };
+
+    private static Operation StepOp(string id, string status, string?
+        name = null) => new()
+    {
+        Id = id,
+        Type = OperationTypes.Step,
+        Status = status,
+        Name = name,
+        StepDetails = new StepDetails { Result = "true" }
+    };
+
+    [Fact]
+    public void LoadFromCheckpoint_NullState_NotReplaying()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(null);
+
+        Assert.False(state.IsReplaying);
+        Assert.Equal(0, state.CheckpointedOperationCount);
+    }
+
+    [Fact]
+    public void LoadFromCheckpoint_EmptyOperations_NotReplaying()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState { Operations = new List<Operation>() });
+
+        Assert.False(state.IsReplaying);
+        Assert.Equal(0, state.CheckpointedOperationCount);
+    }
+
+    [Fact]
+    public void LoadFromCheckpoint_OnlyExecutionInputOp_NotReplaying()
+    {
+        // The service sends one EXECUTION-type op carrying the input payload
+        // even on the first invocation. That op is bookkeeping, not user
+        // history — it must not put us into replay mode. (Matches Python
+        // execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62.)
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation> { ExecutionInputOp() }
+        });
+
+        Assert.False(state.IsReplaying);
+        Assert.Equal(1, state.CheckpointedOperationCount);
+    }
+
+    [Fact]
+    public void LoadFromCheckpoint_WithReplayableOperations_IsReplaying()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                ExecutionInputOp(),
+                StepOp("0-fetch_user", OperationStatuses.Succeeded)
+            }
+        });
+
+        Assert.True(state.IsReplaying);
+        Assert.Equal(2, state.CheckpointedOperationCount);
+    }
+
+    [Fact]
+    public void TrackReplay_FlipsOutOfReplay_OnceAllCompletedOpsVisited()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                ExecutionInputOp(),
+                StepOp("0", OperationStatuses.Succeeded),
+                StepOp("1", OperationStatuses.Succeeded),
+            }
+        });
+        Assert.True(state.IsReplaying);
+
+        state.TrackReplay("0");
+        Assert.True(state.IsReplaying); // 1-of-2 completed ops visited
+
+        state.TrackReplay("1");
+        Assert.False(state.IsReplaying); // all completed ops visited → fresh
+    }
+
+    [Fact]
+    public void TrackReplay_PendingOpDoesNotBlockTransition()
+    {
+        // A PENDING op (e.g. retry timer waiting) is not "completed" in the
+        // checkpoint sense — once the workflow has visited every terminally-
+        // completed op the SDK treats subsequent code as fresh. Matches Python's
+        // {SUCCEEDED, FAILED, CANCELLED, STOPPED, TIMED_OUT} terminal set.
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                ExecutionInputOp(),
+                StepOp("0", OperationStatuses.Succeeded),
+                StepOp("1", OperationStatuses.Pending),
+            }
+        });
+        Assert.True(state.IsReplaying);
+
+        state.TrackReplay("0");
+        Assert.False(state.IsReplaying);
+    }
+
+    [Fact]
+    public void TrackReplay_IsIdempotent()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                ExecutionInputOp(),
+                StepOp("0", OperationStatuses.Succeeded),
+            }
+        });
+
+        state.TrackReplay("0");
+        Assert.False(state.IsReplaying);
+
+        // Second call is a no-op.
+        state.TrackReplay("0");
+        Assert.False(state.IsReplaying);
+    }
+
+    [Fact]
+    public void TrackReplay_NoOpWhenNotReplaying()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(null);
+        Assert.False(state.IsReplaying);
+
+        state.TrackReplay("anything");
+        Assert.False(state.IsReplaying);
+    }
+
+    [Fact]
+    public void GetOperation_ReturnsCheckpointedRecord()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                StepOp("0-validate", OperationStatuses.Succeeded)
+            }
+        });
+
+        var op = state.GetOperation("0-validate");
+        Assert.NotNull(op);
+        Assert.Equal(OperationStatuses.Succeeded, op!.Status);
+    }
+
+    [Fact]
+    public void GetOperation_ReturnsNull_WhenNotFound()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(null);
+
+        var op = state.GetOperation("0-nonexistent");
+        Assert.Null(op);
+    }
+
+    [Fact]
+    public void HasOperation_ReturnsTrueForExisting()
+    {
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation> { StepOp("0-step_a", OperationStatuses.Succeeded) }
+        });
+
+        Assert.True(state.HasOperation("0-step_a"));
+        Assert.False(state.HasOperation("1-step_b"));
+    }
+
+    [Fact]
+    public void GetOperation_ReturnsLatestRecord_WhenIdAppearsMultipleTimes()
+    {
+        // Wire format: when the service replays an envelope it includes the
+        // most recent record per ID. Java/Python/JS reference SDKs all key by
+        // ID alone and rely on the service to provide the authoritative record.
+        var state = new ExecutionState();
+        state.LoadFromCheckpoint(new InitialExecutionState
+        {
+            Operations = new List<Operation>
+            {
+                new()
+                {
+                    Id = "0-payment",
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Started
+                },
+                new()
+                {
+                    Id = "0-payment",
+                    Type = OperationTypes.Step,
+                    Status = OperationStatuses.Succeeded,
+                    StepDetails = new StepDetails { Result = "\"paid\"" }
+                }
+            }
+        });
+
+        var op = state.GetOperation("0-payment");
+        Assert.NotNull(op);
+        Assert.Equal(OperationStatuses.Succeeded, op!.Status);
+        Assert.Equal("\"paid\"", op.StepDetails?.Result);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/LambdaDurableServiceClientTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/LambdaDurableServiceClientTests.cs
new file mode 100644
index 000000000..2326f8544
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/LambdaDurableServiceClientTests.cs
@@ -0,0 +1,202 @@
+using Amazon.Lambda.DurableExecution.Services;
+using Amazon.Lambda.Model;
+using SdkErrorObject = Amazon.Lambda.Model.ErrorObject;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class LambdaDurableServiceClientTests
+{
+    [Fact]
+    public async Task CheckpointAsync_EmptyOperations_NoApiCallReturnsToken()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        var token = await client.CheckpointAsync(
+            "arn:aws:lambda:us-east-1:123:durable-execution:e1",
+            "input-token",
+            Array.Empty<OperationUpdate>());
+
+        Assert.Equal("input-token", token);
+        Assert.Empty(mockClient.CheckpointCalls);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_NullCheckpointToken_SendsEmptyString()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        await client.CheckpointAsync(
+            "arn:aws:lambda:us-east-1:123:durable-execution:e1",
+            checkpointToken: null,
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "0-step",
+                    Type = "STEP",
+                    Action = "SUCCEED",
+                    SubType = "Step",
+                    Name = "do_thing",
+                    Payload = "\"ok\""
+                }
+            });
+
+        var call = Assert.Single(mockClient.CheckpointCalls);
+        Assert.Equal("", call.CheckpointToken);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_StepWithError_PropagatesError()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        await client.CheckpointAsync(
+            "arn:aws:lambda:us-east-1:123:durable-execution:e1",
+            "tok",
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "0-bad",
+                    Type = "STEP",
+                    Action = "FAIL",
+                    SubType = "Step",
+                    Name = "bad",
+                    Error = new SdkErrorObject
+                    {
+                        ErrorType = "System.TimeoutException",
+                        ErrorMessage = "timed out",
+                        ErrorData = "{\"detail\":\"x\"}",
+                        StackTrace = new List<string> { "at A.B()", "at C.D()" }
+                    }
+                }
+            });
+
+        var call = Assert.Single(mockClient.CheckpointCalls);
+        var update = Assert.Single(call.Updates);
+        Assert.Equal("STEP", update.Type);
+        Assert.Equal("FAIL", update.Action);
+        Assert.NotNull(update.Error);
+        Assert.Equal("System.TimeoutException", update.Error.ErrorType);
+        Assert.Equal("timed out", update.Error.ErrorMessage);
+        Assert.Equal("{\"detail\":\"x\"}", update.Error.ErrorData);
+        Assert.Equal(2, update.Error.StackTrace.Count);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_WaitWithOptions_PropagatesWaitOptions()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        await client.CheckpointAsync(
+            "arn",
+            "tok",
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "0-wait",
+                    Type = "WAIT",
+                    Action = "START",
+                    SubType = "Wait",
+                    Name = "delay",
+                    WaitOptions = new WaitOptions { WaitSeconds = 45 }
+                }
+            });
+
+        var update =
+            mockClient.CheckpointCalls[0].Updates[0];
+        Assert.NotNull(update.WaitOptions);
+        Assert.Equal(45, update.WaitOptions.WaitSeconds);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_ParentIdAndPayload_ArePropagated()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        await client.CheckpointAsync(
+            "arn",
+            "tok",
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "child-1",
+                    ParentId = "parent-0",
+                    Type = "STEP",
+                    Action = "SUCCEED",
+                    SubType = "Step",
+                    Payload = "{\"a\":1}"
+                }
+            });
+
+        var update = mockClient.CheckpointCalls[0].Updates[0];
+        Assert.Equal("parent-0", update.ParentId);
+        Assert.Equal("{\"a\":1}", update.Payload);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_MultipleUpdates_AllForwarded()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        await client.CheckpointAsync(
+            "arn",
+            "tok",
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "0-step",
+                    Type = "STEP",
+                    Action = "SUCCEED",
+                    SubType = "Step",
+                    Name = "validate"
+                },
+                new OperationUpdate
+                {
+                    Id = "1-wait",
+                    Type = "WAIT",
+                    Action = "START",
+                    SubType = "Wait",
+                    Name = "delay",
+                    WaitOptions = new WaitOptions { WaitSeconds = 30 }
+                }
+            });
+
+        var call = Assert.Single(mockClient.CheckpointCalls);
+        Assert.Equal(2, call.Updates.Count);
+        Assert.Equal("STEP", call.Updates[0].Type);
+        Assert.Equal("WAIT", call.Updates[1].Type);
+    }
+
+    [Fact]
+    public async Task CheckpointAsync_ReturnsNewToken()
+    {
+        var mockClient = new MockLambdaClient();
+        var client = new LambdaDurableServiceClient(mockClient);
+
+        var newToken = await client.CheckpointAsync(
+            "arn",
+            "old-token",
+            new[]
+            {
+                new OperationUpdate
+                {
+                    Id = "0-x",
+                    Type = "STEP",
+                    Action = "SUCCEED"
+                }
+            });
+
+        // MockLambdaClient returns "token-1", "token-2", etc.
+        Assert.Equal("token-1", newToken);
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/MockLambdaClient.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/MockLambdaClient.cs
new file mode 100644
index 000000000..8df98a67d
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/MockLambdaClient.cs
@@ -0,0 +1,65 @@
+using Amazon.Lambda;
+using Amazon.Lambda.Model;
+using Amazon.Runtime;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+/// <summary>
+/// A mock that subclasses AmazonLambdaClient and overrides CheckpointDurableExecutionAsync
+/// to avoid real API calls. Records checkpoint requests for test assertions.
+/// </summary>
+internal class MockLambdaClient : AmazonLambdaClient
+{
+    public List<CheckpointDurableExecutionRequest> CheckpointCalls { get; } = new();
+    public List<GetDurableExecutionStateRequest> GetExecutionStateCalls { get; } = new();
+
+    /// <summary>
+    /// Optional handler for <see cref="GetDurableExecutionStateAsync"/> calls. Tests
+    /// that exercise the paginated-state path can set this to control the response
+    /// for each page.
+    /// </summary>
+    public Func<GetDurableExecutionStateRequest, GetDurableExecutionStateResponse>? GetExecutionStateHandler { get; set; }
+
+    private int _tokenCounter;
+
+    public MockLambdaClient() : base("fake-access-key", "fake-secret-key", Amazon.RegionEndpoint.USEast1) { }
+
+    /// <summary>
+    /// Optional exception thrown by <see cref="CheckpointDurableExecutionAsync"/>. Tests
+    /// that exercise checkpoint-error classification can set this to inject a specific
+    /// SDK exception on the orchestration-path drain.
+    /// </summary>
+    public Exception? CheckpointThrows { get; set; }
+
+    /// <summary>
+    /// Optional exception thrown by <see cref="GetDurableExecutionStateAsync"/>. Tests
+    /// that exercise hydration-error classification can set this to inject a specific
+    /// SDK exception on the initial state-fetch path.
+    /// </summary>
+    public Exception? GetExecutionStateThrows { get; set; }
+
+    public override Task<CheckpointDurableExecutionResponse> CheckpointDurableExecutionAsync(
+        CheckpointDurableExecutionRequest request,
+        CancellationToken cancellationToken = default)
+    {
+        CheckpointCalls.Add(request);
+        if (CheckpointThrows != null) throw CheckpointThrows;
+        return Task.FromResult(new CheckpointDurableExecutionResponse
+        {
+            CheckpointToken = $"token-{++_tokenCounter}"
+        });
+    }
+
+    public override Task<GetDurableExecutionStateResponse> GetDurableExecutionStateAsync(
+        GetDurableExecutionStateRequest request,
+        CancellationToken cancellationToken = default)
+    {
+        GetExecutionStateCalls.Add(request);
+        if (GetExecutionStateThrows != null) throw GetExecutionStateThrows;
+        if (GetExecutionStateHandler != null)
+        {
+            return Task.FromResult(GetExecutionStateHandler(request));
+        }
+        return Task.FromResult(new GetDurableExecutionStateResponse());
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ModelsTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ModelsTests.cs
new file mode 100644
index 000000000..2b7d3489e
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/ModelsTests.cs
@@ -0,0 +1,203 @@
+using System.Text.Json;
+using Amazon.Lambda.DurableExecution;
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class ModelsTests
+{
+    [Fact]
+    public void Operation_PropertiesAssignable()
+    {
+        var op = new Operation
+        {
+            Id = "op-1",
+            Type = OperationTypes.Step,
+            Status = OperationStatuses.Succeeded,
+            Name = "fetch_user",
+            StepDetails = new StepDetails { Result = "{\"name\":\"Alice\"}" }
+        };
+
+        Assert.Equal("op-1", op.Id);
+        Assert.Equal(OperationTypes.Step, op.Type);
+        Assert.Equal(OperationStatuses.Succeeded, op.Status);
+        Assert.Equal("fetch_user", op.Name);
+        Assert.Equal("{\"name\":\"Alice\"}", op.StepDetails?.Result);
+    }
+
+    [Fact]
+    public void Operation_WaitWithScheduledEndTimestamp()
+    {
+        var op = new Operation
+        {
+            Id = "op-2",
+            Type = OperationTypes.Wait,
+            Status = OperationStatuses.Pending,
+            Name = "cooldown",
+            WaitDetails = new WaitDetails
+            {
+                ScheduledEndTimestamp = 1767268830000L // 2026-01-01T12:00:30Z in ms
+            }
+        };
+
+        Assert.Equal(OperationTypes.Wait, op.Type);
+        Assert.Equal(1767268830000L, op.WaitDetails?.ScheduledEndTimestamp);
+    }
+
+    [Fact]
+    public void ErrorObject_FromException()
+    {
+        var ex = new InvalidOperationException("something went wrong");
+        var error = ErrorObject.FromException(ex);
+
+        Assert.Equal("System.InvalidOperationException", error.ErrorType);
+        Assert.Equal("something went wrong", error.ErrorMessage);
+    }
+
+    [Fact]
+    public void ErrorObject_RoundTripSerialization()
+    {
+        var error = new ErrorObject
+        {
+            ErrorType = "System.TimeoutException",
+            ErrorMessage = "timed out",
+            StackTrace = new[] { "at Foo.Bar()", "at Baz.Qux()" },
+            ErrorData = "{\"key\":\"value\"}"
+        };
+
+        var json = JsonSerializer.Serialize(error);
+        var deserialized = JsonSerializer.Deserialize<ErrorObject>(json)!;
+
+        Assert.Equal("System.TimeoutException", deserialized.ErrorType);
+        Assert.Equal("timed out", deserialized.ErrorMessage);
+        Assert.Equal(2, deserialized.StackTrace!.Count);
+        Assert.Equal("{\"key\":\"value\"}", deserialized.ErrorData);
+    }
+
+    [Fact]
+    public void DurableExecutionInvocationInput_Deserialization()
+    {
+        var json = """
+            {
+              "DurableExecutionArn": "arn:aws:lambda:us-east-1:123:durable-execution:abc",
+              "CheckpointToken": "token-1",
+              "InitialExecutionState": {
+                "Operations": [
+                  {
+                    "Id": "exec-1",
+                    "Type": "EXECUTION",
+                    "Status": "STARTED",
+                    "ExecutionDetails": {
+                      "InputPayload": "{\"orderId\":\"order-123\",\"amount\":99.99}"
+                    }
+                  },
+                  {
+                    "Id": "op-1",
+                    "Type": "STEP",
+                    "Status": "SUCCEEDED",
+                    "Name": "validate",
+                    "StepDetails": {
+                      "Result": "true"
+                    }
+                  }
+                ]
+              }
+            }
+            """;
+
+        var input = JsonSerializer.Deserialize<DurableExecutionInvocationInput>(json)!;
+
+        Assert.Equal("arn:aws:lambda:us-east-1:123:durable-execution:abc", input.DurableExecutionArn);
+        Assert.Equal("token-1", input.CheckpointToken);
+        Assert.NotNull(input.InitialExecutionState);
+        Assert.Equal(2, input.InitialExecutionState!.Operations!.Count);
+
+        var stepOp = input.InitialExecutionState.Operations![1];
+        Assert.Equal("op-1", stepOp.Id);
+        Assert.Equal(OperationTypes.Step, stepOp.Type);
+        Assert.Equal("true", stepOp.StepDetails?.Result);
+
+        // The EXECUTION operation carries the user payload in ExecutionDetails.InputPayload.
+        var execOp = input.InitialExecutionState.Operations[0];
+        Assert.Equal(OperationTypes.Execution, execOp.Type);
+        var payload = JsonSerializer.Deserialize<TestOrderEvent>(execOp.ExecutionDetails!.InputPayload!);
+        Assert.Equal("order-123", payload!.OrderId);
+        Assert.Equal(99.99m, payload.Amount);
+    }
+
+    [Fact]
+    public void DurableExecutionInvocationInput_NoExecutionOp_HasNullPayload()
+    {
+        var input = new DurableExecutionInvocationInput
+        {
+            DurableExecutionArn = "arn:test"
+        };
+
+        // No InitialExecutionState means no EXECUTION operation and thus no user payload.
+        Assert.Null(input.InitialExecutionState);
+    }
+
+    [Fact]
+    public void DurableExecutionInvocationOutput_Succeeded()
+    {
+        var output = new DurableExecutionInvocationOutput
+        {
+            Status = InvocationStatus.Succeeded,
+            Result = "{\"status\":\"approved\"}"
+        };
+
+        var json = JsonSerializer.Serialize(output);
+        var deserialized = JsonSerializer.Deserialize<DurableExecutionInvocationOutput>(json)!;
+
+        Assert.Equal(InvocationStatus.Succeeded, deserialized.Status);
+        Assert.Equal("{\"status\":\"approved\"}", deserialized.Result);
+    }
+
+    [Fact]
+    public void DurableExecutionInvocationOutput_Failed()
+    {
+        var output = new DurableExecutionInvocationOutput
+        {
+            Status = InvocationStatus.Failed,
+            Error = new ErrorObject
+            {
+                ErrorMessage = "step failed",
+                ErrorType = "StepException"
+            }
+        };
+
+        var json = JsonSerializer.Serialize(output);
+        var deserialized = JsonSerializer.Deserialize<DurableExecutionInvocationOutput>(json)!;
+
+        Assert.Equal(InvocationStatus.Failed, deserialized.Status);
+        Assert.NotNull(deserialized.Error);
+        Assert.Equal("step failed", deserialized.Error!.ErrorMessage);
+        Assert.Equal("StepException", deserialized.Error.ErrorType);
+    }
+
+    [Fact]
+    public void DurableExecutionInvocationOutput_Pending()
+    {
+        var output = new DurableExecutionInvocationOutput
+        {
+            Status = InvocationStatus.Pending
+        };
+
+        var json = JsonSerializer.Serialize(output);
+        var deserialized = JsonSerializer.Deserialize<DurableExecutionInvocationOutput>(json)!;
+
+        Assert.Equal(InvocationStatus.Pending, deserialized.Status);
+        Assert.Null(deserialized.Result);
+        Assert.Null(deserialized.Error);
+    }
+
+    private class TestOrderEvent
+    {
+        [System.Text.Json.Serialization.JsonPropertyName("orderId")]
+        public string? OrderId { get; set; }
+
+        [System.Text.Json.Serialization.JsonPropertyName("amount")]
+        public decimal Amount { get; set; }
+    }
+}
diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/OperationIdGeneratorTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/OperationIdGeneratorTests.cs
new file mode 100644
index 000000000..6eb63551b
--- /dev/null
+++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/OperationIdGeneratorTests.cs
@@ -0,0 +1,100 @@
+using System.Security.Cryptography;
+using System.Text;
+using Amazon.Lambda.DurableExecution.Internal;
+using Xunit;
+
+namespace Amazon.Lambda.DurableExecution.Tests;
+
+public class OperationIdGeneratorTests
+{
+    private static string Sha256Hex(string input)
+    {
+        using var sha = SHA256.Create();
+        var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
+        var sb = new StringBuilder(bytes.Length * 2);
+        foreach (var b in bytes) sb.Append(b.ToString("x2"));
+        return sb.ToString();
+    }
+
+    [Fact]
+    public void NextId_ProducesSha256OfPositionString_StartingAtOne()
+    {
+        var gen = new OperationIdGenerator();
+        Assert.Equal(Sha256Hex("1"), gen.NextId());
+        Assert.Equal(Sha256Hex("2"), gen.NextId());
+        Assert.Equal(Sha256Hex("3"), gen.NextId());
+    }
+
+    [Fact]
+    public void NextId_NameIsNotPartOfId()
+    {
+        // Name must not influence the deterministic ID — replays must still
// correlate after a step is renamed. The reference SDKs (Java/JS/Python) + // all keep Name in a separate field on OperationUpdate. + var gen = new OperationIdGenerator(); + Assert.Equal(Sha256Hex("1"), gen.NextId()); + Assert.Equal(Sha256Hex("2"), gen.NextId()); + } + + [Fact] + public void HashOperationId_IsStable() + { + Assert.Equal(Sha256Hex("hello"), OperationIdGenerator.HashOperationId("hello")); + Assert.Equal(Sha256Hex("1"), OperationIdGenerator.HashOperationId("1")); + } + + [Fact] + public void ChildGenerator_PrefixesPositionWithParentHash() + { + var gen = new OperationIdGenerator(); + var parentId = gen.NextId(); + var child = gen.CreateChild(parentId); + + Assert.Equal(Sha256Hex(parentId + "-1"), child.NextId()); + Assert.Equal(Sha256Hex(parentId + "-2"), child.NextId()); + } + + [Fact] + public void ChildGenerator_ParentIdProperty() + { + var gen = new OperationIdGenerator(); + Assert.Null(gen.ParentId); + + var child = new OperationIdGenerator("op-5"); + Assert.Equal("op-5", child.ParentId); + } + + [Fact] + public void MultipleChildren_IndependentCounters() + { + var child1 = new OperationIdGenerator("parent-1"); + var child2 = new OperationIdGenerator("parent-2"); + + Assert.Equal(Sha256Hex("parent-1-1"), child1.NextId()); + Assert.Equal(Sha256Hex("parent-2-1"), child2.NextId()); + Assert.Equal(Sha256Hex("parent-1-2"), child1.NextId()); + Assert.Equal(Sha256Hex("parent-2-2"), child2.NextId()); + } + + [Fact] + public void Deterministic_SameSequenceOnReplay() + { + var gen1 = new OperationIdGenerator(); + var ids1 = new[] { gen1.NextId(), gen1.NextId(), gen1.NextId() }; + + var gen2 = new OperationIdGenerator(); + var ids2 = new[] { gen2.NextId(), gen2.NextId(), gen2.NextId() }; + + Assert.Equal(ids1, ids2); + } + + [Fact] + public void Reset_RewindsCounter() + { + var gen = new OperationIdGenerator(); + gen.NextId(); + gen.NextId(); + gen.Reset(); + Assert.Equal(Sha256Hex("1"), gen.NextId()); + } +} diff --git 
a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RecordingBatcher.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RecordingBatcher.cs new file mode 100644 index 000000000..8fe7b6d6d --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RecordingBatcher.cs @@ -0,0 +1,51 @@ +using Amazon.Lambda.DurableExecution.Internal; +using SdkOperationUpdate = Amazon.Lambda.Model.OperationUpdate; + +namespace Amazon.Lambda.DurableExecution.Tests; + +/// +/// Test helper: a that records every flushed +/// update without making any network calls. Tests construct one of these in +/// place of a real batcher to inspect what would have been sent to the service. +/// +internal sealed class RecordingBatcher +{ + private readonly List _flushed = new(); + private readonly List _flushBatchSizes = new(); + private readonly object _lock = new(); + + public CheckpointBatcher Batcher { get; } + + public RecordingBatcher(CheckpointBatcherConfig? config = null) + { + Batcher = new CheckpointBatcher("test-token", Flush, config); + } + + /// + /// Cumulative list of every update that has been flushed, in order. + /// + public IReadOnlyList Flushed + { + get { lock (_lock) return _flushed.ToArray(); } + } + + /// + /// One entry per batch flushed, recording the batch size. With + /// = Zero (default), + /// every produces one batch. + /// + public IReadOnlyList FlushBatchSizes + { + get { lock (_lock) return _flushBatchSizes.ToArray(); } + } + + private Task Flush(string? 
token, IReadOnlyList<SdkOperationUpdate> ops, CancellationToken ct) + { + lock (_lock) + { + _flushed.AddRange(ops); + _flushBatchSizes.Add(ops.Count); + } + return Task.FromResult(token); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RetryStrategyTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RetryStrategyTests.cs new file mode 100644 index 000000000..e5a277fb6 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/RetryStrategyTests.cs @@ -0,0 +1,202 @@ +using Amazon.Lambda.DurableExecution; +using Xunit; + +namespace Amazon.Lambda.DurableExecution.Tests; + +public class RetryStrategyTests +{ + [Fact] + public void ExponentialDefault_RetriesUpToMaxAttempts() + { + var strategy = RetryStrategy.Default; + + // Attempts 1-5 should retry (maxAttempts=6 means 6 total attempts) + for (int i = 1; i < 6; i++) + { + var decision = strategy.ShouldRetry(new InvalidOperationException("fail"), i); + Assert.True(decision.ShouldRetry); + Assert.True(decision.Delay >= TimeSpan.FromSeconds(1)); + } + + // Attempt 6 should not retry (exhausted) + var lastDecision = strategy.ShouldRetry(new InvalidOperationException("fail"), 6); + Assert.False(lastDecision.ShouldRetry); + } + + [Fact] + public void None_NeverRetries() + { + var strategy = RetryStrategy.None; + + var decision = strategy.ShouldRetry(new Exception("fail"), 1); + Assert.False(decision.ShouldRetry); + } + + [Fact] + public void Transient_RetriesUpTo3Attempts() + { + var strategy = RetryStrategy.Transient; + + Assert.True(strategy.ShouldRetry(new Exception("fail"), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new Exception("fail"), 2).ShouldRetry); + Assert.False(strategy.ShouldRetry(new Exception("fail"), 3).ShouldRetry); + } + + [Fact] + public void Exponential_DelayIncreases() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 5, + initialDelay: TimeSpan.FromSeconds(2), + maxDelay: TimeSpan.FromSeconds(120), + backoffRate: 2.0, + jitter: JitterStrategy.None);
+ var d1 = strategy.ShouldRetry(new Exception(), 1).Delay; + var d2 = strategy.ShouldRetry(new Exception(), 2).Delay; + var d3 = strategy.ShouldRetry(new Exception(), 3).Delay; + + // With no jitter: 2s, 4s, 8s (ceiling to whole seconds) + Assert.Equal(TimeSpan.FromSeconds(2), d1); + Assert.Equal(TimeSpan.FromSeconds(4), d2); + Assert.Equal(TimeSpan.FromSeconds(8), d3); + } + + [Fact] + public void Exponential_DelayCapsAtMax() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 10, + initialDelay: TimeSpan.FromSeconds(10), + maxDelay: TimeSpan.FromSeconds(30), + backoffRate: 3.0, + jitter: JitterStrategy.None); + + // Attempt 3: 10 * 3^2 = 90, capped to 30 + var decision = strategy.ShouldRetry(new Exception(), 3); + Assert.Equal(TimeSpan.FromSeconds(30), decision.Delay); + } + + [Fact] + public void Exponential_FullJitter_BoundedByDelay() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 5, + initialDelay: TimeSpan.FromSeconds(10), + maxDelay: TimeSpan.FromSeconds(100), + backoffRate: 2.0, + jitter: JitterStrategy.Full); + + // Run multiple times to check bounds + for (int i = 0; i < 50; i++) + { + var decision = strategy.ShouldRetry(new Exception(), 1); + Assert.True(decision.Delay >= TimeSpan.FromSeconds(1)); + Assert.True(decision.Delay <= TimeSpan.FromSeconds(10)); + } + } + + [Fact] + public void Exponential_HalfJitter_BoundedBetween50And100Percent() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 5, + initialDelay: TimeSpan.FromSeconds(10), + maxDelay: TimeSpan.FromSeconds(100), + backoffRate: 2.0, + jitter: JitterStrategy.Half); + + for (int i = 0; i < 50; i++) + { + var decision = strategy.ShouldRetry(new Exception(), 1); + Assert.True(decision.Delay >= TimeSpan.FromSeconds(5)); + Assert.True(decision.Delay <= TimeSpan.FromSeconds(10)); + } + } + + [Fact] + public void Exponential_RetryableExceptions_FiltersCorrectly() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 3, + retryableExceptions: new[] { 
typeof(TimeoutException), typeof(HttpRequestException) }); + + Assert.True(strategy.ShouldRetry(new TimeoutException(), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new HttpRequestException(), 1).ShouldRetry); + Assert.False(strategy.ShouldRetry(new InvalidOperationException(), 1).ShouldRetry); + } + + [Fact] + public void Exponential_RetryableExceptions_MatchesDerivedTypes() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 3, + retryableExceptions: new[] { typeof(IOException) }); + + Assert.True(strategy.ShouldRetry(new FileNotFoundException(), 1).ShouldRetry); + } + + [Fact] + public void Exponential_MessagePatterns_FiltersCorrectly() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 3, + retryableMessagePatterns: new[] { "timeout", "throttl", "5\\d{2}" }); + + Assert.True(strategy.ShouldRetry(new Exception("connection timeout"), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new Exception("request throttled"), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new Exception("HTTP 503"), 1).ShouldRetry); + Assert.False(strategy.ShouldRetry(new Exception("not found"), 1).ShouldRetry); + } + + [Fact] + public void Exponential_BothFilters_EitherMatches() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 3, + retryableExceptions: new[] { typeof(TimeoutException) }, + retryableMessagePatterns: new[] { "throttl" }); + + // Matches exception type + Assert.True(strategy.ShouldRetry(new TimeoutException("any message"), 1).ShouldRetry); + // Matches message pattern + Assert.True(strategy.ShouldRetry(new Exception("throttled"), 1).ShouldRetry); + // Matches neither + Assert.False(strategy.ShouldRetry(new InvalidOperationException("bad state"), 1).ShouldRetry); + } + + [Fact] + public void Exponential_NoFilters_RetriesAllExceptions() + { + var strategy = RetryStrategy.Exponential(maxAttempts: 3); + + Assert.True(strategy.ShouldRetry(new Exception("anything"), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new 
InvalidOperationException(), 1).ShouldRetry); + Assert.True(strategy.ShouldRetry(new OutOfMemoryException(), 1).ShouldRetry); + } + + [Fact] + public void Exponential_MinimumDelayIsOneSecond() + { + var strategy = RetryStrategy.Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromMilliseconds(100), + jitter: JitterStrategy.None); + + var decision = strategy.ShouldRetry(new Exception(), 1); + Assert.True(decision.Delay >= TimeSpan.FromSeconds(1)); + } + + [Fact] + public void FromDelegate_UsesProvidedFunction() + { + var strategy = RetryStrategy.FromDelegate((ex, attempt) => + attempt < 2 && ex is TimeoutException + ? RetryDecision.RetryAfter(TimeSpan.FromSeconds(5)) + : RetryDecision.DoNotRetry()); + + Assert.True(strategy.ShouldRetry(new TimeoutException(), 1).ShouldRetry); + Assert.False(strategy.ShouldRetry(new TimeoutException(), 2).ShouldRetry); + Assert.False(strategy.ShouldRetry(new Exception(), 1).ShouldRetry); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/TerminationManagerTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/TerminationManagerTests.cs new file mode 100644 index 000000000..a12ff4a6c --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/TerminationManagerTests.cs @@ -0,0 +1,88 @@ +using Amazon.Lambda.DurableExecution.Internal; +using Xunit; + +namespace Amazon.Lambda.DurableExecution.Tests; + +public class TerminationManagerTests +{ + [Fact] + public async Task Terminate_ResolvesTerminationTask() + { + var manager = new TerminationManager(); + Assert.False(manager.IsTerminated); + + manager.Terminate(TerminationReason.WaitScheduled, "wait pending"); + + Assert.True(manager.IsTerminated); + var result = await manager.TerminationTask; + Assert.Equal(TerminationReason.WaitScheduled, result.Reason); + Assert.Equal("wait pending", result.Message); + } + + [Fact] + public void Terminate_OnlyFirstCallWins() + { + var manager = new TerminationManager(); + + var first = 
manager.Terminate(TerminationReason.WaitScheduled, "first"); + var second = manager.Terminate(TerminationReason.CallbackPending, "second"); + + Assert.True(first); + Assert.False(second); + } + + [Fact] + public async Task Terminate_FirstReasonIsPreserved() + { + var manager = new TerminationManager(); + + manager.Terminate(TerminationReason.CallbackPending, "callback"); + manager.Terminate(TerminationReason.WaitScheduled, "wait"); + + var result = await manager.TerminationTask; + Assert.Equal(TerminationReason.CallbackPending, result.Reason); + Assert.Equal("callback", result.Message); + } + + [Fact] + public async Task Terminate_WithException() + { + var manager = new TerminationManager(); + var ex = new Exception("checkpoint failed"); + + manager.Terminate(TerminationReason.CheckpointFailed, "error", ex); + + var result = await manager.TerminationTask; + Assert.Equal(TerminationReason.CheckpointFailed, result.Reason); + Assert.Same(ex, result.Exception); + } + + [Fact] + public async Task TerminationTask_WinsRaceAgainstNeverCompletingTask() + { + var manager = new TerminationManager(); + var neverCompletes = new TaskCompletionSource().Task; + + manager.Terminate(TerminationReason.WaitScheduled); + + var winner = await Task.WhenAny(neverCompletes, manager.TerminationTask); + Assert.Same(manager.TerminationTask, winner); + } + + [Fact] + public async Task ConcurrentTerminate_OnlyOneSucceeds() + { + var manager = new TerminationManager(); + var results = new bool[10]; + + var tasks = Enumerable.Range(0, 10).Select(i => Task.Run(() => + { + results[i] = manager.Terminate(TerminationReason.WaitScheduled, $"caller-{i}"); + })); + + await Task.WhenAll(tasks); + + Assert.Equal(1, results.Count(r => r)); + Assert.True(manager.IsTerminated); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/UpperSnakeCaseEnumConverterTests.cs b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/UpperSnakeCaseEnumConverterTests.cs new file mode 100644 index 
000000000..7ac6df052 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/UpperSnakeCaseEnumConverterTests.cs @@ -0,0 +1,84 @@ +using System.Text.Json; +using System.Text.Json.Serialization; +using Amazon.Lambda.DurableExecution; +using Xunit; + +namespace Amazon.Lambda.DurableExecution.Tests; + +/// +/// Direct tests for UpperSnakeCaseEnumConverter via a sample enum, exercising +/// every branch (Read with multi-word value, Read with single word, Read with +/// null/unparsable, plus the Write path for outbound serialization). +/// +public class UpperSnakeCaseEnumConverterTests +{ + public enum Sample + { + None, + FooBar, + BazQuxQuux + } + + public class Holder + { + [JsonConverter(typeof(UpperSnakeCaseEnumConverter))] + public Sample Value { get; set; } + } + + [Theory] + [InlineData("\"FOO_BAR\"", Sample.FooBar)] + [InlineData("\"BAZ_QUX_QUUX\"", Sample.BazQuxQuux)] + [InlineData("\"NONE\"", Sample.None)] + public void Read_UpperSnakeCase_ReturnsExpectedEnum(string json, Sample expected) + { + var holder = JsonSerializer.Deserialize($"{{\"Value\":{json}}}")!; + Assert.Equal(expected, holder.Value); + } + + [Fact] + public void Read_NullValue_ReturnsDefault() + { + var holder = JsonSerializer.Deserialize("{\"Value\":null}")!; + Assert.Equal(Sample.None, holder.Value); + } + + [Fact] + public void Read_AlreadyPascalCase_ParsesCaseInsensitively() + { + // The converter first tries snake→pascal, then a raw case-insensitive parse. + // A camel-case input like "fooBar" hits the fallback path. + var holder = JsonSerializer.Deserialize("{\"Value\":\"fooBar\"}")!; + Assert.Equal(Sample.FooBar, holder.Value); + } + + [Fact] + public void Read_UnparsableValue_ThrowsJsonException() + { + // Unknown wire values must surface as JsonException rather than + // silently coercing to default(T) — otherwise an unrecognized + // service status would be indistinguishable from the zero value. 
+ Assert.Throws<JsonException>(() => + JsonSerializer.Deserialize<Holder>("{\"Value\":\"NOT_A_REAL_VALUE\"}")); + } + + [Fact] + public void Write_PascalCase_EmitsUpperSnake() + { + var json = JsonSerializer.Serialize(new Holder { Value = Sample.FooBar }); + Assert.Contains("\"FOO_BAR\"", json); + } + + [Fact] + public void Write_MultiWord_EmitsUpperSnake() + { + var json = JsonSerializer.Serialize(new Holder { Value = Sample.BazQuxQuux }); + Assert.Contains("\"BAZ_QUX_QUUX\"", json); + } + + [Fact] + public void Write_SingleWord_EmitsUpperWithoutUnderscores() + { + var json = JsonSerializer.Serialize(new Holder { Value = Sample.None }); + Assert.Contains("\"NONE\"", json); + } +} diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.runsettings b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.runsettings new file mode 100644 index 000000000..6c38b1258 --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.runsettings @@ -0,0 +1,15 @@ +<?xml version="1.0" encoding="utf-8"?> +<RunSettings> + <DataCollectionRunSettings> + <DataCollectors> + <DataCollector friendlyName="XPlat code coverage"> + <Configuration> + <Format>cobertura</Format> + <Include>[Amazon.Lambda.DurableExecution]*</Include> + <Exclude>[Amazon.Lambda.DurableExecution.Tests]*</Exclude> + <ExcludeByAttribute>GeneratedCodeAttribute</ExcludeByAttribute> + </Configuration> + </DataCollector> + </DataCollectors> + </DataCollectionRunSettings> +</RunSettings> diff --git a/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.sh b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.sh new file mode 100644 index 000000000..b953bd07e --- /dev/null +++ b/Libraries/test/Amazon.Lambda.DurableExecution.Tests/coverage.sh @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +set -e +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT="$(cd "$HERE/../../.."
&& pwd)" +PROJ="$HERE/Amazon.Lambda.DurableExecution.Tests.csproj" +OUT="$HERE/TestResults" + +rm -rf "$OUT" +dotnet test "$PROJ" -c Release \ + --collect:"XPlat Code Coverage" \ + --settings "$HERE/coverage.runsettings" \ + --results-directory "$OUT" + +REPORT_FILE=$(find "$OUT" -name "coverage.cobertura.xml" -type f | head -1) +if [ -z "$REPORT_FILE" ]; then + echo "No coverage report found under $OUT" + exit 1 +fi + +reportgenerator \ + "-reports:$REPORT_FILE" \ + "-targetdir:$OUT/report" \ + "-reporttypes:Html;TextSummary" + +echo +echo "==================== Coverage Summary ====================" +cat "$OUT/report/Summary.txt" +echo "==========================================================" +echo "Full HTML report: $OUT/report/index.html" diff --git a/buildtools/build.proj b/buildtools/build.proj index 037c11f0a..0b80ec612 100644 --- a/buildtools/build.proj +++ b/buildtools/build.proj @@ -215,6 +215,7 @@ +
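
The `OperationIdGeneratorTests` above pin down a simple contract: each operation ID is the SHA-256 hex digest of its 1-based position, and child generators prefix the position with the parent's ID joined by a hyphen. This is an illustrative Python model of what those tests assert, not the SDK implementation; `sha256_hex` and the class shape are read directly off the test expectations:

```python
import hashlib

def sha256_hex(s: str) -> str:
    # Hex digest of the UTF-8 bytes, matching the tests' Sha256Hex helper.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

class OperationIdGenerator:
    """Deterministic ID generator: hash the 1-based position of each
    operation; children prefix positions with the parent's ID."""

    def __init__(self, parent_id=None):
        self.parent_id = parent_id
        self._counter = 0

    def next_id(self):
        self._counter += 1
        position = str(self._counter)
        if self.parent_id is not None:
            position = f"{self.parent_id}-{position}"
        return sha256_hex(position)

    def create_child(self, parent_id):
        # Child counters are independent of the parent's counter.
        return OperationIdGenerator(parent_id)

    def reset(self):
        self._counter = 0

# Replay determinism: independent generators produce identical sequences.
g1, g2 = OperationIdGenerator(), OperationIdGenerator()
assert [g1.next_id() for _ in range(3)] == [g2.next_id() for _ in range(3)]
```

Because IDs derive only from position (never from step names), replay after a code change stays correlated as long as operation order is unchanged.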
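
The `RetryStrategyTests` encode a specific delay formula: exponential growth from the initial delay, capped at the max delay, floored at one second, with optional full or half jitter. A Python sketch of that math as the tests constrain it (the real strategy works on `TimeSpan` values and, per the test comment, rounds up to whole seconds; this model uses float seconds and the function name is mine):

```python
import random

MIN_DELAY = 1.0  # seconds; the tests assert a one-second floor

def backoff_delay(attempt, initial, max_delay, rate, jitter="none"):
    """Delay before retry `attempt` (1-based): exponential growth from
    `initial`, capped at `max_delay`, with optional full ([0, d]) or
    half ([d/2, d]) jitter, then floored at MIN_DELAY."""
    d = min(initial * rate ** (attempt - 1), max_delay)
    if jitter == "full":
        d = random.uniform(0.0, d)
    elif jitter == "half":
        d = random.uniform(d / 2.0, d)
    return max(d, MIN_DELAY)

# No jitter: 2s, 4s, 8s for attempts 1..3 (Exponential_DelayIncreases).
assert [backoff_delay(a, 2.0, 120.0, 2.0) for a in (1, 2, 3)] == [2.0, 4.0, 8.0]
# Cap: 10 * 3^2 = 90 clamps to 30 (Exponential_DelayCapsAtMax).
assert backoff_delay(3, 10.0, 30.0, 3.0) == 30.0
```

Applying the one-second floor after jitter is what makes `Exponential_FullJitter_BoundedByDelay` hold at both ends: full jitter can drive the raw delay toward zero, but the decision never schedules a sub-second wait.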
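
The enum converter tests assume a PascalCase-to-UPPER_SNAKE mapping on write and the reverse on read. A Python sketch of just that name mapping (helper names are mine; per the test comments, the actual converter additionally falls back to a raw case-insensitive parse for non-snake input such as "fooBar"):

```python
def to_upper_snake(pascal: str) -> str:
    # FooBar -> FOO_BAR: insert '_' before each interior uppercase letter.
    out = []
    for i, ch in enumerate(pascal):
        if ch.isupper() and i > 0:
            out.append("_")
        out.append(ch.upper())
    return "".join(out)

def to_pascal(upper_snake: str) -> str:
    # FOO_BAR -> FooBar: title-case each underscore-separated word.
    return "".join(w.capitalize() for w in upper_snake.split("_"))

assert to_upper_snake("BazQuxQuux") == "BAZ_QUX_QUUX"
assert to_pascal("FOO_BAR") == "FooBar"
```

Round-tripping through these two functions is what lets the `Write_*` and `Read_UpperSnakeCase_*` tests mirror each other exactly.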