docs(power): clarify replay safety and step semantics #5
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -79,7 +79,7 @@ Load the appropriate reference file based on what the user is working on: | |
|
|
||
| - **Getting started**, **basic setup**, **example**, **ESLint**, or **Jest setup** -> see [getting-started.md](steering/getting-started.md) | ||
| - **Understanding replay model**, **determinism**, or **non-deterministic errors** -> see [replay-model-rules.md](steering/replay-model-rules.md) | ||
| - **Creating steps**, **atomic operations**, or **retry logic** -> see [step-operations.md](steering/step-operations.md) | ||
| - **Creating steps**, **step operations**, or **retry logic** -> see [step-operations.md](steering/step-operations.md) | ||
| - **Waiting**, **delays**, **callbacks**, **external systems**, or **polling** -> see [wait-operations.md](steering/wait-operations.md) | ||
| - **Parallel execution**, **map operations**, **batch processing**, or **concurrency** -> see [concurrent-operations.md](steering/concurrent-operations.md) | ||
| - **Error handling**, **retry strategies**, **saga pattern**, or **compensating transactions** -> see [error-handling.md](steering/error-handling.md) | ||
|
|
@@ -117,10 +117,11 @@ def handler(event: dict, context: DurableContext) -> dict: | |
|
|
||
| ### Critical Rules | ||
|
|
||
| 1. **All non-deterministic code MUST be in steps** (Date.now, Math.random, API calls) | ||
| 2. **Cannot nest durable operations** - use `runInChildContext` to group operations | ||
| 3. **Closure mutations are lost on replay** - return values from steps | ||
| 4. **Side effects outside steps repeat** - use `context.logger` (replay-aware) | ||
| 1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches) | ||
| 2. **Durable operation bodies are not guaranteed to be atomic** - prefer stable identity and idempotent behavior for external side effects; for non-idempotent steps, consider at-most-once-per-retry semantics with zero retries | ||
|
Contributor
this repeats yet again content from before. it also refers to "durable operation bodies" but then describes semantics that are only available on step.
|
||
| 3. **Cannot nest durable operations** - use `runInChildContext` to group operations | ||
| 4. **Closure mutations are lost on replay** - return values from steps | ||
| 5. **Side effects outside durable operations repeat** - prefer `context.logger`; custom loggers may duplicate on replay | ||
|
|
||
| ### Python API Differences | ||
|
|
||
|
|
@@ -163,10 +164,11 @@ See here: https://docs.aws.amazon.com/lambda/latest/dg/durable-security.html | |
|
|
||
| When writing or reviewing durable function code, ALWAYS check for these replay model violations: | ||
|
|
||
| 1. **Non-deterministic code outside steps**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside steps | ||
| 2. **Nested durable operations in step functions**: Cannot call `context.step()`, `context.wait()`, or `context.invoke()` inside a step function — use `context.runInChildContext()` instead | ||
| 3. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead | ||
| 4. **Side effects outside steps that repeat on replay**: Use `context.logger` for logging (it is replay-aware and deduplicates automatically) | ||
| 1. **Non-deterministic code outside durable operations**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside durable operations | ||
| 2. **Non-atomic durable operation bodies**: Functions passed to `context.step()`, `waitForCallback()`, `waitForCondition()`, and `parallel()`/`map()` branches may be re-attempted before persistence is fully committed — prefer stable identity and idempotent external effects; for non-idempotent steps, use at-most-once-per-retry semantics with zero retries when duplicate execution is unacceptable | ||
|
Contributor
this idea is getting repeated a lot. can we consolidate? If keeping this, maybe something like:
|
||
| 3. **Nested durable operations in step functions**: Cannot call `context.step()`, `context.wait()`, or `context.invoke()` inside a step function — use `context.runInChildContext()` instead | ||
| 4. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead | ||
| 5. **Side effects outside durable operations that repeat on replay**: Prefer `context.logger` because it is replay-aware and deduplicates automatically; custom loggers are allowed but may emit duplicates unless `context.logger` is configured to wrap them | ||
|
|
||
| When implementing or modifying tests for durable functions, ALWAYS verify: | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -252,7 +252,7 @@ my-durable-function/ | |
|
|
||
| ## ESLint Plugin Setup | ||
|
|
||
| Install the ESLint plugin to catch common durable function mistakes at development time: | ||
| For TypeScript durable-function projects, use the ESLint plugin to catch common mistakes at development time: | ||
|
|
||
| ```bash | ||
| npm install --save-dev @aws/durable-execution-sdk-js-eslint-plugin | ||
|
|
@@ -304,6 +304,8 @@ export default [ | |
| - Incorrect usage of durable context outside handler | ||
| - Common replay model violations | ||
|
|
||
| Use the plugin by default in new TypeScript projects. It is a strong static guardrail, not a runtime guarantee, so equivalent enforcement is acceptable if your team already has it. | ||
|
Contributor
verbose.
|
||
|
|
||
| ## Jest Configuration | ||
|
|
||
| **jest.config.js:** | ||
|
|
@@ -339,21 +341,21 @@ Add `aws-durable-execution-sdk-python-testing` to your dev/test dependencies in | |
|
|
||
| 1. **Write handler** with durable operations | ||
| 2. **Test locally** with `LocalDurableTestRunner` | ||
| 3. **Validate replay rules** (no non-deterministic code outside steps) | ||
| 3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies) | ||
|
Contributor
what about determinism outside of durable operations? the first clause is negative (i.e. don't do this) and the second is positive (do this), but this is not clear from the text. Is there a replay section this can link to instead?
||
| 4. **Deploy** with qualified ARN (version or alias) | ||
| 5. **Monitor** execution state and logs | ||
|
|
||
| ### Python | ||
|
|
||
| 1. **Write handler** with `@durable_execution` decorator | ||
| 2. **Test locally** with `DurableFunctionTestRunner` and pytest | ||
| 3. **Validate replay rules** (no non-deterministic code outside steps) | ||
| 3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies) | ||
|
Contributor
as above
||
| 4. **Deploy** with qualified ARN (version or alias) | ||
| 5. **Monitor** execution state and logs | ||
|
|
||
| ## Key Concepts | ||
|
|
||
| - **Steps**: Atomic operations with automatic retry and checkpointing | ||
| - **Steps**: Persisted operations with automatic retry and checkpointing | ||
|
Contributor
Executes business logic with automatic checkpointing and retry. Persists the result of an operation in a checkpoint.
||
| - **Waits**: Suspend execution without compute charges (up to 1 year) | ||
| - **Child Contexts**: Group multiple durable operations | ||
| - **Callbacks**: Wait for external systems to respond | ||
|
|
@@ -368,13 +370,13 @@ When starting a new durable function project: | |
| - [ ] Install dependencies (`@aws/durable-execution-sdk-js`, testing & eslint packages) | ||
| - [ ] Create `jest.config.js` with ts-jest preset | ||
| - [ ] Configure `tsconfig.json` with proper module resolution | ||
| - [ ] Set up ESLint with durable execution plugin | ||
| - [ ] Set up ESLint with durable execution plugin (strongly recommended default for TypeScript) | ||
|
Contributor
unnecessarily repeating "strongly recommended" from above
||
| - [ ] Create handler with `withDurableExecution` wrapper | ||
| - [ ] Write tests using `LocalDurableTestRunner` | ||
| - [ ] Use `skipTime: true` for fast test execution | ||
| - [ ] Verify TypeScript compilation: `npx tsc --noEmit` | ||
| - [ ] Run tests to confirm setup: `npm test` | ||
| - [ ] Review replay model rules (no non-deterministic code outside steps) | ||
| - [ ] Review replay model rules (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies) | ||
|
Contributor
as above, not actually specifying if these are affirmative or negative rules
||
|
|
||
| ### Python | ||
|
|
||
|
|
@@ -384,7 +386,7 @@ When starting a new durable function project: | |
| - [ ] Define step functions with `@durable_step` decorator | ||
| - [ ] Write tests using `DurableFunctionTestRunner` class | ||
| - [ ] Run tests: `pytest` | ||
| - [ ] Review replay model rules (no non-deterministic code outside steps) | ||
| - [ ] Review replay model rules (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies) | ||
|
Contributor
as above
||
|
|
||
| ## Error Scenarios | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,11 +21,11 @@ await context.wait({ seconds: 60 }); // Line 3: W | |
| const result = await context.step('process', async () => process(data)); // Line 5: Executes after wait | ||
| ``` | ||
|
|
||
| ## Rule 1: Deterministic Code Outside Steps | ||
| ## Rule 1: Deterministic Code Outside Durable Operations | ||
|
|
||
| **ALL code outside steps MUST produce the same result on every replay.** | ||
| **ALL code outside durable operations MUST produce the same result on every replay.** | ||
|
|
||
| ### ❌ WRONG - Non-Deterministic Outside Steps | ||
| ### ❌ WRONG - Non-Deterministic Outside Durable Operations | ||
|
|
||
| **TypeScript:** | ||
|
|
||
|
|
@@ -51,7 +51,7 @@ now = datetime.now() # Different datetime each time | |
| context.step(lambda _: save_data({"id": id}), name='save') | ||
| ``` | ||
|
|
||
| ### ✅ CORRECT - Non-Deterministic Inside Steps | ||
| ### ✅ CORRECT - Non-Deterministic Inside Durable Operations | ||
|
|
||
| **TypeScript:** | ||
|
|
||
|
|
@@ -75,7 +75,7 @@ now = context.step(lambda _: datetime.now(), name='get-date') | |
| context.step(lambda _: save_data({"id": id}), name='save') | ||
| ``` | ||
|
|
||
| ### Must Be In Steps | ||
| ### Must Be In Durable Operations | ||
|
|
||
| - `Date.now()`, `new Date()`, `time.time()`, `datetime.now()` | ||
| - `Math.random()`, `random.random()` | ||
|
|
@@ -86,7 +86,76 @@ context.step(lambda _: save_data({"id": id}), name='save') | |
| - Environment variable reads (if they can change) | ||
| - Any external system interaction | ||
|
|
||
| ## Rule 2: No Nested Durable Operations | ||
| Durable operations include `context.step(...)`, `waitForCallback(...)`, `waitForCondition(...)`, and branch/item functions passed to `context.parallel(...)` and `context.map(...)`. | ||
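To make Rule 1 concrete, here is a minimal sketch of why a value read outside a durable operation drifts across replays while a value produced inside one stays fixed. `ToyContext`, `readClock`, and `handler` are hypothetical names for a toy in-memory model; this is not the durable execution SDK, and it is synchronous only for brevity.

```typescript
// Hypothetical toy model of checkpoint-and-replay; NOT the durable execution SDK.
class ToyContext {
  private readonly checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // Runs `fn` the first time, stores its result under `name`, and returns the
  // stored result on later replays instead of running `fn` again.
  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }
}

// Stand-in for Date.now(): returns a different value on every read.
let clock = 1000;
const readClock = (): number => clock++;

function handler(ctx: ToyContext): { outside: number; inside: number } {
  const outside = readClock();                    // re-evaluated on every replay
  const inside = ctx.step('get-time', readClock); // checkpointed on the first run
  return { outside, inside };
}

const checkpoints = new Map<string, unknown>();
const firstRun = handler(new ToyContext(checkpoints));
const replay = handler(new ToyContext(checkpoints)); // same checkpoints: a replay
```

In this model `firstRun.inside` and `replay.inside` are equal, while the two `outside` values differ, which is the kind of divergence Rule 1 is meant to prevent.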
|
|
||
| ## Rule 2: Durable Operation Bodies Are Not Guaranteed To Be Atomic | ||
|
|
||
| **Functions passed to durable context APIs must assume the operation is not guaranteed to be atomic with respect to external side effects, and may be re-attempted before the durable runtime has fully recorded the result.** | ||
|
Contributor
Suggestion: Code in a durable operation must assume that it could re-run on replay, unless it is in a Step with an AT MOST ONCE execution guarantee. This means that external side effects caused by such code could execute more than once.
||
|
|
||
| This rule applies to: | ||
|
|
||
| - `context.step(...)` | ||
| - `waitForCallback(...)` submitters | ||
| - `waitForCondition(...)` check functions | ||
| - Branch/item functions used by `context.parallel(...)` and `context.map(...)` | ||
|
|
||
| ### What This Means | ||
|
|
||
| - Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed | ||
|
Contributor
this is not quite why it's "acceptable". suggestion:
||
| - External side effects started from that body should still be safe under re-attempt whenever possible | ||
|
Contributor
re-entrancy is strictly speaking the term described in this sentence. but the general thrust of some of the copy added is for idempotency: same result, no duplicate effects, running it N times has the same effect as running it once. in general, yes, idempotency is a good design pattern to follow here. however, part of the point of checkpointing is to provide the idempotency. so this sentence uses "should" to recommend against taking advantage of something durable functions provide as a key feature with AT MOST ONCE, which is deterministic checkpointing when wrapping non-idempotent code.
||
| - If the side effect needs an identifier for idempotency, derive it from durable inputs/state or generate it once from durable state and reuse it | ||
| - If a **step** cannot be made idempotent and duplicate execution is unacceptable, use `StepSemantics.AtMostOncePerRetry` (TypeScript) or `StepSemantics.AT_MOST_ONCE_PER_RETRY` (Python) with retries disabled so the behavior is effectively zero-or-once rather than more than once | ||
|
Contributor
arguably the step is idempotent once it checkpoints. the inside of the step isn't. Also, now that we have Java, we should probably avoid listing per language (TypeScript vs Python) each time, and instead reference the general concept and point to a single source of truth.
||
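The re-attempt concern discussed above can be sketched with a toy model in which the step body runs twice because the first attempt's checkpoint is lost. `ToyExportService` and `runStepWithLostCheckpoint` are hypothetical helpers invented for this illustration and assume the downstream service deduplicates on `jobId`; they do not reflect the SDK's retry or step-semantics API.

```typescript
// Hypothetical toy model; NOT the durable execution SDK.
class ToyExportService {
  readonly jobs = new Map<string, string>(); // jobId -> orderId

  // Assumed to be idempotent on jobId: repeating a start with the same id is a no-op.
  start(jobId: string, orderId: string): void {
    if (!this.jobs.has(jobId)) {
      this.jobs.set(jobId, orderId);
    }
  }
}

// Simulates a failure after the body's side effect but before the result is
// recorded, so the body is attempted a second time.
function runStepWithLostCheckpoint(body: () => void): void {
  body(); // attempt 1: side effect happens, checkpoint is lost
  body(); // attempt 2: re-attempt after the simulated failure
}

// Stand-in for Date.now(): yields a new value on every call.
let tick = 0;
const unstableId = (): string => `export-${tick++}`;

const orderId = 'order-42';

// Identity generated inside the body: each attempt creates a distinct job.
const unstable = new ToyExportService();
runStepWithLostCheckpoint(() => unstable.start(unstableId(), orderId));

// Identity derived from durable input: the re-attempt hits the same job.
const stable = new ToyExportService();
const jobId = `export-${orderId}`;
runStepWithLostCheckpoint(() => stable.start(jobId, orderId));
```

Under these assumptions the unstable variant ends with two jobs and the stable variant with one. The sketch only covers the idempotency-key approach; it says nothing about how at-most-once step semantics behave.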
|
|
||
| ### ❌ WRONG - Unstable External Identity Inside Durable Operation Body | ||
|
|
||
| **TypeScript:** | ||
|
|
||
| ```typescript | ||
| await context.step('start-export', async () => { | ||
| const jobId = `export-${Date.now()}`; | ||
| await exportClient.start({ jobId, orderId }); | ||
| }); | ||
| ``` | ||
|
|
||
| **Python:** | ||
|
|
||
| ```python | ||
| context.step( | ||
| lambda _: export_client.start({ | ||
| 'job_id': f'export-{time.time()}', | ||
| 'order_id': order_id | ||
| }), | ||
| name='start-export' | ||
| ) | ||
| ``` | ||
|
|
||
| ### ✅ CORRECT - Stable Identity Derived From Durable State | ||
|
|
||
| **TypeScript:** | ||
|
|
||
| ```typescript | ||
| const jobId = `export-${orderId}`; | ||
|
|
||
| await context.step('start-export', async () => { | ||
| await exportClient.start({ jobId, orderId }); | ||
| }); | ||
| ``` | ||
|
|
||
| **Python:** | ||
|
|
||
| ```python | ||
| job_id = f'export-{order_id}' | ||
|
|
||
| context.step( | ||
| lambda _: export_client.start({ | ||
| 'job_id': job_id, | ||
| 'order_id': order_id | ||
| }), | ||
| name='start-export' | ||
| ) | ||
| ``` | ||
|
|
||
| ## Rule 3: No Nested Durable Operations | ||
|
|
||
| **You CANNOT call durable operations inside a step function.** | ||
|
|
||
|
|
@@ -141,7 +210,7 @@ def process_child(child_ctx: DurableContext): | |
| context.run_in_child_context(func=process_child, name='process') | ||
| ``` | ||
|
|
||
| ## Rule 3: Closure Mutations Are Lost | ||
| ## Rule 4: Closure Mutations Are Lost | ||
|
|
||
| **Variables mutated inside steps are NOT preserved across replays.** | ||
|
|
||
|
|
@@ -188,9 +257,9 @@ counter = context.step(lambda _: counter + 1, name='increment') | |
| print(counter) # Correct value | ||
| ``` | ||
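As an illustration of why closure mutations disappear, the following sketch replays a handler against a toy checkpoint store. `ToyContext` is a hypothetical in-memory model, not the SDK: it skips a step body whenever a checkpoint for that name already exists, which is the behavior that makes the mutation vanish.

```typescript
// Hypothetical toy model of checkpoint-and-replay; NOT the durable execution SDK.
class ToyContext {
  private readonly checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // On replay the body is skipped and the stored result is returned.
  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }
}

function handler(ctx: ToyContext): { mutated: number; returned: number } {
  let mutated = 0;
  // The mutation only happens when the body actually runs.
  ctx.step('mutate', () => {
    mutated += 1;
  });
  // The returned value is checkpointed, so it survives replay.
  const returned = ctx.step('increment', () => 0 + 1);
  return { mutated, returned };
}

const checkpoints = new Map<string, unknown>();
const firstRun = handler(new ToyContext(checkpoints));
const replay = handler(new ToyContext(checkpoints)); // bodies are skipped here
```

In this model the first run sees `mutated === 1`, but the replay sees `mutated === 0` because the body never executes again; `returned` is 1 in both runs.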
|
|
||
| ## Rule 4: Side Effects Outside Steps Repeat | ||
| ## Rule 5: Side Effects Outside Durable Operations Repeat | ||
|
|
||
| **Side effects outside steps happen on EVERY replay.** | ||
| **Side effects outside durable operations happen on EVERY replay.** | ||
|
|
||
| ### ❌ WRONG - Repeated Side Effects | ||
|
|
||
|
|
@@ -214,7 +283,7 @@ update_database(data) # Updates multiple times! | |
| context.step(lambda _: process(), name='process') | ||
| ``` | ||
|
|
||
| ### ✅ CORRECT - Side Effects In Steps | ||
| ### ✅ CORRECT - Replay-Aware Logging And Checkpointed Side Effects | ||
|
|
||
| **TypeScript:** | ||
|
|
||
|
|
@@ -239,6 +308,8 @@ context.step(process()) | |
|
|
||
| `context.logger` is replay-aware and safe to use anywhere. It automatically deduplicates logs across replays. | ||
|
|
||
| Custom loggers are still allowed. If you use a non-replay-aware logger outside durable operations, expect duplicate log entries on replay. If you want to keep an existing logging interface, configure `context.logger` to wrap that existing logger inside the durable handler. | ||
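The duplicate-log behavior can be sketched with a toy model. This is one possible way a replay-aware logger could suppress repeats, stated as an assumption: it is not a description of how `context.logger` is implemented. `ToyContext`, its `log` method, and the `replaying` flag are hypothetical.

```typescript
// Hypothetical toy model; NOT the durable execution SDK, and not a description
// of how context.logger works internally.
class ToyContext {
  private readonly checkpoints: Map<string, unknown>;
  private readonly emitted: string[];
  private replaying: boolean;

  constructor(checkpoints: Map<string, unknown>, emitted: string[]) {
    this.checkpoints = checkpoints;
    this.emitted = emitted;
    // Assumption: existing checkpoints mean this invocation starts as a replay.
    this.replaying = checkpoints.size > 0;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    this.replaying = false; // reached new work, so history has been caught up
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }

  // Replay-aware: drops messages while re-walking checkpointed history.
  log(message: string): void {
    if (!this.replaying) {
      this.emitted.push(message);
    }
  }
}

const rawLines: string[] = [];   // output of a logger with no replay awareness
const awareLines: string[] = []; // output of the replay-aware toy logger
const checkpoints = new Map<string, unknown>();

function handler(ctx: ToyContext): void {
  rawLines.push('starting'); // runs on every invocation, including replays
  ctx.log('starting');       // suppressed while replaying
  ctx.step('process', () => 'done');
}

handler(new ToyContext(checkpoints, awareLines)); // first invocation
handler(new ToyContext(checkpoints, awareLines)); // replay
```

After one invocation and one replay, `rawLines` holds two entries and `awareLines` holds one, which mirrors the duplicate entries to expect from a non-replay-aware logger outside durable operations.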
|
|
||
| ## Common Pitfalls | ||
|
|
||
| ### Pitfall 1: Reading Environment Variables | ||
|
|
@@ -286,15 +357,38 @@ if (shouldTakePathA) { | |
| } | ||
| ``` | ||
|
|
||
| ### Pitfall 4: Assuming Durable Operation Bodies Are Atomic | ||
|
|
||
| ```typescript | ||
| // ❌ WRONG | ||
| await context.waitForCallback( | ||
| 'wait-payment', | ||
| async (callbackId) => { | ||
| const requestId = `payment-${Date.now()}`; | ||
| await paymentProvider.createPayment({ requestId, callbackId }); | ||
| } | ||
| ); | ||
|
|
||
| // ✅ CORRECT | ||
| const requestId = `payment-${orderId}`; | ||
| await context.waitForCallback( | ||
| 'wait-payment', | ||
| async (callbackId) => { | ||
| await paymentProvider.createPayment({ requestId, callbackId }); | ||
| } | ||
| ); | ||
| ``` | ||
|
|
||
| ## Debugging Replay Issues | ||
|
|
||
| If you see inconsistent behavior: | ||
|
|
||
| 1. **Check for non-deterministic code outside steps** | ||
| 2. **Verify no nested durable operations** | ||
| 3. **Look for closure mutations** | ||
| 4. **Search for side effects outside steps** | ||
| 5. **Use `context.logger` to trace execution flow** | ||
| 1. **Check for non-deterministic code outside durable operations** | ||
| 2. **Check durable operation bodies for non-atomic external side effects** | ||
| 3. **Verify no nested durable operations** | ||
| 4. **Look for closure mutations** | ||
| 5. **Search for side effects outside durable operations** | ||
| 6. **Use `context.logger` to trace execution flow** | ||
|
|
||
| ## Testing Replay Behavior | ||
|
|
||
|
|
||
passive voice, prefer active