20 changes: 11 additions & 9 deletions aws-lambda-durable-functions-power/POWER.md
@@ -79,7 +79,7 @@ Load the appropriate reference file based on what the user is working on:

- **Getting started**, **basic setup**, **example**, **ESLint**, or **Jest setup** -> see [getting-started.md](steering/getting-started.md)
- **Understanding replay model**, **determinism**, or **non-deterministic errors** -> see [replay-model-rules.md](steering/replay-model-rules.md)
- **Creating steps**, **atomic operations**, or **retry logic** -> see [step-operations.md](steering/step-operations.md)
- **Creating steps**, **step operations**, or **retry logic** -> see [step-operations.md](steering/step-operations.md)
- **Waiting**, **delays**, **callbacks**, **external systems**, or **polling** -> see [wait-operations.md](steering/wait-operations.md)
- **Parallel execution**, **map operations**, **batch processing**, or **concurrency** -> see [concurrent-operations.md](steering/concurrent-operations.md)
- **Error handling**, **retry strategies**, **saga pattern**, or **compensating transactions** -> see [error-handling.md](steering/error-handling.md)
@@ -117,10 +117,11 @@ def handler(event: dict, context: DurableContext) -> dict:

### Critical Rules

1. **All non-deterministic code MUST be in steps** (Date.now, Math.random, API calls)
2. **Cannot nest durable operations** - use `runInChildContext` to group operations
3. **Closure mutations are lost on replay** - return values from steps
4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches)
Review comment (Contributor):

passive voice, prefer active

2. **Durable operation bodies are not guaranteed to be atomic** - prefer stable identity and idempotent behavior for external side effects; for non-idempotent steps, consider at-most-once-per-retry semantics with zero retries
Review comment (Contributor):

this repeats content from before yet again.

it also refers to "durable operation bodies" but then describes semantics that are only available on step.

a step body may run successfully but fail to checkpoint, causing it to re-execute on replay. For steps where duplicate execution is unacceptable, use AtMostOncePerRetry with retries disabled

3. **Cannot nest durable operations** - use `runInChildContext` to group operations
4. **Closure mutations are lost on replay** - return values from steps
5. **Side effects outside durable operations repeat** - prefer `context.logger`; custom loggers may duplicate on replay

### Python API Differences

@@ -163,10 +164,11 @@ See here: https://docs.aws.amazon.com/lambda/latest/dg/durable-security.html

When writing or reviewing durable function code, ALWAYS check for these replay model violations:

1. **Non-deterministic code outside steps**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside steps
2. **Nested durable operations in step functions**: Cannot call `context.step()`, `context.wait()`, or `context.invoke()` inside a step function — use `context.runInChildContext()` instead
3. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead
4. **Side effects outside steps that repeat on replay**: Use `context.logger` for logging (it is replay-aware and deduplicates automatically)
1. **Non-deterministic code outside durable operations**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside durable operations
2. **Non-atomic durable operation bodies**: Functions passed to `context.step()`, `waitForCallback()`, `waitForCondition()`, and `parallel()`/`map()` branches may be re-attempted before persistence is fully committed — prefer stable identity and idempotent external effects; for non-idempotent steps, use at-most-once-per-retry semantics with zero retries when duplicate execution is unacceptable
Review comment (Contributor):

this idea is getting repeated a lot. can we consolidate.

If keeping this, maybe something like:

Durable operation bodies are not atomic: Code passed to context.step(),
waitForCallback(), waitForCondition(), and parallel()/map() branches can
succeed but fail to checkpoint, so the runtime may re-execute them on replay.
Prefer idempotent external side effects with stable identity. For steps where
duplicate execution is unacceptable, use AtMostOncePerRetry with retries
disabled.

3. **Nested durable operations in step functions**: Cannot call `context.step()`, `context.wait()`, or `context.invoke()` inside a step function — use `context.runInChildContext()` instead
4. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead
5. **Side effects outside durable operations that repeat on replay**: Prefer `context.logger` because it is replay-aware and deduplicates automatically; custom loggers are allowed but may emit duplicates unless `context.logger` is configured to wrap them
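The closure-mutation rule in the list above can be illustrated with a toy model. The sketch below is not the durable execution SDK: `step` and `Checkpoints` are hypothetical stand-ins that only mimic "run once, then return the saved result on replay". It shows why a variable mutated inside a step body loses its value on replay, while a value returned from the step survives.

```typescript
// Toy model of checkpoint-and-replay; NOT the durable execution SDK.
type Checkpoints = Map<string, unknown>;

// Hypothetical helper: runs `body` once, then serves the saved result on replay.
function step<T>(checkpoints: Checkpoints, name: string, body: () => T): T {
  if (checkpoints.has(name)) {
    return checkpoints.get(name) as T; // replay: the body is skipped
  }
  const result = body();
  checkpoints.set(name, result);
  return result;
}

// Mutates a closure variable inside the step body.
function mutatingHandler(checkpoints: Checkpoints): number {
  let counter = 0;
  step(checkpoints, 'increment', () => {
    counter += 1; // lost on replay, because the body does not run again
  });
  return counter;
}

// Returns the new value from the step instead.
function returningHandler(checkpoints: Checkpoints): number {
  let counter = 0;
  counter = step(checkpoints, 'increment', () => counter + 1);
  return counter;
}

const mutatingStore: Checkpoints = new Map();
const mutatedFirst = mutatingHandler(mutatingStore);  // first execution
const mutatedReplay = mutatingHandler(mutatingStore); // replay

const returningStore: Checkpoints = new Map();
const returnedFirst = returningHandler(returningStore);
const returnedReplay = returningHandler(returningStore);

console.log(mutatedFirst, mutatedReplay);   // 1 0
console.log(returnedFirst, returnedReplay); // 1 1
```

In this model the replayed handler sees `counter` as `0` in the mutating version because only the checkpointed return value is restored, never the closure state.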

When implementing or modifying tests for durable functions, ALWAYS verify:

6 changes: 3 additions & 3 deletions aws-lambda-durable-functions-power/README.md
@@ -5,7 +5,7 @@ A Kiro power for building resilient, long-running multi-step applications and AI
## Overview

- **Replay Model Guidance** - Critical rules to avoid non-deterministic bugs
- **Step Operations** - Atomic operations with retry strategies
- **Step Operations** - Step patterns with retry strategies
- **Wait Operations** - Delays, callbacks, and polling patterns
- **Concurrent Operations** - Map and parallel execution with concurrency control
- **Error Handling** - Retry strategies, saga pattern, and compensating transactions
@@ -36,7 +36,7 @@ Kiro will load the appropriate steering files and guide you through:
### Critical Concepts

- **Replay Model** - How Lambda durable functions execute and replay
- **Determinism Rules** - What must be inside steps vs outside
- **Determinism Rules** - What must be inside durable operations vs outside
- **Qualified ARNs** - Why versions/aliases are required
- **Checkpoint Strategy** - When and how state is persisted

@@ -83,4 +83,4 @@ When you mention these keywords, Kiro will automatically load this power:

## License

This project is licensed under the Apache-2.0 License.
This project is licensed under the Apache-2.0 License.
24 changes: 13 additions & 11 deletions aws-lambda-durable-functions-power/steering/advanced-patterns.md
@@ -105,37 +105,39 @@ def handler(event: dict, context: DurableContext) -> str:
```typescript
import { StepSemantics } from '@aws/durable-execution-sdk-js';

// AtMostOncePerRetry (DEFAULT) - For idempotent operations
// Step executes at most once per retry attempt
// If step fails partway through, it won't re-execute the same attempt
// AtLeastOncePerRetry (DEFAULT) - For idempotent operations
// Step executes at least once per retry attempt
// If checkpointing fails after success, the step may re-execute on replay
await context.step(
'update-database',
async () => {
// This is idempotent - safe to retry
return await updateUserRecord(userId, data);
},
{ semantics: StepSemantics.AtMostOncePerRetry }
{ semantics: StepSemantics.AtLeastOncePerRetry }
);

// AtLeastOncePerRetry - For operations that can execute multiple times
// Step may execute multiple times per retry attempt
// Use when idempotency is handled externally
// AtMostOncePerRetry - For non-idempotent operations
// Step executes at most once per retry attempt
// Disable retries as well when duplicate execution is unacceptable
await context.step(
'send-notification',
async () => {
// External system handles deduplication
return await sendEmail(email, message);
},
{ semantics: StepSemantics.AtLeastOncePerRetry }
{
semantics: StepSemantics.AtMostOncePerRetry,
retryStrategy: () => ({ shouldRetry: false })
}
);
```

**When to use each:**

| Semantic | Use When | Example Operations |
| ----------------------- | ----------------------------- | ------------------------------------------------- |
| **AtMostOncePerRetry** | Operation is idempotent | Database updates, API calls with idempotency keys |
| **AtLeastOncePerRetry** | External deduplication exists | Queuing systems, event streams |
| **AtLeastOncePerRetry** | Operation is idempotent | Database updates, API calls with idempotency keys |
| **AtMostOncePerRetry** | Duplicate execution is unacceptable | Payments, one-time notifications, non-idempotent downstream calls |
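To make the difference between the two semantics concrete, here is a minimal sketch of a toy runtime. It is not the SDK: `ToyRuntime`, `StepRecord`, and `countExecutions` are hypothetical, and the model assumes that at-most-once works by persisting a start marker before the body runs. It simulates a body that succeeds but whose result checkpoint is lost, followed by a replay.

```typescript
// Toy model of the two step semantics; NOT the durable execution SDK.
type Semantics = 'AtLeastOncePerRetry' | 'AtMostOncePerRetry';

interface StepRecord {
  started: boolean;
  result?: string;
}

class ToyRuntime {
  private records = new Map<string, StepRecord>();

  // `loseResultCheckpoint` simulates a crash after the body succeeds
  // but before its result is persisted.
  step(name: string, semantics: Semantics, body: () => string, loseResultCheckpoint = false): string {
    const record = this.records.get(name);
    if (record && record.result !== undefined) {
      return record.result; // completed earlier: replay returns the saved result
    }
    if (semantics === 'AtMostOncePerRetry') {
      if (record && record.started) {
        // Started before but never recorded a result: do not run the body again.
        throw new Error(`step "${name}" was interrupted; this attempt counts as failed`);
      }
      this.records.set(name, { started: true }); // persist the start marker first
    }
    const result = body();
    if (loseResultCheckpoint) {
      throw new Error('simulated crash before the result checkpoint');
    }
    this.records.set(name, { started: true, result });
    return result;
  }
}

// Runs one execution that loses its checkpoint, then one replay,
// and reports how many times the step body actually ran.
function countExecutions(semantics: Semantics): number {
  const runtime = new ToyRuntime();
  let executions = 0;
  const body = (): string => {
    executions += 1;
    return 'sent';
  };
  for (const lose of [true, false]) {
    try {
      runtime.step('send-notification', semantics, body, lose);
    } catch {
      // First pass: simulated crash. Second pass (at-most-once): interrupted attempt.
    }
  }
  return executions;
}

console.log(countExecutions('AtLeastOncePerRetry')); // 2
console.log(countExecutions('AtMostOncePerRetry'));  // 1
```

Under these assumptions the at-most-once step ends up "zero or once" only when retries are also disabled, because a retry would count as a new attempt and run the body again.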

## Completion Policies - Interaction and Combination

@@ -293,6 +293,10 @@ const results = await context.map(

## Advanced Patterns

### Replay Safety in Branches

Functions passed to `context.parallel(...)` and `context.map(...)` are durable operation bodies. Treat branch and item functions the same way you treat step and wait bodies: they may be re-attempted before progress is fully persisted, so any external side effect they trigger must use stable identity and idempotent behavior. Derive identifiers from durable inputs such as item IDs, indexes, or prior durable state instead of `Date.now()`, randomness, or fresh UUIDs created inside the branch. See [replay-model-rules.md](replay-model-rules.md).
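As a rough illustration of stable identity in branches, the sketch below re-runs a set of per-item branch functions against a downstream service that deduplicates by job ID. Everything here is hypothetical (`ExportService`, `runBranches`, the ID helpers), and no SDK calls are used. A counter stands in for `Date.now()` or a fresh UUID so that the demo output is predictable.

```typescript
// Hypothetical downstream service that deduplicates by job ID.
class ExportService {
  private jobs = new Map<string, string>();

  start(jobId: string, orderId: string): void {
    if (!this.jobs.has(jobId)) {
      this.jobs.set(jobId, orderId); // only the first request per ID has an effect
    }
  }

  jobCount(): number {
    return this.jobs.size;
  }
}

// Stable: derived only from durable input (the item ID).
const stableJobId = (orderId: string): string => `export-${orderId}`;

// Unstable: yields a fresh value on every attempt, like Date.now() or a new UUID.
let attempt = 0;
const unstableJobId = (orderId: string): string => `export-${orderId}-${attempt++}`;

// Stand-in for the per-item branch bodies of a map operation.
function runBranches(
  service: ExportService,
  orderIds: string[],
  makeId: (orderId: string) => string
): void {
  for (const orderId of orderIds) {
    service.start(makeId(orderId), orderId);
  }
}

const orders = ['o-1', 'o-2'];

const stableService = new ExportService();
runBranches(stableService, orders, stableJobId);
runBranches(stableService, orders, stableJobId); // simulated re-attempt

const unstableService = new ExportService();
runBranches(unstableService, orders, unstableJobId);
runBranches(unstableService, orders, unstableJobId); // simulated re-attempt

console.log(stableService.jobCount());   // 2
console.log(unstableService.jobCount()); // 4
```

With stable IDs the re-attempt is absorbed by the downstream deduplication; with unstable IDs every re-attempt starts duplicate work.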

### Map with Callbacks

**TypeScript:**
@@ -416,3 +420,4 @@ const results = await context.map(
6. **Monitor concurrency limits** to avoid overwhelming systems
7. **Use child contexts** for complex per-item workflows
8. **Implement circuit breakers** for external service calls
9. **Use stable identity inside branches** when starting or addressing external work
16 changes: 9 additions & 7 deletions aws-lambda-durable-functions-power/steering/getting-started.md
@@ -252,7 +252,7 @@ my-durable-function/

## ESLint Plugin Setup

Install the ESLint plugin to catch common durable function mistakes at development time:
For TypeScript durable-function projects, use the ESLint plugin to catch common mistakes at development time:

```bash
npm install --save-dev @aws/durable-execution-sdk-js-eslint-plugin
@@ -304,6 +304,8 @@ export default [
- Incorrect usage of durable context outside handler
- Common replay model violations

Use the plugin by default in new TypeScript projects. It is a strong static guardrail, not a runtime guarantee, so equivalent enforcement is acceptable if your team already has it.
Review comment (Contributor):

verbose.

Use the plugin by default in new TypeScript projects. It is a compile-time guardrail to catch common issues, but it is not a runtime guarantee.


## Jest Configuration

**jest.config.js:**
@@ -339,21 +341,21 @@ Add `aws-durable-execution-sdk-python-testing` to your dev/test dependencies in

1. **Write handler** with durable operations
2. **Test locally** with `LocalDurableTestRunner`
3. **Validate replay rules** (no non-deterministic code outside steps)
3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)
Review comment (Contributor):

what about determinism outside of durable operations? the first clause is negative (i.e. don't do this) and the second is positive (do this), but this is not clear from the text.

Is there a replay section this can link to instead?

4. **Deploy** with qualified ARN (version or alias)
5. **Monitor** execution state and logs

### Python

1. **Write handler** with `@durable_execution` decorator
2. **Test locally** with `DurableFunctionTestRunner` and pytest
3. **Validate replay rules** (no non-deterministic code outside steps)
3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)
Review comment (Contributor):

as above

4. **Deploy** with qualified ARN (version or alias)
5. **Monitor** execution state and logs

## Key Concepts

- **Steps**: Atomic operations with automatic retry and checkpointing
- **Steps**: Persisted operations with automatic retry and checkpointing
Review comment (Contributor):

Executes business logic with automatic checkpointing and retry. Persists the result of an operation in a checkpoint.

- **Waits**: Suspend execution without compute charges (up to 1 year)
- **Child Contexts**: Group multiple durable operations
- **Callbacks**: Wait for external systems to respond
@@ -368,13 +370,13 @@ When starting a new durable function project:
- [ ] Install dependencies (`@aws/durable-execution-sdk-js`, testing & eslint packages)
- [ ] Create `jest.config.js` with ts-jest preset
- [ ] Configure `tsconfig.json` with proper module resolution
- [ ] Set up ESLint with durable execution plugin
- [ ] Set up ESLint with durable execution plugin (strongly recommended default for TypeScript)
Review comment (Contributor):

unnecessarily repeating "strongly recommended" from above

- [ ] Create handler with `withDurableExecution` wrapper
- [ ] Write tests using `LocalDurableTestRunner`
- [ ] Use `skipTime: true` for fast test execution
- [ ] Verify TypeScript compilation: `npx tsc --noEmit`
- [ ] Run tests to confirm setup: `npm test`
- [ ] Review replay model rules (no non-deterministic code outside steps)
- [ ] Review replay model rules (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)
Review comment (Contributor):

as above, not actually specifying if these are affirmative or negative rules


### Python

@@ -384,7 +386,7 @@ When starting a new durable function project:
- [ ] Define step functions with `@durable_step` decorator
- [ ] Write tests using `DurableFunctionTestRunner` class
- [ ] Run tests: `pytest`
- [ ] Review replay model rules (no non-deterministic code outside steps)
- [ ] Review replay model rules (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)
Review comment (Contributor):

as above


## Error Scenarios

124 changes: 109 additions & 15 deletions aws-lambda-durable-functions-power/steering/replay-model-rules.md
@@ -21,11 +21,11 @@ await context.wait({ seconds: 60 }); // Line 3: W
const result = await context.step('process', async () => process(data)); // Line 5: Executes after wait
```

## Rule 1: Deterministic Code Outside Steps
## Rule 1: Deterministic Code Outside Durable Operations

**ALL code outside steps MUST produce the same result on every replay.**
**ALL code outside durable operations MUST produce the same result on every replay.**

### ❌ WRONG - Non-Deterministic Outside Steps
### ❌ WRONG - Non-Deterministic Outside Durable Operations

**TypeScript:**

@@ -51,7 +51,7 @@ now = datetime.now() # Different datetime each time
context.step(lambda _: save_data({"id": id}), name='save')
```

### ✅ CORRECT - Non-Deterministic Inside Steps
### ✅ CORRECT - Non-Deterministic Inside Durable Operations

**TypeScript:**

@@ -75,7 +75,7 @@ now = context.step(lambda _: datetime.now(), name='get-date')
context.step(lambda _: save_data({"id": id}), name='save')
```

### Must Be In Steps
### Must Be In Durable Operations

- `Date.now()`, `new Date()`, `time.time()`, `datetime.now()`
- `Math.random()`, `random.random()`
@@ -86,7 +86,76 @@ context.step(lambda _: save_data({"id": id}), name='save')
- Environment variable reads (if they can change)
- Any external system interaction

## Rule 2: No Nested Durable Operations
Durable operations include `context.step(...)`, `waitForCallback(...)`, `waitForCondition(...)`, and branch/item functions passed to `context.parallel(...)` and `context.map(...)`.
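Why moving non-deterministic code into a durable operation helps can be sketched with a toy replay model. The code below does not use the SDK; `step` and `Checkpoints` are hypothetical helpers that only mimic checkpointing: the body runs once, its result is saved, and later replays return the saved value instead of running the body again.

```typescript
// Toy model of checkpoint-and-replay; NOT the durable execution SDK.
type Checkpoints = Map<string, unknown>;

// Hypothetical helper: runs `body` once, then serves the saved result on replay.
function step<T>(checkpoints: Checkpoints, name: string, body: () => T): T {
  if (checkpoints.has(name)) {
    return checkpoints.get(name) as T; // replay: the body is skipped
  }
  const result = body();
  checkpoints.set(name, result); // first run: persist the result
  return result;
}

// One handler invocation. A replay re-runs this function from the top.
function handler(checkpoints: Checkpoints): { unsafeId: number; safeId: number } {
  // Outside a step: recomputed on every replay, so the value can differ.
  const unsafeId = Math.random();
  // Inside a step: computed once, then read back from the checkpoint.
  const safeId = step(checkpoints, 'generate-id', () => Math.random());
  return { unsafeId, safeId };
}

const checkpoints: Checkpoints = new Map();
const firstRun = handler(checkpoints);
const replay = handler(checkpoints);

console.log(firstRun.safeId === replay.safeId); // true
```

The value produced inside the step is still random, but it becomes deterministic across replays because the non-deterministic code does not run a second time.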

## Rule 2: Durable Operation Bodies Are Not Guaranteed To Be Atomic

**Functions passed to durable context APIs must assume the operation is not guaranteed to be atomic with respect to external side effects, and may be re-attempted before the durable runtime has fully recorded the result.**
Review comment (Contributor):

  1. what about at most once guarantee?

  2. what does "durable context APIs" mean? methods on the DurableContext? or the durable handler?

  3. "Functions" means something specific in coding, strictly speaking java doesn't have functions.

  4. Style: avoid passive

Suggestion: Code in a durable operation must assume that it could re-run on replay, unless it is in a Step with an AT MOST ONCE execution guarantee. This means that external side effects caused by such code could execute more than once.


This rule applies to:

- `context.step(...)`
- `waitForCallback(...)` submitters
- `waitForCondition(...)` check functions
- Branch/item functions used by `context.parallel(...)` and `context.map(...)`

### What This Means

- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
Review comment (Contributor):

this is not quite why it's "acceptable".

suggestion:
once code inside a durable step completes it saves to a checkpoint, and on subsequent replays the operation returns the saved result. in this way the result of non-deterministic code becomes deterministic on replay because the non-deterministic code does not re-run and the durable execution framework uses the checkpoint result instead.

- External side effects started from that body should still be safe under re-attempt whenever possible
Review comment (Contributor):

re-entrancy is strictly speaking the term described in this sentence.

but the general thrust of some of the copy added is for idempotency: same result, no duplicate effects, running it N times has the same effect as running it once

in general, yes, idempotency is a good design pattern to follow here.

however, part of the point of checkpointing is to provide the idempotency. so this sentence uses "should" to recommend against relying on something durable functions provide as a key feature with AT MOST ONCE: deterministic checkpointing when wrapping non-idempotent code.

- If the side effect needs an identifier for idempotency, derive it from durable inputs/state or generate it once from durable state and reuse it
- If a **step** cannot be made idempotent and duplicate execution is unacceptable, use `StepSemantics.AtMostOncePerRetry` (TypeScript) or `StepSemantics.AT_MOST_ONCE_PER_RETRY` (Python) with retries disabled so the behavior is effectively zero-or-once rather than more than once
Review comment (Contributor):

arguably the step is idempotent once it checkpoints.

the inside of the step isn't.

Also, now that we have Java, we should probably avoid listing per language (TypeScript vs Python) each time, and instead reference the general concept and point to a single source of truth.


### ❌ WRONG - Unstable External Identity Inside Durable Operation Body

**TypeScript:**

```typescript
await context.step('start-export', async () => {
const jobId = `export-${Date.now()}`;
await exportClient.start({ jobId, orderId });
});
```

**Python:**

```python
context.step(
lambda _: export_client.start({
'job_id': f'export-{time.time()}',
'order_id': order_id
}),
name='start-export'
)
```

### ✅ CORRECT - Stable Identity Derived From Durable State

**TypeScript:**

```typescript
const jobId = `export-${orderId}`;

await context.step('start-export', async () => {
await exportClient.start({ jobId, orderId });
});
```

**Python:**

```python
job_id = f'export-{order_id}'

context.step(
lambda _: export_client.start({
'job_id': job_id,
'order_id': order_id
}),
name='start-export'
)
```

## Rule 3: No Nested Durable Operations

**You CANNOT call durable operations inside a step function.**

@@ -141,7 +210,7 @@ def process_child(child_ctx: DurableContext):
context.run_in_child_context(func=process_child, name='process')
```

## Rule 3: Closure Mutations Are Lost
## Rule 4: Closure Mutations Are Lost

**Variables mutated inside steps are NOT preserved across replays.**

@@ -188,9 +257,9 @@ counter = context.step(lambda _: counter + 1, name='increment')
print(counter) # Correct value
```

## Rule 4: Side Effects Outside Steps Repeat
## Rule 5: Side Effects Outside Durable Operations Repeat

**Side effects outside steps happen on EVERY replay.**
**Side effects outside durable operations happen on EVERY replay.**

### ❌ WRONG - Repeated Side Effects

@@ -214,7 +283,7 @@ update_database(data) # Updates multiple times!
context.step(lambda _: process(), name='process')
```

### ✅ CORRECT - Side Effects In Steps
### ✅ CORRECT - Replay-Aware Logging And Checkpointed Side Effects

**TypeScript:**

@@ -239,6 +308,8 @@ context.step(process())

`context.logger` is replay-aware and safe to use anywhere. It automatically deduplicates logs across replays.

Custom loggers are still allowed. If you use a non-replay-aware logger outside durable operations, expect duplicate log entries on replay. If you want to keep an existing logging interface, configure `context.logger` to wrap that existing logger inside the durable handler.
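The duplicate-log behavior can be sketched with a toy wrapper. This is not how `context.logger` is implemented or configured; `ReplayAwareLogger`, the `Logger` interface, and the `replaying` flag are hypothetical and only model the idea of "forward to the existing logger, but stay silent while replaying".

```typescript
// Toy model of replay-aware logging; NOT the SDK's logger.
interface Logger {
  info(message: string): void;
}

// Hypothetical wrapper: forwards to an existing logger unless a replay is in progress.
class ReplayAwareLogger implements Logger {
  private readonly inner: Logger;
  private readonly isReplaying: () => boolean;

  constructor(inner: Logger, isReplaying: () => boolean) {
    this.inner = inner;
    this.isReplaying = isReplaying;
  }

  info(message: string): void {
    if (!this.isReplaying()) {
      this.inner.info(message);
    }
  }
}

// An existing custom logger that is not replay-aware; it collects lines in memory.
const lines: string[] = [];
const customLogger: Logger = {
  info: (message) => {
    lines.push(message);
  },
};

let replaying = false;
const logger = new ReplayAwareLogger(customLogger, () => replaying);

function handler(): void {
  customLogger.info('raw: starting'); // repeats on every replay
  logger.info('wrapped: starting');   // emitted on the first execution only
}

handler();        // first execution
replaying = true; // the runtime replays the handler from the top
handler();

console.log(lines);
// [ 'raw: starting', 'wrapped: starting', 'raw: starting' ]
```

The raw logger line appears twice while the wrapped line appears once, which is the duplication to expect from a non-replay-aware logger called outside durable operations.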

## Common Pitfalls

### Pitfall 1: Reading Environment Variables
@@ -286,15 +357,38 @@ if (shouldTakePathA) {
}
```

### Pitfall 4: Assuming Durable Operation Bodies Are Atomic

```typescript
// ❌ WRONG
await context.waitForCallback(
'wait-payment',
async (callbackId) => {
const requestId = `payment-${Date.now()}`;
await paymentProvider.createPayment({ requestId, callbackId });
}
);

// ✅ CORRECT
const requestId = `payment-${orderId}`;
await context.waitForCallback(
'wait-payment',
async (callbackId) => {
await paymentProvider.createPayment({ requestId, callbackId });
}
);
```

## Debugging Replay Issues

If you see inconsistent behavior:

1. **Check for non-deterministic code outside steps**
2. **Verify no nested durable operations**
3. **Look for closure mutations**
4. **Search for side effects outside steps**
5. **Use `context.logger` to trace execution flow**
1. **Check for non-deterministic code outside durable operations**
2. **Check durable operation bodies for non-atomic external side effects**
3. **Verify no nested durable operations**
4. **Look for closure mutations**
5. **Search for side effects outside durable operations**
6. **Use `context.logger` to trace execution flow**

## Testing Replay Behavior
