# AWS Integration for Temporal Python SDK

> ⚠️ **This package is currently at an experimental release stage.** ⚠️

This package provides AWS integrations for the Temporal Python SDK, including an Amazon S3 driver for [external storage](../../../README.md#external-storage).

## S3 Driver

`S3StorageDriver` stores and retrieves Temporal payloads in Amazon S3. It accepts any `S3StorageDriverClient` implementation and a `bucket`, which is either a static name or a callable for dynamic per-payload selection.

### Using the built-in aioboto3 client

The SDK ships with an [`aioboto3`](https://github.com/terrycain/aioboto3)-based client. Install the extra to pull in its dependencies:

    python -m pip install "temporalio[aioboto3]"

```python
import dataclasses

import aioboto3

from temporalio.client import Client
from temporalio.contrib.aws.s3driver import S3StorageDriver
from temporalio.contrib.aws.s3driver.aioboto3 import new_aioboto3_client
from temporalio.converter import DataConverter, ExternalStorage

session = aioboto3.Session()
# Credentials and region are resolved automatically from the standard AWS credential
# chain, e.g. environment variables, ~/.aws/config, IAM instance profile, and so on.
async with session.client("s3") as s3_client:
    driver = S3StorageDriver(
        client=new_aioboto3_client(s3_client),
        bucket="my-temporal-payloads",
    )

    client = await Client.connect(
        "localhost:7233",
        data_converter=dataclasses.replace(
            DataConverter.default,
            external_storage=ExternalStorage(drivers=[driver]),
        ),
    )
```

### Custom S3 client implementations

To use a different S3 library, subclass `S3StorageDriverClient` and implement `put_object`, `get_object`, and `object_exists`. The ABC has no external dependencies, so no AWS packages are required to import it.

```python
from temporalio.contrib.aws.s3driver import S3StorageDriverClient

class MyS3Client(S3StorageDriverClient):
    async def put_object(self, *, bucket: str, key: str, data: bytes) -> None: ...
    async def object_exists(self, *, bucket: str, key: str) -> bool: ...
    async def get_object(self, *, bucket: str, key: str) -> bytes: ...

driver = S3StorageDriver(client=MyS3Client(), bucket="my-temporal-payloads")
```
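
For unit tests it can be convenient to avoid S3 entirely. The sketch below implements the same three-method interface against an in-memory dict. To keep it standalone it does not import the real `S3StorageDriverClient` base class; in practice you would subclass the ABC as shown above:

```python
import asyncio

class InMemoryS3Client:
    """Dict-backed stand-in with the same interface as S3StorageDriverClient."""

    def __init__(self) -> None:
        # Maps (bucket, key) to the stored object bytes.
        self._objects: dict[tuple[str, str], bytes] = {}

    async def put_object(self, *, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    async def object_exists(self, *, bucket: str, key: str) -> bool:
        return (bucket, key) in self._objects

    async def get_object(self, *, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

async def main() -> None:
    client = InMemoryS3Client()
    await client.put_object(bucket="b", key="k", data=b"payload")
    assert await client.object_exists(bucket="b", key="k")
    assert await client.get_object(bucket="b", key="k") == b"payload"

asyncio.run(main())
```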

### Key structure

Payloads are stored under content-addressable keys derived from a SHA-256 hash of the serialized payload bytes, segmented by namespace and workflow/activity identifiers when serialization context is available, e.g.:

    v0/ns/my-namespace/wfi/my-workflow-id/d/sha256/<hash>

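The exact layout is internal to the driver, but the shape above can be sketched as follows (the helper name and signature here are illustrative, not the SDK's):

```python
import hashlib

def storage_key(namespace: str, workflow_id: str, data: bytes) -> str:
    # Content-addressable: identical bytes in the same scope yield the same key.
    digest = hashlib.sha256(data).hexdigest()
    return f"v0/ns/{namespace}/wfi/{workflow_id}/d/sha256/{digest}"

key = storage_key("my-namespace", "my-workflow-id", b"payload bytes")
```

Because the namespace and workflow id are part of the key, the same bytes stored from a different workflow produce a different key.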
### Notes

* Any driver used to store payloads must also be configured on the component that retrieves them. If the client stores workflow inputs using this driver, the worker must include it in its `ExternalStorage.drivers` list to retrieve them.
* The target S3 bucket must already exist; the driver will not create it.
* Identical serialized bytes within the same namespace and workflow (or activity) share the same S3 object: the key is content-addressable within that scope. The same bytes used across different workflows or namespaces produce distinct S3 objects because the key includes the namespace and workflow/activity identifiers.
* Only payloads at or above `ExternalStorage.payload_size_threshold` (default: 256 KiB) are offloaded; smaller payloads are stored inline. Set `ExternalStorage.payload_size_threshold` to `None` to offload every payload regardless of size.
* `S3StorageDriver.max_payload_size` (default: 50 MiB) sets a hard upper limit on the serialized size of any single payload. A `ValueError` is raised at store time if a payload exceeds this limit. Increase it if your workflows produce payloads larger than 50 MiB.
* Override `S3StorageDriver.driver_name` only when registering multiple `S3StorageDriver` instances with distinct configurations under the same `ExternalStorage.drivers` list.

### Dynamic bucket selection

To select the S3 bucket per payload, pass a callable as `bucket`:

```python
from temporalio.contrib.aws.s3driver import S3StorageDriver
from temporalio.contrib.aws.s3driver.aioboto3 import new_aioboto3_client

# s3_client is an open aioboto3 S3 client, as in the example above.
driver = S3StorageDriver(
    client=new_aioboto3_client(s3_client),
    bucket=lambda context, payload: (
        "large-payloads" if payload.ByteSize() > 10 * 1024 * 1024 else "small-payloads"
    ),
)
```

### Required IAM permissions

The AWS credentials used by your S3 client must have the following permissions on the objects in the target bucket:

```json
{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:GetObject"
  ],
  "Resource": "arn:aws:s3:::my-temporal-payloads/*"
}
```

`s3:PutObject` is required by components that store payloads (typically the Temporal client and worker sending workflow/activity inputs), and `s3:GetObject` is required by components that retrieve them (typically workers and clients reading results). Components that only retrieve payloads do not need `s3:PutObject`, and vice versa.
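
As a concrete example, a component that only retrieves payloads could use a narrower policy such as this (the `Sid` and bucket name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TemporalPayloadsReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-temporal-payloads/*"
    }
  ]
}
```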