feat: S3 object storage offloading for V3 bucket data by Sleepful · Pull Request #673 · powersync-ja/powersync-service

Sleepful · 2026-06-11T09:59:29Z

Summary

Offload BucketDataDocumentV3.ops[] arrays to object storage (S3), keeping only a metadata shell in MongoDB. The service reads S3 objects and streams ops to clients using the existing wire protocol — no protocol changes. Object storage is optional at configuration level; when not configured, ops remain inline in MongoDB as today.

Changes

ObjectStorage interface — put/get/delete contract, decouples storage backend from core logic
S3ObjectStorage — production implementation using @aws-sdk/client-s3 (static import)
MemoryObjectStorage — in-memory test double (no Docker/S3 needed in CI)
Config — optional object_storage: section in MongoStorageConfig with type s3
Write path — flushBucketData() uploads zstd-compressed BSON ops to S3, inserts metadata shell in MongoDB
Read path — getBucketDataBatchImpl() parallel pre-fetches S3 objects, patches doc.ops before existing decode loop
Compaction — compactSingleBucket() and clearBucketLeading() are S3-aware (read/write/cleanup old refs)
Injection — objectStorage? threaded through the full chain, all optional fields, zero breaking changes
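
For readers unfamiliar with the shape of such a contract, here is a minimal sketch of what a put/get/delete interface and an in-memory test double could look like. The method signatures, the `undefined` return for missing objects, and the `size` helper are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical shape of the put/get/delete contract; the PR's real
// signatures may differ.
interface ObjectStorage {
  put(path: string, data: Uint8Array): Promise<void>;
  // Assumption: resolves to undefined when no object exists at `path`.
  get(path: string): Promise<Uint8Array | undefined>;
  delete(path: string): Promise<void>;
}

// In-memory test double: a Map keyed by object path.
class MemoryObjectStorage implements ObjectStorage {
  private readonly objects = new Map<string, Uint8Array>();

  async put(path: string, data: Uint8Array): Promise<void> {
    // Copy so later mutation of the caller's buffer cannot change stored data.
    this.objects.set(path, data.slice());
  }

  async get(path: string): Promise<Uint8Array | undefined> {
    const data = this.objects.get(path);
    return data === undefined ? undefined : data.slice();
  }

  async delete(path: string): Promise<void> {
    // Deleting a missing path is a no-op.
    this.objects.delete(path);
  }

  // Test helper: number of stored objects.
  get size(): number {
    return this.objects.size;
  }
}
```

Because core logic would depend only on the interface, tests can swap in the in-memory double without Docker or a real S3 endpoint.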

Design Decisions

No inline threshold — all documents offload to S3 when configured. A general threshold is deferred to a follow-up.
S3 path format — bucket-data/<group>/<def>/<bucket>/<minOp>-<maxOp>
Zstd whole-object compression — entire BSON ops array compressed as one blob
Batch sizing — compressed_size * 3 heuristic keeps batch memory bounded for S3-backed docs
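
The path format and the batch sizing heuristic can be illustrated with a short sketch. Only the path layout and the `* 3` multiplier come from the description above; the helper names, the parameter types, and the `inline_size` field are hypothetical.

```typescript
// Hypothetical helpers: only the path layout and the "* 3" multiplier are
// taken from the PR description; names and parameter types are assumptions.
interface StorageRef {
  path: string;
  compressed_size: number;
}

// Builds bucket-data/<group>/<def>/<bucket>/<minOp>-<maxOp>.
function bucketDataPath(group: number, def: string, bucket: string, minOp: string, maxOp: string): string {
  return `bucket-data/${group}/${def}/${bucket}/${minOp}-${maxOp}`;
}

const DECOMPRESSED_SIZE_MULTIPLIER = 3;

// Estimates how many bytes a document contributes to a batch.
// `inline_size` is a hypothetical field standing in for the size of inline ops.
function estimateBatchBytes(doc: { storage_ref?: StorageRef; inline_size?: number }): number {
  if (doc.storage_ref !== undefined) {
    // S3-backed doc: the shell itself is tiny, so estimate from the blob size.
    return doc.storage_ref.compressed_size * DECOMPRESSED_SIZE_MULTIPLIER;
  }
  return doc.inline_size ?? 0;
}
```

Under this heuristic, an object with a 40 KiB compressed size counts as 120 KiB toward the batch limit.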

Manual Verification [TODO]

S3ObjectStorage is not exercised in CI. To manually validate before shipping:

Start MinIO: minio server /tmp/minio-data --console-address :9001
Configure object_storage with type: s3 and endpoint: http://localhost:9000
Write/read/compact via client API, verify with mc ls

forcePathStyle: true is already set when endpoint is present (required by MinIO).
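
As a rough illustration of that rule, the sketch below derives S3 client options from an `object_storage` config. `region`, `endpoint` and `forcePathStyle` are real option names accepted by the `S3Client` constructor in `@aws-sdk/client-s3`; the config field names other than `type` and `endpoint`, the default region, and the helper name are assumptions.

```typescript
// Hypothetical config shape: only `type` and `endpoint` appear in the PR text.
interface S3ObjectStorageConfig {
  type: 's3';
  bucket: string; // assumed field
  region?: string; // assumed field
  endpoint?: string;
}

// Subset of the options accepted by the S3Client constructor.
interface S3ClientOptions {
  region: string;
  endpoint?: string;
  forcePathStyle?: boolean;
}

function s3ClientOptions(config: S3ObjectStorageConfig): S3ClientOptions {
  const options: S3ClientOptions = {
    // Assumed default for illustration only.
    region: config.region ?? 'us-east-1'
  };
  if (config.endpoint !== undefined) {
    // Custom endpoints such as MinIO need path-style addressing
    // (http://host:9000/<bucket>/<key>) instead of virtual-hosted style.
    options.endpoint = config.endpoint;
    options.forcePathStyle = true;
  }
  return options;
}
```

The returned object could then be passed to `new S3Client(...)`; with no `endpoint` set, the SDK's default addressing for AWS is left untouched.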

Add failing tests in storage_s3_writing.test.ts that exercise the MemoryObjectStorage helper and confirm the S3 write path guard condition works. Thread the objectStorage option through the storage stack (MongoBucketStorage, MongoSyncBucketStorage, MongoBucketBatch, PersistedBatch) so it is available for future implementation. Model changes: make ops optional in BucketDataDocumentV3 to support storage_ref-only documents. Add StorageRef type and loadBucketDataDocument guard for empty ops. Add S3ObjectStorage config type and object_storage config field. Add @aws-sdk/client-s3 and @mongodb-js/zstd dependencies. Update existing compacting tests to use non-null assertions on ops since it is now optional.

Implement S3 offloading in PersistedBatchV3: BSON-serialize, zstd-compress and upload bucket data chunks to objectStorage. Insert metadata shells with storage_ref in MongoDB instead of inline ops. Update Phase 2b test assertions with non-null accessors now that the write path works. Add storage_s3_reading.test.ts with 3 failing tests for the S3 read path: round-trip write/read, missing S3 object handling, and mixed inline+S3 batch reads. All 3 must fail until the read path fetches from S3.
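
The write-path guard described in these commits might be sketched as follows. The function name, the injected `encode` callback (standing in for BSON serialization plus zstd compression), and the document shape are hypothetical; only the idea of "inline ops when no object storage is configured, otherwise upload and store a `storage_ref` shell" comes from the commit messages.

```typescript
interface ObjectStorage {
  put(path: string, data: Uint8Array): Promise<void>;
  get(path: string): Promise<Uint8Array | undefined>;
  delete(path: string): Promise<void>;
}

interface StorageRef {
  path: string;
  compressed_size: number;
}

// Simplified stand-in for BucketDataDocumentV3: either inline ops or a shell.
interface BucketDataShell<Op> {
  ops?: Op[];
  storage_ref?: StorageRef;
}

// Hypothetical helper: decides what gets inserted into MongoDB for one chunk.
async function toPersistedDocument<Op>(
  ops: Op[],
  path: string,
  encode: (ops: Op[]) => Promise<Uint8Array>, // assumed: BSON + zstd in the real code
  objectStorage?: Pick<ObjectStorage, 'put'>
): Promise<BucketDataShell<Op>> {
  if (objectStorage === undefined) {
    // Inline path first: behaviour is unchanged when nothing is configured.
    return { ops };
  } else {
    // Upload before building the shell, so a stored shell never points at
    // an object that was not written.
    const data = await encode(ops);
    await objectStorage.put(path, data);
    return { storage_ref: { path, compressed_size: data.byteLength } };
  }
}
```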

…test Pre-fetch and decompress S3 objects for storage_ref docs during getBucketDataBatch so ops from S3-backed documents are included in bucket data responses and size tracking. Add red test for S3-aware compaction (Phase 2d): verifies that compacted_state is populated correctly, S3 objects are cleaned, MongoDB docs are replaced, and read path survives compaction. This test fails because compactSingleBucket does not yet fetch ops from S3-backed storage_ref documents.

Compaction now pre-fetches S3-backed ops before decode, uploads new S3 objects after rechunking, and cleans up old storage_refs after transaction commit. Batch size calculation accounts for storage_ref.compressed_size. S3ObjectStorage implements the ObjectStorage interface using @aws-sdk/client-s3, wired through MongoStorageProvider when config specifies object_storage.type: s3.

- Align S3 path format: write and compact both use maxOp (_id.o) suffix (minOp-maxOp-maxOp), not minOp
- Scale compaction batch size by compressed_size * 3 for S3-backed docs, matching the read path multiplier
- clearBucketLeading(): upload CLEAR doc and boundary survivors to S3 when objectStorage is configured, with old ref cleanup after the transaction
- Fix compaction test: allow S3 path reuse when op ranges don't change after dedup

- Remove dead `compression` field from StorageRef interface and all sites
- Add comments explaining compressed_size * 3 heuristic for byte tracking
- Simplify S3 paths from ${minOp}-${maxOp}-${maxOp} to ${minOp}-${maxOp}
- Invert objectStorage guards: inline path first, S3 as else branch
- loadBucketDataDocument() now throws on undefined ops (empty arrays still ok)
- Set doc.ops = [] in S3 fetch error catch blocks for graceful skip

changeset-bot · 2026-06-11T09:59:49Z

⚠️ No Changeset found

Latest commit: e17627c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

rkistner

This looks quite promising, and I like the structure.

Some initial high-level comments:

Currently there are various places in the code doing the same compression/decompression and serialize/deserialize logic. Should we perhaps do this in a wrapper class for ObjectStorage? E.g. a BucketDataObjectStorage that wraps ObjectStorage and does that logic?
Node.js now has built-in zstd support, but I haven't checked how its APIs and performance compare with @mongodb-js/zstd. Since we're already using @mongodb-js/zstd implicitly, that should be fine.
We do need a threshold for inlining ops directly in MongoDB storage before we can merge and release this: S3 has too much overhead for storing, say, individual 100-byte operations.
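
To make the wrapper suggestion concrete, here is one possible sketch of a `BucketDataObjectStorage` that centralizes encode/decode around an `ObjectStorage`. Everything beyond the class name is an assumption: the `OpsCodec` interface, the method names, and the JSON stand-in codec (the real code would presumably use BSON plus zstd).

```typescript
interface ObjectStorage {
  put(path: string, data: Uint8Array): Promise<void>;
  get(path: string): Promise<Uint8Array | undefined>;
  delete(path: string): Promise<void>;
}

// Hypothetical codec boundary: serialize + compress in one place.
interface OpsCodec<Op> {
  encode(ops: Op[]): Promise<Uint8Array>;
  decode(data: Uint8Array): Promise<Op[]>;
}

class BucketDataObjectStorage<Op> {
  private readonly storage: ObjectStorage;
  private readonly codec: OpsCodec<Op>;

  constructor(storage: ObjectStorage, codec: OpsCodec<Op>) {
    this.storage = storage;
    this.codec = codec;
  }

  // Encodes and uploads ops, returning the reference to store in MongoDB.
  async putOps(path: string, ops: Op[]): Promise<{ path: string; compressed_size: number }> {
    const data = await this.codec.encode(ops);
    await this.storage.put(path, data);
    return { path, compressed_size: data.byteLength };
  }

  // Downloads and decodes ops; a missing object is treated as an error.
  async getOps(path: string): Promise<Op[]> {
    const data = await this.storage.get(path);
    if (data === undefined) {
      throw new Error(`Missing object at ${path}`);
    }
    return this.codec.decode(data);
  }

  async deleteOps(path: string): Promise<void> {
    await this.storage.delete(path);
  }
}

// Stand-in codec for illustration only: JSON without compression.
function jsonCodec<Op>(): OpsCodec<Op> {
  return {
    encode: async (ops) => new TextEncoder().encode(JSON.stringify(ops)),
    decode: async (data) => JSON.parse(new TextDecoder().decode(data)) as Op[]
  };
}
```

With this split, the write path, read path and compaction would each call `putOps`/`getOps` rather than repeating the serialization and compression steps.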

rkistner · 2026-06-11T10:17:07Z

      await session.endSession();
    }

+    // After commit: delete old S3 objects (best-effort)


Not a blocker for initial testing, but it could be problematic if we leave orphaned objects in the S3 bucket indefinitely (either from the delete request failing, or from, say, a process crash/restart between the commit and the delete).

Is there some way we can ensure these are cleaned up eventually? Maybe persisting a "delete queue" in mongodb, or running a periodic cleanup job (maybe part of the compact job)?
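
One way the "delete queue" idea could work is sketched below. It assumes old object paths are enqueued in the same MongoDB transaction that replaces the documents, and that a periodic job later drains the queue. The interfaces, the in-memory queue (a stand-in for a MongoDB collection), and the function names are all hypothetical.

```typescript
interface ObjectDeleter {
  delete(path: string): Promise<void>;
}

// Hypothetical persistent queue; a MongoDB collection in a real setup.
interface DeleteQueue {
  enqueue(paths: string[]): Promise<void>;
  peek(limit: number): Promise<string[]>;
  remove(path: string): Promise<void>;
}

// In-memory stand-in used only to illustrate the flow.
class MemoryDeleteQueue implements DeleteQueue {
  private readonly pending = new Set<string>();

  async enqueue(paths: string[]): Promise<void> {
    for (const path of paths) this.pending.add(path);
  }

  async peek(limit: number): Promise<string[]> {
    return Array.from(this.pending).slice(0, limit);
  }

  async remove(path: string): Promise<void> {
    this.pending.delete(path);
  }
}

// Attempts each pending delete; an entry leaves the queue only after the
// object delete succeeded, so a failure or crash is retried on the next run.
async function drainDeleteQueue(
  queue: DeleteQueue,
  storage: ObjectDeleter,
  limit: number = 100
): Promise<{ deleted: number; failed: number }> {
  let deleted = 0;
  let failed = 0;
  for (const path of await queue.peek(limit)) {
    try {
      await storage.delete(path);
      await queue.remove(path);
      deleted++;
    } catch {
      failed++;
    }
  }
  return { deleted, failed };
}
```

This relies on deletes being safe to repeat: a crash between the object delete and the queue removal simply causes the same path to be deleted again on the next run.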

rkistner · 2026-06-11T10:17:48Z

+      // Track sizes: for S3 docs multiply compressed_size by 3 as a rough
+      // decompressed estimate to keep chunk byte tracking bounded. Without a
+      // multiplier, metadata shells (~200 bytes) would let thousands of
+      // S3-backed docs pack into a single chunk before splitting.


We already have the size on the mongodb document - could we use that instead of the estimate?

rkistner · 2026-06-11T10:22:56Z

+                this.logger.warn(`Failed to fetch/decompress S3 object ${doc.storage_ref?.path}: ${err}`);
+                doc.ops = [];


This should be a hard error - setting doc.ops = [] may result in data inconsistencies.
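
A sketch of the stricter behaviour, assuming a parallel pre-fetch step similar to the one described in the PR: instead of substituting an empty ops array, any fetch or decode failure rejects the whole batch. The function name, the document shape, and the injected `fetchOps` callback are hypothetical.

```typescript
interface StorageRef {
  path: string;
  compressed_size: number;
}

// Simplified stand-in for a bucket data document.
interface PrefetchableDoc<Op> {
  ops?: Op[];
  storage_ref?: StorageRef;
}

// Fetches ops for all storage_ref-only docs in parallel and patches doc.ops.
// Any failure rejects the returned promise, so the caller never sees a
// document that silently lost its operations.
async function prefetchOps<Op>(
  docs: PrefetchableDoc<Op>[],
  fetchOps: (ref: StorageRef) => Promise<Op[]> // assumed: download + decompress + decode
): Promise<void> {
  await Promise.all(
    docs.map(async (doc) => {
      const ref = doc.storage_ref;
      if (doc.ops !== undefined || ref === undefined) {
        return; // inline document, nothing to fetch
      }
      try {
        doc.ops = await fetchOps(ref);
      } catch (err) {
        throw new Error(`Failed to fetch/decompress S3 object ${ref.path}: ${err}`);
      }
    })
  );
}
```

The trade-off is that one unreadable object fails the entire read, which surfaces the problem instead of streaming an incomplete bucket to clients.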

Sleepful added 6 commits June 9, 2026 18:49

rkistner reviewed Jun 11, 2026

Sleepful wants to merge 6 commits into compressed-bucket-storage from s3-offloading