feat(unreal): Dedicated worker thread, bounded queues, zero-copy diff pipeline, native table listeners#4951
Open
brougkr wants to merge 2 commits intoclockworklabs:masterfrom
Conversation
Replace the Unreal SDK inbound message path with a connection-owned worker thread, bounded raw and parsed queues, connection epoch guards, and deterministic lifecycle cleanup. This removes the previous fire-and-forget thread-pool preprocessing and reorder buffer, which could grow without backpressure and could allow stale async work to outlive a connection boundary. Move table preprocessing to message-scoped data, add zero-copy BSATN row parsing through DeserializeView, move-enabled FWithBsatn rows, and rvalue forwarding from remote tables into client caches. Cache apply now reuses shared row storage for diffs, validates structural invariants with checkf, and avoids spurious insert diffs on refcount bumps. Add compact uint64 primary-key apply support for generated FrameKey and BatchKey rows, native table listener bindings for reflection-free hot-path dispatch, multi-diff table broadcast ordering, and profiler scopes for inbound enqueue, worker preprocess, game-thread apply, cache apply, and broadcasts. Include the WebSocket native binary-message target and BSATN helper APIs required by the worker and zero-copy preprocessing path. The original plan listed only six files, but audit found those omitted files were compile dependencies for the copied implementation, so this commit keeps the PR source-complete rather than preserving a stale file count. Validation: ran git diff --check successfully; ran the required ./Scripts/full_rebuild.sh from /Users/brougkr/Documents/Unreal Projects/FACTIONS successfully, including SpacetimeDB publish, binding regeneration, and Unreal C++ build.
Remove FACTIONS-specific table policy from the generic Unreal SDK cache path. Direct native diffs now require explicit ETableCacheApplyMode::DirectNativeDiff, runtime B-tree index application is controlled through explicit cache policy hooks, and compact cache keys are driven by generated TCompactPrimaryKeyTraits specializations for schema-declared uint64 primary keys instead of FrameKey/BatchKey field-name inference. Make native listener dispatch lifetime-safe and explicit. Native listeners now store weak UObject owners plus an unregister identity key, expired owners are logged and removed before dispatch continues, and the default dispatch path invokes native listeners before dynamic delegates unless a registration explicitly opts into NativeOnly suppression for hot tables. Add decoded and parsed-memory backpressure. Gzip payloads whose declared decoded size exceeds the inbound cap are rejected before allocation, parsed queue accounting now uses estimated decoded/preprocessed memory instead of only raw payload bytes, parsed queue overload reports a protocol error, and raw inbound queue draining uses a read index with periodic compaction instead of front-removing every drain. Update public diff documentation to describe the lower-copy TSharedPtr-backed C++ diff representation and clarify that generated dynamic delegates remain value-reference based. This documents the direct FTableAppliedDiff source-shape change rather than claiming a purely additive or zero-copy API.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
While developing an MMO-scale Unreal client with 1000+ players and 3000+ AI mobs in one area, I ran into inbound SDK bottlenecks that had to be removed to keep a 120 FPS frame budget. This PR replaces the Unreal SDK's inbound message pipeline with a dedicated worker thread, bounded queues with backpressure, a zero-copy diff apply path, and optional native table listeners.
No breaking API changes are intended. Existing
OnInsert,OnDelete, andOnUpdatedynamic multicast delegates continue to work unchanged for tables that do not opt into native listeners.This PR changes 11 files. The original local implementation plan listed 6 copied files, but audit found compile-time support dependencies in
Websocket.*,UEBSATNHelpers.h,UESpacetimeDB.h, andWithBsatn.h. Those support files are included here so the branch is source-complete rather than preserving a stale file count.Changes
1. Dedicated Inbound Worker Thread
Current: Each WebSocket message dispatches via
Async(EAsyncExecution::Thread), a fire-and-forget task on UE's global thread pool. The lambda captures the fullTArray<uint8>payload by value. ATMap<int32, FServerMessageType>reorder buffer sequences results back in order.Rationale: At high message rates, thread pool slots can saturate and compete with rendering, physics, audio, and other engine tasks. The reorder buffer can grow unbounded under head-of-line blocking.
TWeakObjectPtrlifetime checks verify object liveness, not connection identity, so stale async tasks from a destroyed connection can still arrive against a later session.Improvement: Adds a single connection-owned
FRunnablethread (FSpacetimeDbInboundWorker) withFEventsignaling. Complete binary messages are moved into a bounded raw queue. The worker drains FIFO, so no reorder buffer is needed.StopAndJoin()deterministically joins in-flight work during disconnect/error/teardown. Idle cost is zero-spin because the worker blocks onFEvent::Wait.2. Bounded Queues with Backpressure
Current:
PendingMessagesis an unboundedTArray. If the server produces messages faster than the game thread consumes them, the array grows without limit until memory pressure or OOM. There is no diagnostic at the failure boundary.Rationale: Sustained server spikes or client hitches should fail with actionable protocol diagnostics, not silent memory growth.
Improvement: Adds hard limits at both queue stages: raw queue
8192messages /128 MB, parsed queue8192messages /128 MB. Overflow queues a fatal protocol error with sequence ID, payload size, compression tag, queue depth, and queued bytes. Worker-side backpressure checks parsed-queue capacity before pulling additional raw messages.3. Configurable Per-Frame Apply Budget
Current:
FrameTick()moves all pending messages to a local array and processes all of them. If a hitch accumulates hundreds of messages, the next frame processes the entire backlog and can cascade the stall.Rationale: Games need explicit control over how much network work is applied per frame.
Improvement: Adds
FSpacetimeDBInboundApplyBudgetwithMaxMessagesPerFrame,MaxPayloadBytesPerFrame,MinMessagesPerFrame,SoftTimeBudgetMicros, andbDrainAllPendingMessages. An incrementalPendingMessageReadIndexavoids moving the entire pending array each frame. Compaction after 512 consumed messages prevents unbounded consumed-prefix growth.MakeDrainAllPendingMessages()provides a named drain-all configuration.4. Connection Epoch Guards
Current: There is no session identity on queued or async inbound work. After disconnect/reconnect, stale work from the prior connection can potentially apply to the new connection's cache.
Rationale: A reconnect without epoch guards can corrupt client cache state with rows from a previous session.
Improvement: Adds
InboundConnectionEpochand increments it on every worker start/stop. Raw and parsed messages carry the epoch.IsInboundEpochCurrentAndAccepting()rejects stale-epoch work at queue and worker boundaries.5. Full Lifecycle Cleanup
Current:
Disconnect()only disconnects the WebSocket. Error/close paths clear pending operations but not all inbound parsed/raw/preprocessed state.Rationale: Queued and preprocessed data must not survive connection boundaries.
Improvement:
Disconnect,HandleWSError,HandleWSClosed,HandleProtocolViolation, andBeginDestroystop/join the inbound worker and clear raw queues, parsed queues, indices, byte counters, protocol-error state, and active message-scoped preprocessing.6. Message-Scoped Preprocessed Data
Current: Preprocessed table data is stored in a persistent
TMap<FPreprocessedTableKey, ...>protected byPreprocessedDataMutex. Missing data falls back to re-parsing raw BSATN on the game thread.Rationale: The fallback double-deserializes bytes that were already parsed on the worker, can hide preprocessing bugs, and keeps a persistent map that can leak state across messages.
Improvement:
ActivePreprocessedTableDatapoints only to the currentFInboundParsedMessageduring apply. Missing preprocessing is acheckf, not a silent fallback. This removes the persistent preprocessing map and per-table mutex acquisition during game-thread apply.7. Zero-Copy Diff Data Structures
Current:
Every diff entry copies BSATN key bytes and copies the row by value.
TMapiteration is hash-table order.Rationale: For high-cardinality row diffs, key copies, row copies, and hash-map iteration add avoidable memory traffic and cache misses.
Improvement:
Diffs share row pointers from the table cache. This eliminates key copies from the broadcast diff and makes diff iteration contiguous.
DeriveUpdatesByPrimaryKeynow uses index-based matching with bit arrays and a single compaction pass instead of per-keyTMap::Remove()rehashing.8. Zero-Copy Cache Apply - Generic Path
Current:
ApplyDifftakes const copied containers. Insert rows are copied intoMakeShared<RowType>(Row). Index updates create new shared pointers for delete/index work rather than reusing the cached row pointer.Rationale: At thousands of rows per diff, redundant
MakeShared, row copies, key copies, and index pointer churn become a hot-path cost.Improvement:
ApplyDiffnow takesTArray<FWithBsatn<RowType>>&&and moves insert rows into cache storage. Index update lambdas reuse the existingTSharedPtrfrom cache. Diff containers are pre-reserved. Refcount and row validity are enforced withcheckfat cache boundaries.9. Compact Primary Key Cache Apply
Current: Tables with simple integer primary keys still pay full BSATN key storage and hashing costs.
Rationale: Rows keyed by generated
uint64fields can use an 8-byte native key instead of serializing and hashing the full BSATN row bytes.Improvement: Adds
TCompactPrimaryKeyTraits<RowType>with SFINAE detection for generatedFrameKeyandBatchKeyuint64fields. Eligible rows use compact 8-byte cache keys, inline update classification, andbPrimaryKeyUpdatesClassified = true. Ineligible rows use the generic path unchanged. The trait is intentionally extensible if maintainers prefer user-defined specializations later.10. Move-Semantic Forwarding at RemoteTable -> ClientCache
Current:
RemoteTable::BaseUpdatecopiesFWithBsatnarrays into new containers before forwarding toClientCache::ApplyDiff.Rationale: The worker has already materialized row+BSATN arrays. Rebuilding them at the table/cache boundary is unnecessary.
Improvement:
RemoteTable::BaseUpdatenow takes mutable arrays, forwards them withMoveTemp, and usesif constexprto route compact-primary-key rows toApplyDiffByPrimaryKey.11. Multi-Diff Table Update Handler
Current: Each table handler stores a single
FTableAppliedDiff<RowType> LastDiff. If one server message contains multiple updates for the same table, only the last diff survives for broadcast.Rationale: Multi-update-per-table messages must preserve ordered broadcasts.
Improvement: Adds
TArray<FTableAppliedDiff<RowType>> PendingDiffswithPendingDiffReadIndex. Multiple diffs are queued and broadcast in order.UpdateCachereturnsboolso empty diffs can skip broadcast work.12. Native Table Listeners
Current: All table broadcasts go through UE dynamic multicast delegates (
OnInsert,OnDelete,OnUpdate), which use reflection dispatch.Rationale: Reflection dispatch is expensive on high-cardinality table diffs. At 3000 entities with insert/update/delete-style callbacks, dynamic dispatch can consume a meaningful part of an 8.33 ms frame budget.
Improvement: Adds
FNativeTableListenerBindingand templated registration helpers for direct member-function dispatch. Existing dynamic multicast delegates remain supported. If native listeners are registered for a table, the table can bypassProcessEventfor that hot path.13. Broadcast Validation
Current: Diff rows are broadcast without structural validity checks. Mismatched update arrays are silently truncated with
FMath::Min.Rationale: Silent truncation can hide data corruption. A malformed diff should fail at the source.
Improvement: Adds
checkf(Row.IsValid())for every insert/delete/update row andcheckf(Diff.UpdateDeletes.Num() == Diff.UpdateInserts.Num())before update broadcast.14. Pre-Classified Update Diff Guard
Current:
DeriveUpdatesByPrimaryKeyalways runs the primary-key matching scan.Rationale: Compact-primary-key apply already classifies updates inline. Running the generic derivation again duplicates work.
Improvement:
ApplyDiffByPrimaryKeysetsbPrimaryKeyUpdatesClassified = true.DeriveUpdatesByPrimaryKeyearly-returns after checking update array pairing.15.
DeriveUpdatesByPrimaryKeyEarly-Out FixCurrent: The function only early-outs when deletes are empty.
Improvement: It now returns when deletes or inserts are empty. If there are inserts but no deletes, updates are impossible, so the map build and scan are skipped.
16. Lock-Free Registered Table Snapshot Reads
Current: Applying registered table updates locks
RegisteredTablesMutexper table lookup and allocates a local handler array per call.Improvement: Registration rebuilds
RegisteredTablesSnapshot, and apply reads that snapshot for lookup.TableUpdateHandlersScratchis reused to avoid per-frame heap churn.17. Ticking Mechanism Change
Current: Auto ticking uses
FTSTicker, outside the standard tickable-object visibility path.Change:
UDbConnectionBasenow implementsFTickableGameObjectwithTick,GetStatId,IsTickable, andIsTickableInEditor.Note: This is a trade-off.
FTSTickeravoids tickable object count inflation with multiple connections.FTickableGameObjectgives standard Unreal tick/profiler visibility. The default can be adjusted based on maintainer preference.18. CPU Profiler Instrumentation
Current: The inbound SDK path has little profiler visibility in Unreal Insights.
Improvement: Adds
TRACE_CPUPROFILER_EVENT_SCOPEmarkers for inbound enqueue, worker drain/preprocess, game-thread apply, table cache apply, and table broadcast. Per-message/table timing is used for soft-budget diagnostics.19. Named Budget Factory
Improvement: Adds
FSpacetimeDBInboundApplyBudget::MakeDrainAllPendingMessages()so callers can opt into old drain-all behavior explicitly.20. Spurious Insert Diff on Refcount Bumps
Current on recent upstream master:
ClientCache::ApplyDiffwas already fixed to avoid emitting insert diffs for subscription refcount bumps.This PR: Preserves that behavior in the rewritten generic and compact-primary-key paths. Existing entries with positive refcount increment the refcount and skip
Diff.Inserts.Support Files Included by Audit
The original implementation plan listed only the largest six files. During implementation audit, those files did not compile against current upstream without these source dependencies:
Connection/Websocket.handConnection/Websocket.cpp: native binary-message target and move dispatch intoUDbConnectionBase::HandleWSBinaryMessageOwned.BSATN/UEBSATNHelpers.h: message-scoped row preprocessing, query-row apply mode, row counts/byte counts, and zero-copy row-list parsing helpers.BSATN/UESpacetimeDB.h:DeserializeViewfor pointer/view based deserialization without creating temporary byte arrays.DBCache/WithBsatn.h: move constructor for row+BSATN ownership transfer.Performance Estimate
Conservative combined hot-path savings at full load (1000 players + 3000 mobs):
At 120 FPS (8.33 ms/frame), this represents roughly 36-60% of the frame budget in the intended high-load case. These are estimates from the motivating workload; measured payload sizes and profiler captures should supersede them where available.
Files Changed
sdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Private/Connection/DbConnectionBase.cppsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Private/Connection/DbConnectionBuilder.cppsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Private/Connection/Websocket.cppsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/BSATN/UEBSATNHelpers.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/BSATN/UESpacetimeDB.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/Connection/DbConnectionBase.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/Connection/Websocket.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/DBCache/ClientCache.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/DBCache/TableAppliedDiff.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/DBCache/WithBsatn.hsdks/unreal/src/SpacetimeDbSdk/Source/SpacetimeDbSdk/Public/Tables/RemoteTable.hValidation
git diff --check/Users/brougkr/Documents/Unreal Projects/FACTIONS/Scripts/full_rebuild.shOnInsert,OnDelete,OnUpdate) fire correctly in upstream maintainers' test projectsSpacetimeDB_*profiler scopesNotes
FrameKeyandBatchKeyuint64fields. The infrastructure can be extended to user-defined specializations if maintainers prefer a more general primary-key trait surface.