Process-global Verification Table with coordinated write-lock by kriszyp · Pull Request #526 · HarperFast/rocksdb-js

kriszyp · 2026-04-25T23:02:27Z

Summary

Adds a process-wide, lock-free Verification Table (VT) to the rocksdb-js native binding. The VT is a fixed-size array of std::atomic<uint64_t> slots addressed by hash(dbPtr, cfId, key). It solves two problems in Harper:

Stale cache detection: per-thread JS caches can silently serve outdated records after another thread commits. verifyVersion / populateVersion give callers a fast O(1) way to confirm a cached record is still fresh without hitting RocksDB.
Blind optimistic retry: IsBusy conflicts previously triggered quadratic-backoff retries with no coordination. coordinatedRetry: true parks the retrying JS promise on the VT slot of the conflicting key; it wakes automatically (via TSFN) the moment the blocker commits, eliminating unnecessary backoff.
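To make the slot addressing concrete, here is a minimal sketch of mapping (dbPtr, cfId, key) to an index in a fixed-size table. The FNV-1a hash, the `seed` parameter, the numeric `dbId` stand-in for the native pointer, and the name `slotIndexFor` are all assumptions for illustration; the hash function actually used by the native code is not shown in this PR description.

```typescript
// Hypothetical sketch: map (dbPtr, cfId, key) to an index in a fixed-size table.
// FNV-1a is an assumption for illustration, not the hash used by the PR.
const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;

// Mix a single byte into a 32-bit FNV-1a state.
function mixByte(hash: number, byte: number): number {
  return Math.imul(hash ^ (byte & 0xff), FNV_PRIME) >>> 0;
}

// Mix a 32-bit integer into the state, least significant byte first.
function mixUint32(hash: number, value: number): number {
  for (let shift = 0; shift < 32; shift += 8) {
    hash = mixByte(hash, value >>> shift);
  }
  return hash;
}

function slotIndexFor(
  seed: number,
  dbId: number, // stand-in for the native dbPtr
  cfId: number,
  key: Uint8Array,
  slotCount: number,
): number {
  let hash = (FNV_OFFSET_BASIS ^ seed) >>> 0;
  hash = mixUint32(hash, dbId);
  hash = mixUint32(hash, cfId);
  for (let i = 0; i < key.length; i++) {
    hash = mixByte(hash, key[i]);
  }
  // The state is an unsigned 32-bit value, so the remainder is a valid index.
  return hash % slotCount;
}
```

Because the table is fixed-size, two different keys can land on the same index. With a very small `slotCount` collisions are unavoidable, which is the property the 8-slot stress test relies on.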

Changes by phase

Phase 1: VerificationTable class: lock-free slot array, slotFor, verifyVersion, populateVersion, extractVersionFromValue, POPULATE_VERSION_FLAG / FRESH_VERSION_FLAG fast-path in getSync, RocksDatabase.config({ verificationTableEntries }), JS-side verifyVersion / populateVersion on NativeDatabase.
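The verify/populate pair can be modeled in a few lines. The sketch below is a TypeScript analogue of the native slot array, not the actual C++ class: `VersionSlots`, `versionToBits`, and the overwrite-on-populate policy are assumptions. The slot encoding (0 for empty, bit 63 reserved for lock tags, otherwise a float64 bit pattern) follows the commit message for this phase.

```typescript
// Hypothetical model of the VT slot array. Slot encoding follows the PR text:
// 0 = empty, bit 63 set = lock tag, anything else = float64 version bit pattern.
const LOCK_BIT = 1n << 63n;

// Reinterpret a JS number as its 64-bit IEEE 754 bit pattern.
function versionToBits(version: number): bigint {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, version);
  return view.getBigUint64(0);
}

class VersionSlots {
  private readonly slots: BigUint64Array;

  constructor(entries: number) {
    // SharedArrayBuffer so the same slots could be visible to worker threads.
    this.slots = new BigUint64Array(new SharedArrayBuffer(entries * 8));
  }

  // True only when the slot holds exactly this version (never for 0 or lock tags).
  verify(index: number, version: number): boolean {
    const bits = versionToBits(version);
    if (bits === 0n || (bits & LOCK_BIT) !== 0n) return false;
    return Atomics.load(this.slots, index) === bits;
  }

  // Records a version unless a writer holds the slot; returns whether it was stored.
  populate(index: number, version: number): boolean {
    const bits = versionToBits(version);
    if (bits === 0n || (bits & LOCK_BIT) !== 0n) return false;
    for (;;) {
      const current = Atomics.load(this.slots, index);
      if ((current & LOCK_BIT) !== 0n) return false; // write in flight
      if (Atomics.compareExchange(this.slots, index, current, bits) === current) {
        return true;
      }
    }
  }
}
```

A mismatch in `verify` is always safe: the caller simply falls back to a normal read. That is why collisions between unrelated keys cost performance but not correctness.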

Phase 2: LockTracker struct: installed in the VT slot at putSync/removeSync time (not deferred to commit). Per-CF opt-in via verificationTable: true on the DB open options. releaseIntent() CASes the slot back to 0 after commit (success or IsBusy). store.ts now forwards verificationTable through to NativeDatabase.open().
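A rough sketch of the install/release cycle on a single slot follows. In the native code the slot holds a tagged reference to a LockTracker; here a small `trackerId` bigint stands in for that reference, and `installIntent` / `isLocked` are illustrative names rather than the PR's functions. Only the CAS-back-to-0 behavior of `releaseIntent` is taken from the description above.

```typescript
// Bit 63 marks a slot as "write in flight"; the low bits identify the tracker.
const LOCK_TAG = 1n << 63n;

function isLocked(slots: BigUint64Array, index: number): boolean {
  return (Atomics.load(slots, index) & LOCK_TAG) !== 0n;
}

// Called when a key enters a transaction's write buffer (put/remove time).
// Returns true if this tracker now holds the slot.
function installIntent(slots: BigUint64Array, index: number, trackerId: bigint): boolean {
  const tag = LOCK_TAG | trackerId;
  for (;;) {
    const current = Atomics.load(slots, index);
    if ((current & LOCK_TAG) !== 0n) return current === tag; // already locked
    // Replace an empty slot or a cached version with the lock tag.
    if (Atomics.compareExchange(slots, index, current, tag) === current) return true;
  }
}

// Called after commit, whether it succeeded or hit IsBusy.
// Only the holder's tag is swapped back to 0, so a stale release is a no-op.
function releaseIntent(slots: BigUint64Array, index: number, trackerId: bigint): boolean {
  const tag = LOCK_TAG | trackerId;
  return Atomics.compareExchange(slots, index, tag, 0n) === tag;
}
```

Installing at put time rather than commit time means a cached version disappears from the slot as soon as the write is buffered, so a concurrent reader cannot verify against it.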

Phase 3: coordinatedRetry: true on TransactionOptions. When IsBusy fires, the native complete callback checks savedSlots for an active lock and parks a TSFN Waiter on the tracker. LockTracker::wake() fires all waiters after releaseIntent(), resolving the commit promise with RETRY_NOW_VALUE instead of rejecting. database.ts loops immediately on RETRY_NOW.
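The JS-side loop can be pictured roughly as follows. This is a simplified sketch: the real commit is promise-based and returns the RETRY_NOW sentinel, whereas here commit is modeled as a synchronous call returning a string, and `runWithCoordinatedRetry`, `CommitOutcome`, and `maxAttempts` are hypothetical names.

```typescript
// Stand-in for the two outcomes relevant to coordinated retry.
type CommitOutcome = "committed" | "retry-now";

// Replays the transaction body until commit succeeds. There is no backoff:
// a "retry-now" outcome is only delivered once the blocker has released.
function runWithCoordinatedRetry(
  replayWrites: () => void,
  commit: () => CommitOutcome,
  maxAttempts: number = 10,
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    replayWrites(); // re-execute all writes into a fresh transaction
    if (commit() === "committed") return attempt;
  }
  throw new Error("transaction still conflicting after " + maxAttempts + " attempts");
}
```

The attempt cap is an addition for the sketch; whether the real loop bounds its retries is not stated in the PR description.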

Phase 4: LockTracker.dbPtr field + VerificationTable::cancelForDB(dbPtr): a full-table scan called from DBDescriptor::close() after all TransactionHandle closables have been closed. This is a defensive safety net ensuring no TSFN waiter can park forever if DB close races a pending commit. New stress-test/vt-stress.stress.test.ts with an 8-slot VT forces hash collisions on every write, covering ABA, concurrent contention, coordinatedRetry under collisions, and the cancelForDB lifecycle.
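The cancelForDB pass amounts to a linear scan that clears every lock owned by the closing database and wakes its waiters. The sketch below models slots as plain JS values instead of tagged 64-bit words; `SketchTracker`, `SketchSlot`, and the returned count are assumptions made for illustration.

```typescript
interface SketchTracker {
  dbPtr: number; // identifies the database that installed the lock
  waiters: Array<() => void>; // parked retry callbacks
}

// null = empty slot, number = cached version, object = active lock.
type SketchSlot = null | number | SketchTracker;

// Clears every lock that belongs to dbPtr and wakes its waiters.
// Returns how many slots were cancelled.
function cancelForDB(slots: SketchSlot[], dbPtr: number): number {
  let cancelled = 0;
  for (let i = 0; i < slots.length; i++) {
    const slot = slots[i];
    if (slot === null || typeof slot === "number") continue; // not a lock
    if (slot.dbPtr !== dbPtr) continue; // lock belongs to another database
    slots[i] = null;
    const waiters = slot.waiters.splice(0); // detach so each fires at most once
    waiters.forEach((wake) => wake());
    cancelled++;
  }
  return cancelled;
}
```

A full scan is acceptable here because it only runs on close, and the table size is fixed.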

Initial Prompt

I would like to implement a cache verification and write lock/tracking mechanism. The goal is to improve read performance and reduce contention-induced transaction retries. In our Harper application, we have used an LRU cache (https://www.npmjs.com/package/weak-lru-cache) to store deserialized objects from the RocksDB database. We would like to be able to access that LRU cache, and when there is a hit, verify that the entry in the cache is still the most recent in the database. We have many workers/threads, and the cache is specific to each one (since it lives in a JS isolate), so the JS doesn't know if there is a change in the database from another thread. So (after a cache hit in the JS LRU cache) I would like to be able to call get/getSync, providing a version number (the version numbers are recorded in the JS LRU cache), and if that version number is verified by rocksdb-js to be fresh, then the get can return a flag/indicator indicating that the caller's cached JS object is indeed still fresh and can be used. If it is not fresh, then the normal process of retrieving the binary data and returning it is followed. Also in our Harper application, we use a lock-free transaction mechanism, where we use RocksDB's optimistic transaction, reading from a snapshot and writing to the transaction, also recording a list of writes. And if there is contention (another thread wrote to a record in the transaction), the optimistic transaction fails (IsBusy error), and we then replay/retry the transaction, re-executing all the writes into a new RocksDB transaction. However, with highly contentious writes, this could be problematic; just naively retrying the transaction could lead to continued conflicts and repeated retries.

Proposed approach

I would like to propose using a fixed array, with hash-keyed access to values that represent the known fresh version or a lock indicator/reference. Using a fixed hashed-key array (a probabilistic Bloomier filter?) should give us a lock-free way to quickly check the array for the freshness of an entry, or assign the latest version number on a cache miss. I believe there should be a single cache array for the whole system, across all databases. It can be allocated at the same time as the block cache that is shared by all databases. Once created, it is fixed and access is fast. We can default to a size of 1MB. I would propose that the indexes are a hash of the database (name or pointer), column family name, and record key. I think this should give good distribution. Of course this is probabilistic, there can certainly be hash collisions, but that should just result in false positives on cache hits, and safely revert to the slow path of retrieval. Each entry should be a 64-bit word/number. When there is a cache miss, the entry can be updated with the latest version number from the retrieved object, if there are no active writes for this entry. The version number is always the first (64-bit) word of the record and can be retrieved directly from the beginning of the record. By ensuring no active writes (no open transactions that have a pending write to a record key that hashes to this index), we should be able to preserve the invariant that there is one single active version for any record whose key hashes to this index/entry (no other versions of a record in any pending transaction). Note that the version numbers are big-endian timestamps, so there is some predictability to their basic format (always positive, usually starts with 66 in this era, which could be useful for distinguishing from a write indicator that could use a flag/bit that won't match a positive version number).
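The remark about versions starting with 66 can be checked with a short sketch. It assumes the version is a millisecond timestamp stored as a big-endian float64 in the first 8 bytes of the record, which is how the later commit messages describe the slot contents. The helper names and the payload layout after the version are hypothetical.

```typescript
// Builds a record whose first 8 bytes are the version as a big-endian float64.
function encodeVersionedRecord(version: number, payload: Uint8Array): Uint8Array {
  const record = new Uint8Array(8 + payload.length);
  new DataView(record.buffer).setFloat64(0, version, false); // false = big-endian
  record.set(payload, 8);
  return record;
}

// Reads the first 64-bit word as raw bits, suitable for storing in a VT slot.
function extractVersionBits(record: Uint8Array): bigint | undefined {
  if (record.byteLength < 8) return undefined; // too short to carry a version
  const view = new DataView(record.buffer, record.byteOffset, record.byteLength);
  return view.getBigUint64(0, false);
}

// Reads the same word back as a number.
function decodeVersion(record: Uint8Array): number | undefined {
  if (record.byteLength < 8) return undefined;
  const view = new DataView(record.buffer, record.byteOffset, record.byteLength);
  return view.getFloat64(0, false);
}
```

For a timestamp such as 1777000000000 the value lies between 2^40 and 2^41, so the biased float64 exponent is 0x427 and the leading byte is 0x42, which is 66 in decimal. The sign bit is 0 for any positive timestamp, which leaves bit 63 available as a write indicator.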
When a record is written in a transaction, and there are no other active writes, we need to update the cache entry to an indicator that the record is now being written (perhaps a pointer to a lock tracking structure is recorded within the cache entry), and caching a version number is no longer permitted until all writes that hash to this entry are committed. There can be multiple writes to the same record and multiple writes to records that hash to this entry, which needs to be properly tracked. In addition, I believe this provides a means for being able to more accurately notify of when a transaction can be safely retried. So on the first attempt at a transaction, all writes will try to acquire a "lock" for each record (atomically updating the cache entry to a write status, if there is no existing write status), and we will track these writes for follow-up work. If the transaction commits successfully, then we do not need to retry. If the transaction fails, that's because there was contention. Rather than immediately calling the transaction callback/error handler with an IsBusy error, we should wait until we can acquire locks on all the entries that need to be written, and then call the callback handler once all the locks are in place. We should then be able to retry the transaction safely, replaying all the writes again, with pre-existing locks. We should probably use a different signal (in the commit callback) than an IsBusy error, more explicitly indicating that a retry should now proceed. When a transaction commits, naturally we will need to remove the locks and follow up with the work of finding any transactions that are waiting for their turn to acquire the locks, acquire those locks, and notify any such transactions that are waiting to retry that they can retry now.
Presumably we want to release any write locks while the transaction is in conflict and waiting, until it can acquire all the locks at once (synchronized), so that multiple partial sets of locks do not create deadlock potential. Feel free to suggest different ways of handling this if there are better approaches.
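The all-at-once idea from the last paragraph can be sketched as an all-or-nothing acquisition: take every slot the transaction needs, and if any one is held by someone else, give back everything taken so far. This illustrates the proposal only and is not necessarily what the PR implements. The slot encoding here (0 for free, otherwise an owner id in an Int32Array) and the function names are assumptions.

```typescript
// 0 means free; any other value is the id of the transaction holding the slot.
function tryAcquireAll(slots: Int32Array, indexes: number[], owner: number): boolean {
  const acquired: number[] = [];
  for (let i = 0; i < indexes.length; i++) {
    const index = indexes[i];
    const previous = Atomics.compareExchange(slots, index, 0, owner);
    if (previous === 0) {
      acquired.push(index); // newly taken by this call
      continue;
    }
    if (previous === owner) continue; // two keys hashed to a slot we already hold
    // Contention: release what this call took so no partial set is ever held.
    releaseAll(slots, acquired, owner);
    return false;
  }
  return true;
}

// Releases only slots that are still held by this owner.
function releaseAll(slots: Int32Array, indexes: number[], owner: number): void {
  for (let i = 0; i < indexes.length; i++) {
    Atomics.compareExchange(slots, indexes[i], owner, 0);
  }
}
```

Since a transaction never waits while holding a partial set, two transactions cannot each hold a lock the other needs, which is the deadlock the paragraph is concerned with.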

Test coverage

12 new unit tests in test/lock-tracker.test.ts (Phases 2 & 3)
4 new stress tests in stress-test/vt-stress.stress.test.ts (Phase 4)
All 438 existing tests continue to pass

Adds a lock-free std::atomic<uint64_t>[] Verification Table (VT) to the native binding so JS threads can cheaply verify record-cache freshness without touching RocksDB.

- VerificationTable: fixed-size slot array keyed by hash(db_ptr, cf_id, key); slot holds 0 (empty) or a float64 version bit-pattern (sign bit 0, leaving bit 63 free for Phase 2 lock tags)
- Database::VerifyVersion / PopulateVersion: new native methods exposed as db.verifyVersion(key, version) and db.populateVersion(key, version) on the JS NativeDatabase type
- GetSync fast-path: optional 4th arg expectedVersion; returns FRESH_VERSION_FLAG sentinel on slot match; POPULATE_VERSION_FLAG on flags seeds the slot from the first 8 bytes of the read value
- DBSettings: lazy VT materialization with random seed; config() accepts verificationTableEntries (frozen after first materialize)
- TypeScript: verifyVersion/populateVersion on Store and RocksDatabase; FRESH_VERSION_FLAG and POPULATE_VERSION_FLAG exported from constants
- 12 new tests in test/verification-table.test.ts; all 431 tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
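From the caller's side, the fast path described above might be used roughly like this. The `VersionedReader` interface, the options-object shape, and the `FRESH_SENTINEL` value are stand-ins: the real FRESH_VERSION_FLAG value and the exact getSync signature are not given in this description, so treat everything here as a sketch of the control flow only.

```typescript
// Stand-in for FRESH_VERSION_FLAG; the real value is defined by the library.
const FRESH_SENTINEL = -1;

interface VersionedEntry<T> {
  version: number;
  value: T;
}

// Hypothetical reader shape: returns bytes, the sentinel, or undefined if missing.
interface VersionedReader {
  getSync(key: string, options?: { expectedVersion?: number }): Uint8Array | number | undefined;
}

function readThroughCache<T>(
  cache: Map<string, VersionedEntry<T>>,
  db: VersionedReader,
  key: string,
  decode: (bytes: Uint8Array) => VersionedEntry<T>,
): T | undefined {
  const hit = cache.get(key);
  const result = db.getSync(key, hit ? { expectedVersion: hit.version } : undefined);
  if (hit && result === FRESH_SENTINEL) return hit.value; // cached object is still fresh
  if (result === undefined) {
    cache.delete(key); // record no longer exists
    return undefined;
  }
  if (typeof result === "number") {
    throw new Error("unexpected sentinel without a cached entry");
  }
  const entry = decode(result); // slow path: deserialize and refresh the cache
  cache.set(key, entry);
  return entry.value;
}
```

The important property is that the sentinel never reaches the decoder; only real bytes do.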

…n commit

Stamps VT slots as "write in flight" during optimistic transaction commit so concurrent readers see vtIsLock() and fall through to RocksDB instead of serving stale cached versions.

Also fixes a pre-existing race: the commit complete callback was unconditionally resetting state→Pending after IsBusy, but close() may have already set state→Aborted and nulled txn. Guard the reset to only apply when state is still Committing, preventing a null-txn Rollback crash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pt-in

Moves VT lock installation from commit time (write-batch iteration) to putSync/removeSync time so the slot is invalidated the moment a key enters the transaction's write buffer, closing the window where a cached read could observe a stale version after a write but before commit.

Adds `verificationTable: true` per-DB open option (NativeDatabaseOptions → DBHandle::enableVerificationTable) so only opted-in column families participate, keeping secondary-index CFs out of the VT.

Fixes a pre-existing race in the async commit complete callback where `state = Pending` was set unconditionally after IsBusy, overwriting an Aborted state set by close(), leading to Rollback() on a null txn. Guards the reset: only overwrite Committing → Pending.

All 435 existing tests pass; 4 new Phase 2 lock-tracker tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
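The guarded reset can be illustrated with a tiny state sketch. The state names come from the commit message; the `TxnHandleSketch` shape and the function name are hypothetical.

```typescript
type TxnState = "Pending" | "Committing" | "Committed" | "Aborted";

interface TxnHandleSketch {
  state: TxnState;
}

// After an IsBusy commit, only move Committing back to Pending.
// If close() already set Aborted, leave it alone so no rollback runs on a null txn.
function resetAfterBusy(handle: TxnHandleSketch): void {
  if (handle.state === "Committing") {
    handle.state = "Pending";
  }
}
```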

github-actions · 2026-04-25T23:05:14Z

📊 Benchmark Results

get-sync.bench.ts

getSync() > random keys - small key size (100 records)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	23.51K ops/sec	42.53	41.07	632.218	0.113	117,565
🥈 rocksdb	2	12.53K ops/sec	79.81	77.59	22,583.183	0.895	62,650

getSync() > sequential keys - small key size (100 records)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	27.55K ops/sec	36.30	35.29	712.302	0.104	137,730
🥈 rocksdb	2	12.28K ops/sec	81.43	79.32	497.507	0.048	61,405

ranges.bench.ts

getRange() > small range (100 records, 50 range)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	26.23K ops/sec	38.12	36.49	751.12	0.152	131,174
🥈 rocksdb	2	3.67K ops/sec	272.195	238.636	2,517.548	0.541	18,370

realistic-load.bench.ts

Realistic write load with workers > write variable records with transaction log

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	187.54 ops/sec	5,332.184	64.50	129,571.255	35.21	384
🥈 lmdb	2	26.74 ops/sec	37,402.043	48.03	1,186,200.434	136.683	64

transaction-log.bench.ts

Transaction log > read 100 iterators while write log with 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	35.16K ops/sec	28.44	13.54	14,194.142	0.594	175,783
🥈 lmdb	2	445.07 ops/sec	2,246.849	139.106	13,711.637	1.33	2,226

Transaction log > read one entry from random position from log with 1000 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	687.67K ops/sec	1.45	1.26	3,879.424	0.167	3,438,370
🥈 lmdb	2	456.01K ops/sec	2.19	1.16	8,293.407	0.518	2,280,055

worker-put-sync.bench.ts

putSync() > random keys - small key size (100 records, 10 workers)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	854.22 ops/sec	1,170.658	1,018.065	1,853.01	0.352	1,709
🥈 lmdb	2	1.15 ops/sec	872,425.5	775,558.689	1,013,399.678	5.59	10

worker-transaction-log.bench.ts

Transaction log with workers > write log with 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	17.97K ops/sec	55.65	30.26	542.543	0.494	35,939
🥈 lmdb	2	825.91 ops/sec	1,210.782	89.63	18,079.573	5.80	1,652

Results from commit 70f4ddd

…OW signal

When coordinatedRetry: true on TransactionOptions, an IsBusy conflict at commit time is resolved (not rejected) with RETRY_NOW_VALUE instead of propagating a TransactionIsBusyError. The native layer checks whether any VT slot our transaction locked is now held by a concurrent transaction; if so it parks a TSFN-based Waiter on that tracker and fires resolve(RETRY_NOW) only after the conflicting write-intent releases, eliminating blind backoff.

Key mechanics:
- LockTracker gains woken flag + mutex-protected wakeCallbacks vector with addWakeCallback() / wake() methods
- releaseIntent() calls t->wake() after CAS-zeroing the slot, notifying any parked waiters before decrementing refcount
- Execute lambda saves lockedVTSlots to CommitState.savedSlots before releaseIntent() clears them on IsBusy
- Complete callback's IsBusy+coordinatedRetry path: iterates savedSlots, finds active vtIsLock trackers, creates a one-shot TSFN (retryNowCallJs / retryNowFinalize) and registers it as a wake callback; if already woken (tracker released between execute and complete), fires TSFN immediately
- If no active lock found, resolve(RETRY_NOW) is called directly on the JS thread from the complete callback (no TSFN overhead)
- database.ts transaction() loop: on RETRY_NOW return value, continues immediately without backoff
- RETRY_NOW constant exported from transaction.ts for Harper integration

All 438 tests pass; 3 new Phase 3 correctness tests added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
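The woken flag handles the case where the blocker releases between the execute and complete steps. A single-threaded sketch of that behavior is shown below. The native version protects the callback list with a mutex and delivers through a TSFN, neither of which is modeled here; `WakeList` and `parkOrFire` are illustrative names.

```typescript
class WakeList {
  private woken = false;
  private callbacks: Array<() => void> = [];

  // Returns false if the tracker was already woken, so the caller fires at once.
  addWakeCallback(callback: () => void): boolean {
    if (this.woken) return false;
    this.callbacks.push(callback);
    return true;
  }

  // Fires every parked callback exactly once; later calls are no-ops.
  wake(): void {
    if (this.woken) return;
    this.woken = true;
    const pending = this.callbacks;
    this.callbacks = [];
    pending.forEach((callback) => callback());
  }
}

// Parks the retry signal on the tracker, or delivers it immediately
// when the blocker has already released.
function parkOrFire(tracker: WakeList, resolveRetryNow: () => void): void {
  if (!tracker.addWakeCallback(resolveRetryNow)) {
    resolveRetryNow();
  }
}
```

Without the already-woken check, a waiter registered just after wake() would never be notified, which is the hang the cancelForDB pass also guards against.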

Adds uintptr_t dbPtr to LockTracker so cancelForDB can identify which slots belong to a closing DB. DBDescriptor::close() calls cancelForDB after the closables loop as a defensive final pass: if any VT lock survives the normal TransactionHandle::close() → releaseIntent() → wake() path, cancelForDB CASes it to 0 and fires wake() to unpark any TSFN waiters, preventing a waiter from parking forever after DB close. New stress-test/vt-stress.stress.test.ts exercises all VT paths under an 8-slot table, which forces every write to a collision bucket, giving full coverage of the ABA check, concurrent lock contention, coordinatedRetry under collision, and the cancelForDB lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Transaction::Get now accepts argv[4] as expectedVersion, computing the VT slot and forwarding it to TransactionHandle::get() for both the block-cache fast path and the async libuv path.
- Database::Get already does the same (updated earlier); this brings parity for NativeTransaction callers.
- store.get(): 4th param changed from txnId to StoreGetOptions so expectedVersion flows naturally alongside the transaction id. Passes expectedVersion to both the ONLY_IF_IN_MEMORY_CACHE getSync fast-path and the async context.get() call; propagates FRESH_VERSION_FLAG without clobbering VALUE_BUFFER.end.
- store.getSync(): passes options.expectedVersion as 4th arg and guards the FRESH_VERSION_FLAG sentinel from the VALUE_BUFFER assignment path.
- GetOptions gains an expectedVersion field. getBinary/getBinaryFast pass the full options object (including expectedVersion) instead of only the txnId. Return types widened to include number for the FRESH sentinel. Decode paths in get()/getSync() guard against FRESH so the sentinel is never passed to the decoder.
- load-binding.ts: NativeDatabase.get and NativeTransaction.get type signatures updated to include expectedVersion; resolve callback widens to Buffer | number. FRESH_VERSION_FLAG exported as a standalone value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add `parseExpectedVersion` and `vtSlotFor` inline helpers to database.h.

`parseExpectedVersion(env, arg, out)` consolidates the repeated typeof → get_double → memcpy → vtIsLock/zero check that appeared in 6 functions. `vtSlotFor(dbHandle, vt, key)` consolidates the dbPtr + cfId + slotFor 3-liner that appeared in the same 6 places.

Applied in Database::Get, Database::GetSync, Database::VerifyVersion, Database::PopulateVersion, Transaction::Get, and Transaction::GetSync. Removes ~35 lines of duplicated boilerplate with no behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
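The check sequence that the native helper consolidates can be mirrored in TypeScript as a rough analogue. The native helper takes (env, arg, out) and works on N-API values; the function below, including its name and its return convention, is an illustrative assumption.

```typescript
// Bit 63 is the float64 sign bit, which the VT reserves for lock tags.
const SIGN_BIT = 1n << 63n;

// Returns the 64-bit pattern of a usable expected version, or undefined
// when the argument cannot be compared against a slot.
function parseExpectedVersionBits(arg: unknown): bigint | undefined {
  if (typeof arg !== "number") return undefined; // typeof check
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, arg); // plays the role of get_double + memcpy
  const bits = view.getBigUint64(0);
  if (bits === 0n) return undefined; // would match an empty slot
  if ((bits & SIGN_BIT) !== 0n) return undefined; // would look like a lock tag
  return bits;
}
```

Rejecting zero and sign-bit patterns up front means a caller can never get a false fresh result by passing a value that happens to equal an empty or locked slot.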

Add GetOptions.populateVersion which ORs POPULATE_VERSION_FLAG into the native getSync/get flags, letting the native layer auto-seed the VT slot in the same call rather than requiring a separate populateVersion() call. Also merge expectedVersion into caller options so the transaction snapshot is preserved on VT cache misses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…Handle::get

Thread vtSlot, hasExpectedVersion, expectedVersion, and wantsPopulate through the TransactionHandle::get signature so that block-cache hits also run the VT fast-path (FRESH signal) and auto-populate logic, matching the behaviour of the disk-read async path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kriszyp and others added 3 commits April 25, 2026 15:05

kriszyp and others added 2 commits April 25, 2026 17:23

kriszyp changed the title ~~Feature/verification table~~ Process-global Verification Table with coordinated write-lock Apr 26, 2026

kriszyp and others added 5 commits April 25, 2026 19:51

Retry on busy for saving shared structure

98e09cd

kriszyp mentioned this pull request Apr 28, 2026

Retry with sentinel instead of error #531

Closed

kriszyp wants to merge 10 commits into main from feature/verification-table