
fix: harden leader verification and optimize scans; add MVCC/route co… (#315)

Merged
bootjp merged 10 commits into main from feature/refactor
Feb 17, 2026
Conversation


@bootjp bootjp commented Feb 17, 2026

…verage

Copilot AI review requested due to automatic review settings February 17, 2026 07:27
@gemini-code-assist
Contributor

Summary of Changes

Hello @bootjp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the robustness and performance of the distributed key-value store. It centralizes and strengthens the leader verification logic to ensure operations are always directed to the active leader, preventing inconsistencies. Significant optimizations have been applied to key scanning mechanisms, particularly for Redis-like pattern matching and MVCC store iterations, to improve query efficiency. Additionally, list deletion now occurs in batches, and comprehensive test coverage has been added across critical components to validate these improvements and new functionalities.

Highlights

  • Hardened Leader Verification: Introduced a centralized verifyRaftLeader helper that robustly checks Raft leader status, including handling nil Raft instances, and integrated it across Coordinator, LeaderProxy, ShardStore, and ShardedCoordinator to prevent operations on stale or non-existent leaders (a rough sketch follows this list).
  • Optimized Redis Key Pattern Scanning: Refactored the localKeysPattern function in the Redis adapter to significantly improve the efficiency of key scanning, especially for patterns involving wildcards. This includes new functions for determining scan bounds and accurately matching keys, ensuring internal list keys are also correctly identified and filtered.
  • Improved MVCC Store Scan Performance: Optimized the underlying MVCC (Multi-Version Concurrency Control) store's scan mechanism by refactoring how it iterates through key versions. This change streamlines the process of finding the latest visible version of a key, leading to more efficient data retrieval during scans.
  • Batched List Deletion: Implemented batched deletion for Redis lists, allowing large lists to be deleted in chunks rather than a single, potentially long-running operation. This improves performance and reduces resource consumption during list removal.
  • Expanded Test Coverage: Added extensive new test cases for distribution server routing, Redis key pattern matching, leader verification logic, Hybrid Logical Clock (HLC) behavior, leader-routed store functionality, and MVCC store concurrency and snapshot/restore operations, enhancing the overall reliability of the system.
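Based on the call sites visible later in this diff (a nil-safe check that falls back to ErrLeaderNotFound, otherwise delegating to hashicorp/raft's VerifyLeader future), the helper could look roughly like the sketch below. The actual kv/raft_leader.go may differ; the errors import path in particular is an assumption.

    package kv

    import (
        "github.com/hashicorp/raft"
        "github.com/pkg/errors" // assumed; the repository may use a different errors package
    )

    // verifyRaftLeader treats a nil Raft instance as "leader not found" instead of
    // panicking; otherwise it waits on VerifyLeader() and wraps any error with a stack.
    func verifyRaftLeader(r *raft.Raft) error {
        if r == nil {
            return errors.WithStack(ErrLeaderNotFound)
        }
        if err := r.VerifyLeader().Error(); err != nil {
            return errors.WithStack(err)
        }
        return nil
    }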
Changelog
  • adapter/distribution_server_test.go
    • Added tests for GetRoute to verify route hit and miss scenarios.
    • Added tests for GetTimestamp to ensure monotonic timestamp generation.
  • adapter/redis.go
    • Added listDeleteBatchSize and maxByteValue constants.
    • Refactored localKeysPattern to use new patternScanBounds and listPatternScanBounds for optimized scanning.
    • Introduced mergeScannedKeys helper to consolidate key scanning logic.
    • Added patternScanBounds function to determine scan ranges for user-defined patterns.
    • Added listPatternScanBounds function to map user patterns to internal list key ranges.
    • Added prefixScanEnd function to calculate the exclusive end bound for a given prefix.
    • Added matchesAsteriskPattern function for robust wildcard pattern matching (both helpers are sketched after this changelog).
    • Modified collectUserKeys to filter keys based on the provided pattern.
    • Updated deleteList to perform batched deletions of list items, improving performance for large lists.
  • adapter/redis_keys_pattern_test.go
    • Added new test file for Redis key pattern matching and scanning.
    • Implemented stubAdapterCoordinator for testing purposes.
    • Added TestPatternScanBounds to verify pattern range calculations.
    • Added TestListPatternScanBounds to verify list pattern range calculations.
    • Added TestMatchesAsteriskPattern to test wildcard pattern matching logic.
    • Added TestCollectUserKeys_FiltersByPattern to ensure user keys are filtered correctly by pattern.
    • Added TestLocalKeysPattern_FindsListKeysForUserPrefix to verify list key discovery with user prefixes.
  • kv/coordinator.go
    • Updated VerifyLeader method to use the new verifyRaftLeader helper function.
  • kv/coordinator_leader_test.go
    • Added new test file for Coordinator leader verification.
    • Added TestCoordinateVerifyLeader_LeaderReturnsNil to test successful leader verification.
    • Added TestCoordinateVerifyLeader_NilRaftReturnsLeaderNotFound to test error handling for nil Raft instance.
  • kv/hlc_test.go
    • Added new test file for Hybrid Logical Clock (HLC).
    • Added TestHLCNextIsMonotonic to verify monotonic timestamp generation.
    • Added TestHLCObserveDoesNotRegress to ensure HLC does not regress on observation.
    • Added TestHLCNextAfterObserveLogicalOverflow to test HLC behavior after logical overflow.
  • kv/leader_proxy.go
    • Updated Commit method to use verifyRaftLeader for robust leader check.
    • Updated Abort method to use verifyRaftLeader for robust leader check.
  • kv/leader_routed_store_test.go
    • Added new test file for LeaderRoutedStore.
    • Implemented stubLeaderCoordinator and fakeRawKVServer for testing leader routing.
    • Added TestLeaderRoutedStore_UsesLocalStoreWhenLeaderVerified to confirm local store usage when leader.
    • Added TestLeaderRoutedStore_ProxiesReadsWhenFollower to verify read proxying to leader.
    • Added TestLeaderRoutedStore_ReturnsLeaderNotFoundWhenNoLeaderAddr to test error handling when no leader address is available.
  • kv/raft_leader.go
    • Added new file containing verifyRaftLeader helper function for centralized Raft leader verification.
  • kv/shard_store.go
    • Updated isVerifiedRaftLeader to use verifyRaftLeader.
    • Updated LatestCommitTS to use verifyRaftLeader for leader check.
  • kv/sharded_coordinator.go
    • Updated VerifyLeader method to use verifyRaftLeader and improved error handling for missing groups.
    • Updated VerifyLeaderForKey method to use verifyRaftLeader and improved error handling for missing groups.
  • kv/sharded_coordinator_leader_test.go
    • Added new test file for ShardedCoordinator leader verification.
    • Added TestShardedCoordinatorVerifyLeader_LeaderReturnsNil to test successful leader verification for sharded coordinator.
    • Added TestShardedCoordinatorVerifyLeader_MissingGroup to test error handling for missing shard groups.
  • store/lsm_store.go
    • Removed seekToVisibleVersion and skipToNextUserKey helper functions.
    • Introduced decodeIterKey to safely decode key and version from an iterator.
    • Added scanCurrentUserKey to efficiently find the latest visible version for a given user key during scans.
    • Refactored collectScanResults to use the new scanCurrentUserKey for optimized MVCC scanning.
  • store/mvcc_store.go
    • Added a read lock to readTS method to ensure thread safety when accessing lastCommitTS.
  • store/mvcc_store_concurrency_test.go
    • Added new test file for MVCC store concurrency.
    • Added TestMVCCStore_ScanConcurrentWithWrites to verify scan operations behave correctly under concurrent write load.
  • store/mvcc_store_snapshot_test.go
    • Added new test file for MVCC store snapshot and restore functionality.
    • Added TestMVCCStore_SnapshotRestoreRoundTrip to verify snapshot and restore integrity.
    • Added TestMVCCStore_RestoreRejectsInvalidChecksum to test checksum validation during restore.
    • Added TestMVCCStore_ApplyMutations_WriteConflict to test write conflict detection during mutation application.
    • Added TestMVCCStore_ApplyMutations_UnknownOp to test error handling for unknown mutation operations.
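The two pattern helpers named above are not shown in this conversation view; the sketch below only illustrates the standard techniques the changelog describes (an exclusive prefix upper bound and '*'-only glob matching) under assumed signatures, so the actual adapter/redis.go code may differ. It needs the standard strings package.

    // prefixScanEnd returns the smallest key strictly greater than every key carrying
    // prefix, or nil when the prefix is all 0xff bytes and no finite upper bound exists.
    func prefixScanEnd(prefix []byte) []byte {
        end := append([]byte(nil), prefix...)
        for i := len(end) - 1; i >= 0; i-- {
            if end[i] < 0xff {
                end[i]++
                return end[:i+1]
            }
        }
        return nil
    }

    // matchesAsteriskPattern is a glob match where '*' matches any (possibly empty) run of bytes.
    func matchesAsteriskPattern(pattern, key string) bool {
        parts := strings.Split(pattern, "*")
        if len(parts) == 1 {
            return pattern == key
        }
        if !strings.HasPrefix(key, parts[0]) {
            return false
        }
        key = key[len(parts[0]):]
        last := parts[len(parts)-1]
        if !strings.HasSuffix(key, last) {
            return false
        }
        key = key[:len(key)-len(last)]
        // Each middle segment must appear, in order, in what remains of the key.
        for _, mid := range parts[1 : len(parts)-1] {
            idx := strings.Index(key, mid)
            if idx < 0 {
                return false
            }
            key = key[idx+len(mid):]
        }
        return true
    }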

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several valuable improvements, including hardening Raft leader verification by centralizing the logic, optimizing scan performance in the LSM store, and significantly increasing test coverage for MVCC, routing, and HLC components. The race condition fix in mvccStore is a critical improvement.

However, I've identified a critical issue in the deleteList implementation in adapter/redis.go where the introduction of batched deletes breaks the atomicity of the DEL command for lists, violating MVCC principles. This could lead to data corruption. Please see my detailed comment on this.

adapter/redis.go Outdated
Comment on lines 1129 to 1160
    for {
        readTS := r.readTS()
        kvs, scanErr := r.store.ScanAt(ctx, start, end, listDeleteBatchSize, readTS)
        if scanErr != nil {
            return errors.WithStack(scanErr)
        }
        if len(kvs) == 0 {
            break
        }

        ops := make([]*kv.Elem[kv.OP], 0, len(kvs))
        for _, kvp := range kvs {
            ops = append(ops, &kv.Elem[kv.OP]{Op: kv.Del, Key: kvp.Key})
        }

        group := &kv.OperationGroup[kv.OP]{IsTxn: true, Elems: ops}
        if _, dispatchErr := r.coordinator.Dispatch(ctx, group); dispatchErr != nil {
            return errors.WithStack(dispatchErr)
        }
        if len(kvs) < listDeleteBatchSize {
            break
        }
    }

    // ensure meta bounds consistent even if scan missed (in case of empty list)
    _ = meta

    group := &kv.OperationGroup[kv.OP]{
        IsTxn: true,
        Elems: []*kv.Elem[kv.OP]{
            {Op: kv.Del, Key: listMetaKey(key)},
        },
    }
    _, err = r.coordinator.Dispatch(ctx, group)
    return errors.WithStack(err)

critical

This implementation of deleteList breaks the atomicity of the DEL command for lists, which is a critical regression. The previous implementation correctly performed the deletion in a single transaction.

The new batched approach introduces two major issues:

  1. Loss of Atomicity: By splitting the deletion into multiple transactions, the operation is no longer atomic. A failure during the process will leave the list in a corrupt, partially-deleted state.
  2. Inconsistent Snapshot Reads: readTS is fetched inside the loop, meaning each batch is scanned at a different point-in-time. This can lead to inconsistencies.

To preserve correctness, this operation must be atomic. I recommend reverting to a single-transaction approach. If transaction size is a concern, it should be addressed at a lower level (e.g., with atomic range deletes in the storage engine) rather than compromising command atomicity.

Here is a suggestion that restores the atomic behavior:

	readTS := r.readTS()
	kvs, err := r.store.ScanAt(ctx, start, end, math.MaxInt, readTS)
	if err != nil {
		return errors.WithStack(err)
	}

	ops := make([]*kv.Elem[kv.OP], 0, len(kvs)+1)
	for _, kvp := range kvs {
		ops = append(ops, &kv.Elem[kv.OP]{Op: kv.Del, Key: kvp.Key})
	}
	// delete meta last
	ops = append(ops, &kv.Elem[kv.OP]{Op: kv.Del, Key: listMetaKey(key)})

	group := &kv.OperationGroup[kv.OP]{IsTxn: true, Elems: ops}
	_, err = r.coordinator.Dispatch(ctx, group)
	return errors.WithStack(err)
References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

Copilot AI left a comment

Pull request overview

This PR strengthens raft leader verification across coordinators/stores, optimizes scan-based operations (notably Pebble/LSM scans and Redis KEYS-pattern scanning), and adds targeted test coverage for MVCC snapshots/concurrency and leader-routing behavior.

Changes:

  • Centralize and harden raft leader verification via a shared verifyRaftLeader helper, with new tests for nil/missing-group cases.
  • Refactor Pebble (LSM) scan implementation and improve Redis KEYS pattern scanning + list-key discovery.
  • Add MVCC snapshot/restore, mutation error-path, concurrency, HLC, and leader-routed-store tests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

File Description
store/mvcc_store_snapshot_test.go Adds MVCC snapshot/restore + ApplyMutations error-path tests.
store/mvcc_store_concurrency_test.go Adds a concurrent Scan vs Put regression test.
store/mvcc_store.go Makes readTS() race-safe via RLock.
store/lsm_store.go Refactors Pebble scan logic to reduce seeks and simplify iteration.
kv/sharded_coordinator_leader_test.go Adds ShardedCoordinator leader verification tests (leader nil/missing group).
kv/sharded_coordinator.go Switches leader verification to the shared helper.
kv/shard_store.go Uses shared helper for leader verification checks.
kv/raft_leader.go Introduces verifyRaftLeader helper (nil-safe, stack-wrapped errors).
kv/leader_routed_store_test.go Adds tests for local-vs-proxied reads and missing leader address behavior.
kv/leader_proxy.go Uses shared helper before serving leader-only txn ops.
kv/hlc_test.go Adds monotonicity/observe-overflow behavior tests for HLC.
kv/coordinator_leader_test.go Adds Coordinator leader verification tests (including nil raft).
kv/coordinator.go Switches leader verification to the shared helper.
adapter/redis_keys_pattern_test.go Adds tests for pattern bounds, matching, and list key discovery.
adapter/redis.go Improves KEYS pattern scanning bounds + list namespace scans; batches list deletion scans/deletes.
adapter/distribution_server_test.go Adds DistributionServer route/timestamp monotonicity tests.


Comment on lines 309 to 315
    if out == nil && version <= ts {
        kv, err := s.processFoundValue(iter, userKey, ts)
        if err != nil {
            return nil, err
        }
        out = kv
    }
Copilot AI Feb 17, 2026

ScanAt for Pebble store can return incorrect values when a tombstone/expired version is the latest visible version at the read timestamp. In scanCurrentUserKey, out stays nil when processFoundValue returns nil (tombstone/expired), so the loop keeps scanning older versions and may incorrectly return an earlier live value that should be masked by the tombstone/expiry. Track that the first version <= ts has been evaluated (even if it yields nil) and stop considering older versions for that userKey (advance to the next userKey).

adapter/redis.go Outdated
Comment on lines 1144 to 1147
    group := &kv.OperationGroup[kv.OP]{IsTxn: true, Elems: ops}
    if _, dispatchErr := r.coordinator.Dispatch(ctx, group); dispatchErr != nil {
        return errors.WithStack(dispatchErr)
    }
Copilot AI Feb 17, 2026

Batching list item deletions into multiple Dispatch calls makes list deletion non-atomic: if a later batch fails after earlier batches succeeded, the list meta remains (deleted only after the loop) and the list can be left partially deleted/inconsistent (meta length no longer matches items). Consider deleting/invalidating the meta first (so the list becomes logically absent) and then best-effort cleaning up items, or otherwise ensuring the delete is atomic/retriable.
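A rough sketch of this reviewer suggestion, reusing only names already visible in this diff (listMetaKey, listDeleteBatchSize, kv.OperationGroup, Dispatch, ScanAt). It is not what the PR ultimately merged — later review rounds moved back to a single transaction pinned to startTS — but it illustrates the meta-first ordering described above.

    // Drop the meta key first in its own transaction so the list becomes logically
    // absent; items left behind by a failed later batch are unreachable garbage
    // rather than a visible, half-deleted list.
    metaGroup := &kv.OperationGroup[kv.OP]{
        IsTxn: true,
        Elems: []*kv.Elem[kv.OP]{{Op: kv.Del, Key: listMetaKey(key)}},
    }
    if _, err := r.coordinator.Dispatch(ctx, metaGroup); err != nil {
        return errors.WithStack(err)
    }
    // Best-effort, batched cleanup of the now-orphaned item keys.
    for {
        kvs, err := r.store.ScanAt(ctx, start, end, listDeleteBatchSize, r.readTS())
        if err != nil {
            return errors.WithStack(err)
        }
        if len(kvs) == 0 {
            return nil
        }
        ops := make([]*kv.Elem[kv.OP], 0, len(kvs))
        for _, kvp := range kvs {
            ops = append(ops, &kv.Elem[kv.OP]{Op: kv.Del, Key: kvp.Key})
        }
        if _, err := r.coordinator.Dispatch(ctx, &kv.OperationGroup[kv.OP]{IsTxn: true, Elems: ops}); err != nil {
            return errors.WithStack(err)
        }
        if len(kvs) < listDeleteBatchSize {
            return nil
        }
    }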

@bootjp bootjp requested a review from Copilot February 17, 2026 12:38

bootjp commented Feb 17, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several valuable improvements, including hardening leader verification by centralizing the logic, optimizing scan performance in the LSM store, and significantly increasing test coverage for MVCC, routing, and Redis adapter functionality. The changes are well-implemented and enhance the system's correctness and robustness. I've identified one high-severity issue in the Redis adapter related to ensuring consistent snapshots for the KEYS command, for which I've provided a detailed comment and suggestion.

Comment on lines 391 to 396
    start, end := patternScanBounds(pattern)
    keyset := map[string][]byte{}

    mergeScannedKeys := func(scanStart, scanEnd []byte) error {
        readTS := r.readTS()
        keys, err := r.store.ScanAt(context.Background(), scanStart, scanEnd, math.MaxInt, readTS)

high

The KEYS command should operate on a consistent snapshot of the data. In the current implementation of localKeysPattern, r.readTS() is called inside the mergeScannedKeys closure. Since mergeScannedKeys is called multiple times for a single KEYS command, each scan could potentially use a different timestamp if writes are occurring concurrently. This can lead to inconsistent results.

To ensure atomicity and a consistent view, the read timestamp should be fetched once at the beginning of localKeysPattern and reused for all subsequent scans within that function.

Suggested change
    start, end := patternScanBounds(pattern)
    keyset := map[string][]byte{}

    readTS := r.readTS()
    mergeScannedKeys := func(scanStart, scanEnd []byte) error {
        keys, err := r.store.ScanAt(context.Background(), scanStart, scanEnd, math.MaxInt, readTS)
References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view. While KEYS is not a transaction, it's a single command that should present a consistent view of the database, so the principle of using a single snapshot timestamp applies.

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.



@bootjp bootjp requested a review from Copilot February 17, 2026 13:07

bootjp commented Feb 17, 2026

/gemini review


bootjp commented Feb 17, 2026

/gemini review

@bootjp bootjp requested review from Copilot and removed request for Copilot February 17, 2026 13:08
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a set of significant improvements to the system's correctness, robustness, and performance. Key changes include hardening leader verification by centralizing it into a reusable helper, which is now used consistently across coordinators and proxies. The Redis KEYS command implementation has been substantially improved to correctly handle various wildcard patterns and to include list keys in results, along with performance optimizations for prefix-based scans. Critical correctness fixes have been made to the MVCC storage layers, including resolving a race condition in the in-memory mvccStore and fixing scan logic in the lsm_store to correctly handle tombstones. The PR also adds a wealth of test coverage for these changes, including concurrency tests, which greatly increases confidence in the stability of the system. Overall, this is an excellent set of changes that materially improves the quality of the codebase. I've also left a comment about potential high memory usage in deleteList for large lists, suggesting a more scalable range-deletion approach or, failing that, a clarifying comment.

Comment on lines +1140 to 1143
    kvs, err := r.store.ScanAt(ctx, start, end, math.MaxInt, startTS)
    if err != nil {
        return errors.WithStack(err)
    }

medium

This implementation of deleteList scans all items of a list into memory before creating a transaction to delete them. For very large lists, this could lead to high memory consumption and potentially OOM errors.

A more scalable approach would be to use a range deletion at the storage layer if the underlying store supports it. For example, PebbleDB has a DeleteRange operation. This would avoid loading all keys into memory.

If implementing a range delete is not immediately feasible, it would be good to add a comment here noting the potential for high memory usage with large lists.
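For reference, Pebble's range deletion looks like the sketch below; whether it could be threaded through this project's Raft-replicated MVCC write path is a separate question that the comment leaves open, so this is only an illustration of the storage-level primitive.

    import "github.com/cockroachdb/pebble"

    // deleteListRange removes every key in [start, end) with a single range
    // tombstone, without materializing the keys in memory.
    func deleteListRange(db *pebble.DB, start, end []byte) error {
        return db.DeleteRange(start, end, pebble.Sync)
    }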

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a number of significant improvements across the codebase. The hardening of leader verification by centralizing the logic and adding nil checks is a great step towards a more robust system. The scan optimizations in both the Redis adapter and the LSM store are impressive, not only improving performance but also fixing potential correctness issues with MVCC visibility. The addition of comprehensive test coverage, including concurrency and snapshot tests, greatly increases confidence in the stability of the system. Finally, the fix for the data race on lastCommitTS in mvccStore is a critical improvement for concurrency safety. Overall, this is an excellent set of changes.

Comment on lines 208 to 212
func (s *mvccStore) readTS() uint64 {
s.mtx.RLock()
defer s.mtx.RUnlock()
return s.lastCommitTS
}

critical

This is a critical fix to prevent a data race. On 32-bit architectures, reading a 64-bit integer (lastCommitTS) is not an atomic operation and can result in a torn read if another goroutine is writing to it concurrently. Adding the read lock ensures that the read is atomic and prevents this race condition.

@bootjp bootjp requested a review from Copilot February 17, 2026 13:37

bootjp commented Feb 17, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a substantial pull request that brings significant improvements to leader verification, scan performance and correctness, and MVCC behavior. The introduction of the verifyRaftLeader helper hardens the system against panics from nil Raft instances. The optimizations and correctness fixes for Redis KEYS pattern matching and list deletion are excellent. The refactoring of the LSM store's scan logic to correctly handle tombstones is a critical correctness fix. I also appreciate the addition of comprehensive tests, which cover complex scenarios like cross-shard transactions and concurrency, greatly increasing confidence in these changes. I found one potential issue with timestamp handling during transaction rollbacks, which I've commented on.

-        return pb.Phase_NONE, 0, errors.Wrapf(ErrTxnLocked, "key: %s (timestamp overflow)", string(key))
-    }
-    return pb.Phase_ABORT, abortTS, nil
+    return pb.Phase_ABORT, cleanupTSWithNow(lock.StartTS, cleanupNow), nil

high

The previous implementation had a defensive check for timestamp overflow when calculating the abort timestamp. This check seems to have been removed. If lock.StartTS is math.MaxUint64, cleanupTSWithNow could return a timestamp smaller than lock.StartTS, which would violate the FSM invariant that the resolution timestamp must be greater than the start timestamp. It's safer to reintroduce the check to prevent this potential issue.

		abortTS := cleanupTSWithNow(lock.StartTS, cleanupNow)
		if abortTS <= lock.StartTS {
			return pb.Phase_NONE, 0, errors.Wrapf(ErrTxnLocked, "key: %s (timestamp overflow)", string(key))
		}
		return pb.Phase_ABORT, abortTS, nil

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.



Comment on lines +885 to +891
    func cleanupTSWithNow(startTS, now uint64) uint64 {
        next := startTS + 1
        if now > next {
            return now
        }
        return next
    }
Copilot AI Feb 17, 2026

cleanupTSWithNow can overflow when startTS == ^uint64(0) (next wraps to 0) and then returns now, which may be <= startTS. That can violate the FSM invariant abortTS > startTS (see fsm.go abort/commit checks) and cause lock-resolution aborts during scans to fail unexpectedly. Consider using the existing overflow-safe abortTSFrom(startTS, 0) (or add an explicit startTS==max guard) and return an error/0 when no strictly-greater timestamp is representable, similar to the previous defensive check logic.
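A minimal sketch of the guard this comment asks for, assuming the caller can surface an error and that math.MaxUint64 and ErrTxnLocked are in scope; the repository may instead prefer the existing abortTSFrom helper.

    func cleanupTSWithNowChecked(startTS, now uint64) (uint64, error) {
        if startTS == math.MaxUint64 {
            // No uint64 strictly greater than startTS exists, so resolution must fail
            // rather than return a timestamp that violates abortTS > startTS.
            return 0, errors.Wrapf(ErrTxnLocked, "timestamp overflow for startTS %d", startTS)
        }
        next := startTS + 1
        if now > next {
            return now, nil
        }
        return next, nil
    }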

Comment on lines 294 to 323
    func (s *pebbleStore) scanCurrentUserKey(iter *pebble.Iterator, userKey []byte, ts uint64) (*KVPair, error) {
        var out *KVPair
        evaluatedVisible := false

        for iter.Valid() {
            curKey, version, ok := decodeIterKey(iter)
            if !ok {
                if !iter.Next() {
                    return out, nil
                }
                continue
            }
            if !bytes.Equal(curKey, userKey) {
                return out, nil
            }

            // Only the newest version visible at ts should determine scan visibility.
            // If that version is tombstoned/expired, older versions must stay masked.
            if !evaluatedVisible && version <= ts {
                kv, err := s.processFoundValue(iter, userKey, ts)
                if err != nil {
                    return nil, err
                }
                out = kv
                evaluatedVisible = true
            }

            if !iter.Next() {
                return out, nil
            }
Copilot AI Feb 17, 2026

scanCurrentUserKey continues iterating through all remaining versions for the current userKey even after it has already evaluated the newest visible version at ts. For keys with many historical versions this turns ScanAt into O(versions-per-key) work and can be a significant regression compared to fast-forwarding to the next user key once visibility has been decided. After setting evaluatedVisible=true, consider seeking directly to the next user key (or reintroducing a helper like the previous skipToNextUserKey) so scans don't have to walk older versions unnecessarily.

@bootjp bootjp requested a review from Copilot February 17, 2026 14:04

bootjp commented Feb 17, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several significant improvements to correctness, performance, and robustness. Key changes include hardening leader verification by centralizing the logic, fixing a critical data race in the in-memory MVCC store, and ensuring atomic list deletions in the Redis adapter by correctly setting transaction timestamps. The scan performance of the LSM store is also notably optimized, along with a correctness fix for handling tombstones. The addition of extensive test coverage, including for concurrency and distributed transactions, is a valuable contribution to the stability of the codebase.

    _ = meta

-    group := &kv.OperationGroup[kv.OP]{IsTxn: true, Elems: ops}
+    group := &kv.OperationGroup[kv.OP]{IsTxn: true, StartTS: startTS, Elems: ops}

critical

This is a critical correctness fix. By adding StartTS to the OperationGroup, you ensure that all delete operations for the list items and metadata occur within a single atomic transaction at a consistent snapshot timestamp. This adheres to MVCC principles and prevents partial deletions or inconsistencies.

References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

Comment on lines +209 to 211
    s.mtx.RLock()
    defer s.mtx.RUnlock()
    return s.lastCommitTS

critical

This is a critical fix to prevent a data race. The lastCommitTS field is written under a write lock in alignCommitTS, so reading it must be protected by a read lock. Without this lock, concurrent reads and writes could lead to reading a partially written value or other undefined behavior. The new test TestMVCCStore_ScanConcurrentWithWrites is great for catching this type of issue.

        return pb.Phase_COMMIT, state.commitTS, nil
    case txnStatusRolledBack:
-        abortTS := abortTSFrom(lock.StartTS, state.commitTS)
+        abortTS := cleanupTSWithNow(lock.StartTS, cleanupNow)

medium

Using cleanupTSWithNow to generate the abort timestamp is a good improvement for robustness. It uses the current wall time (cleanupNow) to ensure the abort timestamp is recent, which helps prevent potential timestamp conflicts when cleaning up old or expired transaction locks. The previous approach of using startTS + 1 was correct but could be brittle if new transactions were started with timestamps very close to the aborted one.

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.



Comment on lines +410 to +414
    // User-key bounded scans like "foo*" do not naturally include internal list
    // keys ("!lst|..."), so scan list namespaces separately with mapped bounds.
    if start != nil || end != nil {
        metaStart, metaEnd := listPatternScanBounds(store.ListMetaPrefix, pattern)
        if err := mergeScannedKeys(metaStart, metaEnd); err != nil {
Copilot AI Feb 17, 2026

localKeysPattern now does extra scans to include internal list namespaces for wildcard patterns like foo*, but exact-match patterns (no *) are routed to localKeysExact, which only checks the raw user key and will never return existing list keys (stored under !lst|...). This makes KEYS mylist inconsistent with EXISTS mylist and with wildcard results. Consider updating the exact-match path to detect list keys (e.g., by checking ListMetaKey(pattern)), or routing exact patterns through the same list-aware collection logic.

Comment on lines 286 to 290
    func decodeIterKey(iter *pebble.Iterator) ([]byte, uint64, bool) {
        key, version := decodeKey(iter.Key())
        if key == nil {
            return nil, 0, false
        }
Copilot AI Feb 17, 2026

pebbleStore persists lastCommitTS under the special key _meta_last_commit_ts, which is not MVCC-encoded and whose value is only 8 bytes. The scan iterator path decodes every Pebble key via decodeKey and may pass this meta key into processFoundValue, which will fail decodeValue (expects at least 9 bytes) or expose an internal key during full-range scans (e.g., ScanAt(nil, nil, ...)). Consider moving this metadata out of the user keyspace (e.g., separate prefix + explicit skip in scans, or a dedicated Pebble property), or ensure meta keys use the same key/value encoding and are filtered from user-facing scans.
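One lightweight form of the "explicit skip" option could look like the sketch below; the key literal is taken from the comment above, the function name is made up here, and the surrounding scan-loop integration (calling it before decodeIterKey) is assumed.

    // isInternalMetaKey reports whether a raw Pebble key is store metadata that must
    // be hidden from user-facing scans (requires the standard bytes package).
    func isInternalMetaKey(rawKey []byte) bool {
        return bytes.Equal(rawKey, []byte("_meta_last_commit_ts"))
    }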

@bootjp bootjp requested a review from Copilot February 17, 2026 14:34

bootjp commented Feb 17, 2026

/gemini review

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to leader verification, scan performance, and test coverage. The leader verification logic has been centralized into a verifyRaftLeader helper, improving consistency. Scan operations are now more efficient and correct, especially for Redis KEYS patterns and list types. The added tests, particularly for MVCC, routing, and distributed transactions, greatly enhance the robustness of the system. A critical concurrency bug in mvccStore has also been fixed by correctly synchronizing access to lastCommitTS. Overall, these are excellent changes that harden the system and improve its performance. I've left a few minor suggestions to reduce code duplication.

Comment on lines 705 to 729
    if st, ok := t.listStates[string(cmd.Args[1])]; ok {
        if st.metaExists {
            st.purge = true
            st.purgeMeta = st.meta
        }
        st.deleted = true
        st.appends = nil
        return redisResult{typ: resultInt, integer: 1}, nil
    }
    isList, err := t.server.isListKeyAt(context.Background(), cmd.Args[1], t.startTS)
    if err != nil {
        return redisResult{}, err
-    } else if isList {
+    }
+    if isList {
        st, err := t.loadListState(cmd.Args[1])
        if err != nil {
            return redisResult{}, err
        }
        if st.metaExists {
            st.purge = true
            st.purgeMeta = st.meta
        }
        st.deleted = true
        st.appends = nil
        return redisResult{typ: resultInt, integer: 1}, nil

medium

The logic for staging a list deletion is duplicated. The block at lines 706-711 is nearly identical to the one at 723-728. Consider extracting this logic into a helper function within applyDel to improve maintainability and reduce code duplication.
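A possible shape for that helper, lifted directly from the two duplicated blocks above; the listState and redisResult field names come from the diff excerpt, while the function name is made up here. Both branches of applyDel would then end with `return stageListDeletion(st), nil`.

    // stageListDeletion marks an already-loaded list state as deleted within the
    // current transaction and returns the DEL reply.
    func stageListDeletion(st *listState) redisResult {
        if st.metaExists {
            st.purge = true
            st.purgeMeta = st.meta
        }
        st.deleted = true
        st.appends = nil
        return redisResult{typ: resultInt, integer: 1}
    }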

Comment on lines 354 to 360
    func (c *ShardedCoordinator) VerifyLeader() error {
        g, ok := c.groups[c.defaultGroup]
-        if !ok || g.Raft == nil {
+        if !ok {
            return errors.WithStack(ErrLeaderNotFound)
        }
-        return errors.WithStack(g.Raft.VerifyLeader().Error())
+        return verifyRaftLeader(g.Raft)
    }

medium

This logic for getting a group and checking for its existence is duplicated in VerifyLeaderForKey. Consider extracting it into a helper function to avoid duplication.

Comment on lines 379 to 385
    func (c *ShardedCoordinator) VerifyLeaderForKey(key []byte) error {
        g, ok := c.groupForKey(key)
-        if !ok || g.Raft == nil {
+        if !ok {
            return errors.WithStack(ErrLeaderNotFound)
        }
-        return errors.WithStack(g.Raft.VerifyLeader().Error())
+        return verifyRaftLeader(g.Raft)
    }

medium

This logic for getting a group and checking for its existence is duplicated in VerifyLeader. Consider extracting it into a helper function to avoid duplication.
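A sketch of the shared lookup both comments point at; the group struct's type name is not visible in this diff, so raftGroup is a placeholder. VerifyLeader would then do `g, ok := c.groups[c.defaultGroup]; return c.verifyGroupLeader(g, ok)`, and VerifyLeaderForKey could call `c.verifyGroupLeader(c.groupForKey(key))` directly, since groupForKey already returns the pair.

    // verifyGroupLeader folds the missing-group check and the shared leader check
    // into one place so both VerifyLeader variants stay in sync.
    func (c *ShardedCoordinator) verifyGroupLeader(g *raftGroup, ok bool) error {
        if !ok {
            return errors.WithStack(ErrLeaderNotFound)
        }
        return verifyRaftLeader(g.Raft)
    }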

@bootjp bootjp requested a review from Copilot February 17, 2026 15:37

bootjp commented Feb 17, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements across leader verification, scan performance, and transactional correctness, particularly for Redis list operations. The introduction of verifyRaftLeader centralizes and hardens leadership checks, which is a great enhancement for system stability. The scan optimizations in the Redis adapter and LSM store are well-thought-out. The fixes for transaction handling, including the race condition in mvccStore and the atomicity of list deletions, are crucial for correctness. The extensive test coverage added in this PR, including concurrency and edge case tests, is commendable and greatly increases confidence in the changes. I've found one potential race condition in adapter/redis.go and have provided a suggestion to address it. Overall, this is a high-quality contribution.

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.



@bootjp bootjp merged commit d8ce7a9 into main Feb 17, 2026
13 checks passed
@bootjp bootjp deleted the feature/refactor branch February 17, 2026 15:54
