
feat(cf): add cf.add command #3481

Open
nagisa-kunhah wants to merge 26 commits into apache:unstable from nagisa-kunhah:feature/3351-cf-add-ci-fix

Conversation

@nagisa-kunhah (Contributor)

task id: #3351

@nagisa-kunhah nagisa-kunhah marked this pull request as ready for review May 4, 2026 13:16

Copilot AI left a comment


Pull request overview

This PR introduces initial CuckooFilter support to KVrocks (task #3351) by adding the CF.RESERVE and CF.ADD commands, along with a bucket-per-key RocksDB storage implementation and accompanying unit tests.

Changes:

  • Add a new Redis metadata type (kRedisCuckooFilter) and CuckooChainMetadata encoding/decoding.
  • Implement a bucket-based cuckoo filter chain (redis::CuckooChain) with Reserve and Add operations.
  • Register new commands (cf.reserve, cf.add) and add a comprehensive C++ unit test suite.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Summary per file:

  • tests/cppunit/types/cuckoo_filter_test.cc: Adds unit tests for CF reserve/add behavior and cuckoo filter helper functions.
  • src/types/redis_cuckoo_chain.h: Declares the cuckoo chain DB wrapper and CF defaults.
  • src/types/redis_cuckoo_chain.cc: Implements CF.RESERVE/CF.ADD logic using per-bucket RocksDB keys plus kick-out/expand paths.
  • src/types/cuckoo_filter.h: Adds cuckoo filter hash/fingerprint/alt-hash helpers and bucket-count calculation.
  • src/storage/redis_metadata.h: Introduces the new Redis type and CuckooChainMetadata structure.
  • src/storage/redis_metadata.cc: Implements encode/decode and capacity calculation for CuckooChainMetadata.
  • src/commands/commander.h: Adds a new command category for CuckooFilter commands.
  • src/commands/cmd_cuckoo_filter.cc: Implements and registers cf.reserve and cf.add command handlers.


@jihuayu (Member) commented May 5, 2026

Hi @nagisa-kunhah. Thank you for your contribution.
In this PR, we have introduced a new data structure. This is a significant update that includes additions to the storage architecture. Could you describe the reasoning behind this design? We would like to understand your design philosophy and key decision points.

Additionally, please follow our AI policy: https://kvrocks.apache.org/community/contributing#guidelines-for-ai-assisted-contributions. Please let us know which AI tools and models you used, as this will help us better review the code.

@nagisa-kunhah (Contributor, Author)

@jihuayu Hi, thank you for the review. I have summarized the design and implementation details below to explain the layering, responsibilities, and key storage decisions.

1 Design Overview

The design follows Kvrocks' existing layered architecture. Conceptually, the change can be divided into four layers:

  1. Command layer
  2. Type layer
  3. Metadata/storage encoding layer
  4. RocksDB persistence layer

1.1 Command layer

The command layer introduces RedisBloom-compatible Cuckoo Filter commands, such as CF.RESERVE, CF.ADD, CF.EXISTS, and CF.MEXISTS. Following the existing architecture, each command is implemented as a Commander subclass and registered through the existing command registration mechanism.

1.2 Type layer

The type layer introduces the new CuckooChain abstraction to represent each logical Cuckoo Filter.

The reason for using a chain abstraction is that Cuckoo Filters are not easy to resize in place: the bucket positions depend on the current bucket count, so resizing a filter would require rebuilding existing data. Instead, the design appends new sub-filters when expansion is needed. This model is inspired by RedisBloom's scalable Cuckoo Filter design.

CuckooChain implements the high-level operations for a logical Cuckoo Filter, including RESERVE, ADD, EXISTS, MEXISTS, and related key-level behavior.

To keep the type layer separated from the algorithm details, CuckooFilter is introduced as an internal helper. It provides Cuckoo Filter-specific calculations used by CuckooChain, while CuckooChain remains responsible for the Redis/Kvrocks-facing behavior.

1.3 Metadata/storage encoding layer

At the metadata/storage encoding layer, the implementation adds CuckooChainMetadata to store the state and parameters for each logical Cuckoo Filter. This metadata extends the existing Kvrocks metadata model, so common key-level fields such as type, size, expiration, and version are still handled consistently with other data types.

The Cuckoo-specific fields describe the filter chain, including the number of sub-filters, base capacity, bucket size, expansion factor, and insertion iteration limit. This keeps the logical filter state compact while leaving the actual bucket contents in subkeys.

1.4 RocksDB persistence layer

At the persistence layer, the implementation reuses Kvrocks' existing metadata/subkey model. The logical key metadata is stored as metadata, and individual buckets are stored as internal subkeys. This follows the same general pattern used by other complex Redis data structures in Kvrocks.

The metadata entry and bucket entries are written through RocksDB write batches when they need to be updated together. This keeps the logical filter state and the modified bucket data consistent without introducing a separate persistence path for Cuckoo Filter.

2 Logical Structure

A logical Cuckoo Filter is associated with one user key. Internally, it is represented as a chain of sub-filters rather than a single resizable filter.

The relationship between the main concepts is:

[diagram: cuckoo filter logical structure]
  • The logical filter is the user-visible Cuckoo Filter associated with a Redis key.
  • A sub-filter is one filter segment in the chain. Expansion appends new sub-filters to the chain.
  • Each sub-filter contains num_buckets buckets, and num_buckets is rounded to a power of two.
  • A bucket belongs to one sub-filter and contains a fixed number of slots.
  • A slot stores one fingerprint. A zero value means the slot is empty.
  • A fingerprint is a compact representation derived from the item hash.

3 Implementation Details

3.1 Data Layout

3.1.1 Metadata

Each logical Cuckoo Filter key has one CuckooChainMetadata entry. The metadata describes the filter-level state and configuration, while the actual bucket data is stored separately.

The metadata is stored in the metadata column family. The RocksDB key is the namespace-prefixed logical Redis key, represented as ns_key in the implementation.

Conceptually:

metadata CF:
  <ns_key> -> CuckooChainMetadata

The metadata contains the following fields:

Field | Type | Size (bytes) | Description
size | uint64_t | 8 | Total number of items recorded in the entire filter chain (inherited from the base Metadata).
expire | uint64_t | 8 | Expiration timestamp of the logical key (inherited from the base Metadata).
version | uint64_t | 8 | Metadata version used to separate current subkeys from stale subkeys (inherited from the base Metadata).
n_filters | uint16_t | 2 | Number of sub-filters in the chain.
expansion | uint16_t | 2 | Growth factor used when a new sub-filter is appended.
base_capacity | uint64_t | 8 | Capacity of the first sub-filter; capacities of later sub-filters are derived from this value and expansion.
bucket_size | uint8_t | 1 | Number of fingerprint slots each bucket can hold.
max_iterations | uint16_t | 2 | Maximum number of relocation attempts during insertion.
num_deleted_items | uint64_t | 8 | Number of deleted items recorded by the filter.

This metadata belongs to the logical Cuckoo Filter as a whole. It is not stored per bucket or per sub-filter.

3.1.2 Bucket Storage

The bucket data is stored as internal subkeys under the logical Cuckoo Filter key. Each bucket is identified by both a filter_index and a bucket_index.

The bucket subkey is constructed from:

  • filter_index: identifies which sub-filter in the chain the bucket belongs to.
  • bucket_index: identifies the bucket inside that sub-filter.

The bucket value is a fixed-size byte array whose length is bucket_size. Each byte stores one fingerprint. A value of 0 represents an empty slot, while valid fingerprints are stored as non-zero values.

Conceptually, the bucket layout is:

PrimarySubkey CF:
  <bucket_key> -> bucket_data

bucket_key = InternalKey(<ns_key>, <encoded filter_index and bucket_index>, version)

Here, filter_index and bucket_index are encoded into the bucket subkey as binary fields, not as a numeric sum.

This layout keeps the logical filter metadata separate from the bucket contents, while still reusing Kvrocks' existing internal subkey model.

3.2 Hashing Model

3.2.1 Item Hash

Each item is first converted into a 64-bit hash using HllMurMurHash64A with seed 0. This is the actual function name used in the codebase. It follows Redis' MurmurHash64A-style hash implementation and keeps the hashing model close to RedisBloom's Cuckoo Filter design.

The item hash is the base value used to derive both the fingerprint and the candidate bucket positions. Since these values determine where the item is stored and looked up, the hash function is part of the persistent data layout and should remain stable once data has been written.

3.2.2 Fingerprint

The fingerprint is generated from the item hash as hash % 255 + 1, producing an 8-bit non-zero value in the range 1..255; 0 is reserved as the empty-slot marker.

3.2.3 Candidate Buckets

For each sub-filter, an item has two candidate buckets. The first bucket is derived directly from the item hash:

bucket1 = hash % num_buckets

The second bucket is derived from both the hash and the fingerprint:

delta = fingerprint * 0x5bd1e995
bucket2 = (hash ^ delta) % num_buckets

The constant 0x5bd1e995 follows RedisBloom's Cuckoo Filter implementation and is used as the mixing constant for deriving the alternate bucket.

During kick-out insertion, the original item hash of an evicted fingerprint is no longer available. Instead, the implementation uses the current bucket index and the fingerprint to compute the alternate bucket:

alternate_bucket = (current_bucket ^ delta) % num_buckets

This is valid because num_buckets is always rounded to a power of two. When num_buckets = 2^k, modulo is equivalent to keeping the lower k bits, so:

(hash ^ delta) % num_buckets == ((hash % num_buckets) ^ delta) % num_buckets

This lets the kick-out path move a fingerprint between its two candidate buckets using only the current bucket index and the fingerprint.

3.3 Current Write Path

3.3.1 CF.RESERVE

CF.RESERVE creates the logical Cuckoo Filter key and initializes its metadata. It validates the requested capacity and configuration parameters, checks that the key does not already exist, and then creates a CuckooChainMetadata entry with the initial filter configuration.

The initial metadata records the base capacity, bucket size, maximum insertion iterations, expansion factor, and initializes n_filters to 1. The initial number of buckets is derived from the requested capacity and bucket size. The calculation uses a target load factor of 0.955, which reserves extra slots instead of assuming that all slots can be filled successfully. The result is then rounded to a power of two.

The implementation does not preallocate all buckets during reserve. Buckets are created lazily when they are first written. This keeps CF.RESERVE lightweight and avoids writing empty bucket data for sparse filters.

3.3.2 CF.ADD

CF.ADD inserts an item into an existing logical Cuckoo Filter. It first loads and decodes the CuckooChainMetadata, then computes the item hash and fingerprint.

For each sub-filter in the chain, the implementation derives the two candidate buckets for the item. It reads these buckets, treats missing buckets as empty buckets, and tries to place the fingerprint into any available slot in either bucket.

For insertion, sub-filters are checked from the first one to the latest one, in filter_index order from 0 to n_filters - 1. This prioritizes reusing available slots in earlier sub-filters before placing data into newer sub-filters, which keeps the chain more compact and avoids expanding the effective write target too aggressively.

This is different from RedisBloom, which checks sub-filters from the latest one back to the first one.

If a free slot is found, the updated bucket data and the updated metadata are written in the same write batch. This keeps the bucket content and the logical filter state updated atomically.

If no free slot is available in the candidate buckets, the implementation falls back to kick-out insertion on the latest sub-filter. The kick-out path relocates existing fingerprints between their candidate buckets and writes all modified buckets together when the insertion succeeds.

3.3.3 Expansion

Expansion is triggered when insertion cannot find a free slot and kick-out insertion also fails. Instead of resizing an existing sub-filter in place, the implementation appends a new sub-filter to the chain.

The new sub-filter is represented by increasing n_filters in CuckooChainMetadata. Its capacity is derived from base_capacity, expansion, and the new filter_index.

The existing buckets are not rebuilt or moved during expansion. This avoids rewriting existing filter data. After expansion, the insertion is retried against the newly added sub-filter.

@jihuayu (Member) commented May 7, 2026

Thanks for the proposal! These images are beautiful.

Hi @git-hulk @torwig @PragmaTwice @aleksraiden @LiuQhahah.
In this PR, we've added a new data structure. This part #3481 (comment) requires a review—could you all take a look?

@jihuayu (Member) commented May 7, 2026

Hi @nagisa-kunhah. Regarding the current proposal, I have the following suggestions:

Paging the Buckets

I believe that assigning one key per bucket will lead to a massive number of small keys, which is fatal to the system's performance. The overhead of RocksDB internal keys, memtables, index/filters, and compaction will be significantly greater than one or a few bytes.

I suggest we introduce a Page abstraction for buckets, where one page contains multiple buckets. For example, a 1KB page could contain 256 buckets.

I have performed a rough estimation as follows:

Full Page Utilization

Page Size | Buckets/Page | Actual Page Size | Full Page Utilization | Worst-case Amplification (1 Bucket) | Worst-case Amplification vs. Bucket-per-Key | Buckets Needed to Break Even
1KB | 256 | 1101B | 93.01% | 275x | 13.6x | 14
2KB | 512 | 2125B | 96.38% | 531x | 26.2x | 27
4KB | 1024 | 4173B | 98.15% | 1043x | 51.5x | 52
8KB | 2048 | 8269B | 99.07% | 2067x | 102.1x | 103

Note on "Buckets Needed to Break Even": This represents the minimum number of buckets that must be used within a page for the "paged" approach to become more space-efficient than the "bucket-per-key" approach. This occurs at approximately 5% occupancy.

Scenario for Default Capacity = 1024

Page Size | Buckets/Page | Pages before 1st Expansion | New Sub-filter Pages before 2nd Expansion | Total Pages before 2nd Expansion
1KB | 256 | 2 | 4 | 6
2KB | 512 | 1 | 2 | 3
4KB | 1024 | 1 | 1 | 2
8KB | 2048 | 1 | 1 | 2

Based on these findings, I recommend a default page size of 2KB or 4KB.

Using MultiGet / Batch Read for Candidate Buckets/Pages

Operations like CF.EXISTS, MEXISTS, and ADD typically access two candidate buckets in each sub-filter. If we use bucket-level keys, this results in many random reads. Even with page-level keys, we should aggregate requests and use MultiGet to reduce the number of round-trips to RocksDB.

Insertion Order: Prioritize Latest -> Old

RedisBloom queries from the newest sub-filter to the oldest. In Kvrocks, if we proceed from old to new, every write operation will first hit the older, fuller filters. This is likely to increase read amplification and the probability of "kick-outs." I suggest maintaining consistency with RedisBloom's "latest -> old" approach.

@nagisa-kunhah (Contributor, Author)

@jihuayu Thanks for the suggestions. My understanding is that the Page abstraction is a persistent storage-layout unit, not an application-level cache.

I plan to replace the current bucket key layout:

InternalKey(ns_key, version, filter_index + bucket_index) -> bucket data

with a page-based layout:

InternalKey(ns_key, version, filter_index + page_index) -> multiple consecutive buckets

The bucket mapping would be:

buckets_per_page = page_size / bucket_size
page_index = bucket_index / buckets_per_page

Each sub-filter would own its own set of pages, so pages are not shared across sub-filters. Also, I would treat page_size as an upper bound on the page value size rather than a fixed physical size: small sub-filters and the last page of a sub-filter may store fewer than page_size bytes.

For the first version, I would like to use a fixed internal page size, likely 2KB, and derive buckets_per_page when needed. We can consider making it configurable later, but then page_size should probably be stored in metadata because it affects the on-disk layout.

Could you confirm whether this matches your expectation, especially:

  1. pages are scoped within each sub-filter and are not shared across sub-filters.
  2. starting with a fixed internal page size is acceptable for the first version?

For the suggestions about MultiGet and insertion order, I agree with both points and will update the implementation accordingly.

@jihuayu (Member) commented May 7, 2026

@nagisa-kunhah Your understanding is spot on. There’s no need to rush into code changes just yet, as others might still chime in with feedback. We can wait until everyone is on the same page before you start refactoring.

page_size should probably be stored in metadata

I agree with you. Storing the page size makes sense; it will definitely make future extensions much easier to handle.

@nagisa-kunhah (Contributor, Author)

@jihuayu Excuse me, I’ve added the paging implementation and now included CuckooPageSet. The overall architecture is illustrated in the attached diagram. The diff is a bit large — would you recommend splitting it into smaller PRs for easier review?

@jihuayu (Member) commented May 13, 2026

Hi @nagisa-kunhah
Splitting this into multiple PRs would make it hard to keep each one a self-contained, verifiable unit of logic, so keeping everything in this PR is OK.

The review might take longer since the PR is quite large.

By the way, your image is very clear, and I really like it.
