
feat(cf): add cf.add command #3481

Open
nagisa-kunhah wants to merge 26 commits into apache:unstable from nagisa-kunhah:feature/3351-cf-add-ci-fix

Conversation

@nagisa-kunhah (Contributor)

task id: #3351

@nagisa-kunhah nagisa-kunhah marked this pull request as ready for review May 4, 2026 13:16

Copilot AI left a comment


Pull request overview

This PR introduces initial CuckooFilter support to KVrocks (task #3351) by adding the CF.RESERVE and CF.ADD commands, along with a bucket-per-key RocksDB storage implementation and accompanying unit tests.

Changes:

  • Add a new Redis metadata type (kRedisCuckooFilter) and CuckooChainMetadata encoding/decoding.
  • Implement a bucket-based cuckoo filter chain (redis::CuckooChain) with Reserve and Add operations.
  • Register new commands (cf.reserve, cf.add) and add a comprehensive C++ unit test suite.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Summary per file:

  • tests/cppunit/types/cuckoo_filter_test.cc: Adds unit tests for CF reserve/add behavior and cuckoo filter helper functions.
  • src/types/redis_cuckoo_chain.h: Declares the cuckoo chain DB wrapper and CF defaults.
  • src/types/redis_cuckoo_chain.cc: Implements CF.RESERVE/CF.ADD logic using per-bucket RocksDB keys plus kick-out/expand paths.
  • src/types/cuckoo_filter.h: Adds cuckoo filter hash/fingerprint/alt-hash helpers and bucket-count calculation.
  • src/storage/redis_metadata.h: Introduces the new Redis type and CuckooChainMetadata structure.
  • src/storage/redis_metadata.cc: Implements encode/decode and capacity calculation for CuckooChainMetadata.
  • src/commands/commander.h: Adds a new command category for CuckooFilter commands.
  • src/commands/cmd_cuckoo_filter.cc: Implements and registers cf.reserve and cf.add command handlers.


@jihuayu (Member) commented May 5, 2026

Hi @nagisa-kunhah. Thank you for your contribution.
In this PR, we have introduced a new data structure. This is a significant update that includes additions to the storage architecture. Could you describe the reasoning behind this design? We would like to understand your design philosophy and key decision points.

Additionally, please follow our AI policy: https://kvrocks.apache.org/community/contributing#guidelines-for-ai-assisted-contributions. Please let us know which AI tools and models you used, as this will help us better review the code.

@nagisa-kunhah (Contributor, Author)

@jihuayu Hi, thank you for the review. I have summarized the design and implementation details below to explain the layering, responsibilities, and key storage decisions.

1 Design Overview

The design follows Kvrocks' existing layered architecture. Conceptually, the change can be divided into four layers:

  1. Command layer
  2. Type layer
  3. Metadata/storage encoding layer
  4. RocksDB persistence layer

1.1 Command layer

The command layer introduces RedisBloom-compatible Cuckoo Filter commands, such as CF.RESERVE, CF.ADD, CF.EXISTS, and CF.MEXISTS. Following the existing architecture, each command is implemented as a Commander subclass and registered through the existing command registration mechanism.

1.2 Type layer

The type layer introduces the new CuckooChain abstraction to represent each logical Cuckoo Filter.

The reason for using a chain abstraction is that Cuckoo Filters are not easy to resize in place: the bucket positions depend on the current bucket count, so resizing a filter would require rebuilding existing data. Instead, the design appends new sub-filters when expansion is needed. This model is inspired by RedisBloom's scalable Cuckoo Filter design.

CuckooChain implements the high-level operations for a logical Cuckoo Filter, including RESERVE, ADD, EXISTS, MEXISTS, and related key-level behavior.

To keep the type layer separated from the algorithm details, CuckooFilter is introduced as an internal helper. It provides Cuckoo Filter-specific calculations used by CuckooChain, while CuckooChain remains responsible for the Redis/Kvrocks-facing behavior.

1.3 Metadata/storage encoding layer

At the metadata/storage encoding layer, the implementation adds CuckooChainMetadata to store the state and parameters for each logical Cuckoo Filter. This metadata extends the existing Kvrocks metadata model, so common key-level fields such as type, size, expiration, and version are still handled consistently with other data types.

The Cuckoo-specific fields describe the filter chain, including the number of sub-filters, base capacity, bucket size, expansion factor, and insertion iteration limit. This keeps the logical filter state compact while leaving the actual bucket contents in subkeys.

1.4 RocksDB persistence layer

At the persistence layer, the implementation reuses Kvrocks' existing metadata/subkey model. The logical key metadata is stored as metadata, and individual buckets are stored as internal subkeys. This follows the same general pattern used by other complex Redis data structures in Kvrocks.

The metadata entry and bucket entries are written through RocksDB write batches when they need to be updated together. This keeps the logical filter state and the modified bucket data consistent without introducing a separate persistence path for Cuckoo Filter.

2 Logical Structure

A logical Cuckoo Filter is associated with one user key. Internally, it is represented as a chain of sub-filters rather than a single resizable filter.

The relationship between the main concepts is:

[diagram: cuckoo filter logical structure]
  • The logical filter is the user-visible Cuckoo Filter associated with a Redis key.
  • A sub-filter is one filter segment in the chain. Expansion appends new sub-filters to the chain.
  • Each sub-filter contains num_buckets buckets, and num_buckets is rounded to a power of two.
  • A bucket belongs to one sub-filter and contains a fixed number of slots.
  • A slot stores one fingerprint. A zero value means the slot is empty.
  • A fingerprint is a compact representation derived from the item hash.

3 Implementation Details

3.1 Data Layout

3.1.1 Metadata

Each logical Cuckoo Filter key has one CuckooChainMetadata entry. The metadata describes the filter-level state and configuration, while the actual bucket data is stored separately.

The metadata is stored in the metadata column family. The RocksDB key is the namespace-prefixed logical Redis key, represented as ns_key in the implementation.

Conceptually:

metadata CF:
  <ns_key> -> CuckooChainMetadata

The metadata contains the following fields:

Field | Type | Size (bytes) | Description
size | uint64_t | 8 | Total number of items recorded in the entire filter chain (inherited from the base Metadata).
expire | uint64_t | 8 | Expiration timestamp of the logical key (inherited from the base Metadata).
version | uint64_t | 8 | Metadata version used to separate current subkeys from stale subkeys (inherited from the base Metadata).
n_filters | uint16_t | 2 | Number of sub-filters in the chain.
expansion | uint16_t | 2 | Growth factor used when a new sub-filter is appended.
base_capacity | uint64_t | 8 | Capacity of the first sub-filter; capacities of later sub-filters are derived from this value and expansion.
bucket_size | uint8_t | 1 | Number of fingerprint slots each bucket can hold.
max_iterations | uint16_t | 2 | Maximum number of relocation attempts during insertion.
num_deleted_items | uint64_t | 8 | Number of deleted items recorded by the filter.

This metadata belongs to the logical Cuckoo Filter as a whole. It is not stored per bucket or per sub-filter.

3.1.2 Bucket Storage

The bucket data is stored as internal subkeys under the logical Cuckoo Filter key. Each bucket is identified by both a filter_index and a bucket_index.

The bucket subkey is constructed from:

  • filter_index: identifies which sub-filter in the chain the bucket belongs to.
  • bucket_index: identifies the bucket inside that sub-filter.

The bucket value is a fixed-size byte array whose length is bucket_size. Each byte stores one fingerprint. A value of 0 represents an empty slot, while valid fingerprints are stored as non-zero values.

Conceptually, the bucket layout is:

PrimarySubkey CF:
  <bucket_key> -> bucket_data

bucket_key = InternalKey(<ns_key>, <encoded filter_index and bucket_index>, version)

Here, filter_index and bucket_index are encoded into the bucket subkey as binary fields, not as a numeric sum.

This layout keeps the logical filter metadata separate from the bucket contents, while still reusing Kvrocks' existing internal subkey model.

3.2 Hashing Model

3.2.1 Item Hash

Each item is first converted into a 64-bit hash using HllMurMurHash64A with seed 0. This is the actual function name used in the codebase. It follows Redis' MurmurHash64A-style hash implementation and keeps the hashing model close to RedisBloom's Cuckoo Filter design.

The item hash is the base value used to derive both the fingerprint and the candidate bucket positions. Since these values determine where the item is stored and looked up, the hash function is part of the persistent data layout and should remain stable once data has been written.

3.2.2 Fingerprint

The fingerprint is generated from the item hash as hash % 255 + 1, producing an 8-bit non-zero value in the range 1..255; 0 is reserved as the empty-slot marker.

3.2.3 Candidate Buckets

For each sub-filter, an item has two candidate buckets. The first bucket is derived directly from the item hash:

bucket1 = hash % num_buckets

The second bucket is derived from both the hash and the fingerprint:

delta = fingerprint * 0x5bd1e995
bucket2 = (hash ^ delta) % num_buckets

The constant 0x5bd1e995 follows RedisBloom's Cuckoo Filter implementation and is used as the mixing constant for deriving the alternate bucket.

During kick-out insertion, the original item hash of an evicted fingerprint is no longer available. Instead, the implementation uses the current bucket index and the fingerprint to compute the alternate bucket:

alternate_bucket = (current_bucket ^ delta) % num_buckets

This is valid because num_buckets is always rounded to a power of two. When num_buckets = 2^k, modulo is equivalent to keeping the lower k bits, so:

(hash ^ delta) % num_buckets == ((hash % num_buckets) ^ delta) % num_buckets

This lets the kick-out path move a fingerprint between its two candidate buckets using only the current bucket index and the fingerprint.

3.3 Current Write Path

3.3.1 CF.RESERVE

CF.RESERVE creates the logical Cuckoo Filter key and initializes its metadata. It validates the requested capacity and configuration parameters, checks that the key does not already exist, and then creates a CuckooChainMetadata entry with the initial filter configuration.

The initial metadata records the base capacity, bucket size, maximum insertion iterations, expansion factor, and initializes n_filters to 1. The initial number of buckets is derived from the requested capacity and bucket size. The calculation uses a target load factor of 0.955, which reserves extra slots instead of assuming that all slots can be filled successfully. The result is then rounded to a power of two.

The implementation does not preallocate all buckets during reserve. Buckets are created lazily when they are first written. This keeps CF.RESERVE lightweight and avoids writing empty bucket data for sparse filters.

3.3.2 CF.ADD

CF.ADD inserts an item into an existing logical Cuckoo Filter. It first loads and decodes the CuckooChainMetadata, then computes the item hash and fingerprint.

For each sub-filter in the chain, the implementation derives the two candidate buckets for the item. It reads these buckets, treats missing buckets as empty buckets, and tries to place the fingerprint into any available slot in either bucket.

For insertion, sub-filters are checked from the first one to the latest one, in filter_index order from 0 to n_filters - 1. This prioritizes reusing available slots in earlier sub-filters before placing data into newer sub-filters, which keeps the chain more compact and avoids expanding the effective write target too aggressively.

This is different from RedisBloom, which checks sub-filters from the latest one back to the first one.

If a free slot is found, the updated bucket data and the updated metadata are written in the same write batch. This keeps the bucket content and the logical filter state updated atomically.

If no free slot is available in the candidate buckets, the implementation falls back to kick-out insertion on the latest sub-filter. The kick-out path relocates existing fingerprints between their candidate buckets and writes all modified buckets together when the insertion succeeds.

3.3.3 Expansion

Expansion is triggered when insertion cannot find a free slot and kick-out insertion also fails. Instead of resizing an existing sub-filter in place, the implementation appends a new sub-filter to the chain.

The new sub-filter is represented by increasing n_filters in CuckooChainMetadata. Its capacity is derived from base_capacity, expansion, and the new filter_index.

The existing buckets are not rebuilt or moved during expansion. This avoids rewriting existing filter data. After expansion, the insertion is retried against the newly added sub-filter.

@jihuayu (Member) commented May 7, 2026

Thanks for the proposal! These images are beautiful.

Hi @git-hulk @torwig @PragmaTwice @aleksraiden @LiuQhahah.
In this PR, we've added a new data structure. This part #3481 (comment) requires a review—could you all take a look?

@jihuayu (Member) commented May 7, 2026

Hi @nagisa-kunhah. Regarding the current proposal, I have the following suggestions:

Paging the Buckets

I believe that assigning one key per bucket will lead to a massive number of small keys, which is fatal to the system's performance. The overhead of RocksDB internal keys, memtables, index/filters, and compaction will be significantly greater than one or a few bytes.

I suggest we introduce a Page abstraction for buckets, where one page contains multiple buckets. For example, a 1KB page could contain 256 buckets.

I have performed a rough estimation as follows:

Full Page Utilization

Page Size | Buckets/Page | Actual Page Size | Full Page Utilization | Worst-case Amplification (1 Bucket) | Worst-case Amplification vs. Bucket-per-Key | Buckets Needed to Break Even
1KB | 256 | 1101B | 93.01% | 275x | 13.6x | 14
2KB | 512 | 2125B | 96.38% | 531x | 26.2x | 27
4KB | 1024 | 4173B | 98.15% | 1043x | 51.5x | 52
8KB | 2048 | 8269B | 99.07% | 2067x | 102.1x | 103

Note on "Buckets Needed to Break Even": This represents the minimum number of buckets that must be used within a page for the "paged" approach to become more space-efficient than the "bucket-per-key" approach. This occurs at approximately 5% occupancy.

Scenario for Default Capacity = 1024

Page Size | Buckets/Page | Pages before 1st Expansion | New Sub-filter Pages before 2nd Expansion | Total Pages before 2nd Expansion
1KB | 256 | 2 | 4 | 6
2KB | 512 | 1 | 2 | 3
4KB | 1024 | 1 | 1 | 2
8KB | 2048 | 1 | 1 | 2

Based on these findings, I recommend a default page size of 2KB or 4KB.

Using MultiGet / Batch Read for Candidate Buckets/Pages

Operations like CF.EXISTS, MEXISTS, and ADD typically access two candidate buckets in each sub-filter. If we use bucket-level keys, this results in many random reads. Even with page-level keys, we should aggregate requests and use MultiGet to reduce the number of round-trips to RocksDB.

Insertion Order: Prioritize Latest -> Old

RedisBloom queries from the newest sub-filter to the oldest. In Kvrocks, if we proceed from old to new, every write operation will first hit the older, fuller filters. This is likely to increase read amplification and the probability of "kick-outs." I suggest maintaining consistency with RedisBloom's "latest -> old" approach.

@nagisa-kunhah (Contributor, Author)

@jihuayu Thanks for the suggestions. My understanding is that the Page abstraction is a persistent storage-layout unit, not an application-level cache.

I plan to replace the current bucket key layout:

InternalKey(ns_key, version, filter_index + bucket_index) -> bucket data

with a page-based layout:

InternalKey(ns_key, version, filter_index + page_index) -> multiple consecutive buckets

The bucket mapping would be:

buckets_per_page = page_size / bucket_size
page_index = bucket_index / buckets_per_page

Each sub-filter would own its own set of pages, so pages are not shared across sub-filters. Also, I would treat page_size as an upper bound on the page value size rather than a fixed physical size: small sub-filters and the last page of a sub-filter may store fewer than page_size bytes.

For the first version, I would like to use a fixed internal page size, likely 2KB, and derive buckets_per_page when needed. We can consider making it configurable later, but then page_size should probably be stored in metadata because it affects the on-disk layout.

Could you confirm whether this matches your expectation, especially:

  1. pages are scoped within each sub-filter and are not shared across sub-filters.
  2. starting with a fixed internal page size is acceptable for the first version?

For the suggestions about MultiGet and insertion order, I agree with both points and will update the implementation accordingly.

@jihuayu (Member) commented May 7, 2026

@nagisa-kunhah Your understanding is spot on. There’s no need to rush into code changes just yet, as others might still chime in with feedback. We can wait until everyone is on the same page before you start refactoring.

page_size should probably be stored in metadata

I agree with you. Storing the page size makes sense; it will definitely make future extensions much easier to handle.

@nagisa-kunhah (Contributor, Author)

@jihuayu Excuse me, I’ve added the paging implementation and now included CuckooPageSet. The overall architecture is illustrated in the attached diagram. The diff is a bit large — would you recommend splitting it into smaller PRs for easier review?

@jihuayu (Member) commented May 13, 2026

Hi @nagisa-kunhah
Splitting this into multiple PRs would make it hard to keep each one a self-contained, verifiable unit of logic, so keeping everything in this PR is OK.

The review might take longer since the PR is quite large.

By the way, your image is very clear, and I really like it.
