From 1a5f0422bad4d2f947cb7e4e8a58155ab9cc79f6 Mon Sep 17 00:00:00 2001
From: Jack Moffitt
Date: Tue, 10 Mar 2026 13:36:38 -0500
Subject: [PATCH 1/2] RFC for quantization bootstrapping

---
 rfcs/00000-quantizer-bootstrap.md | 158 ++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 rfcs/00000-quantizer-bootstrap.md

diff --git a/rfcs/00000-quantizer-bootstrap.md b/rfcs/00000-quantizer-bootstrap.md
new file mode 100644
index 000000000..18f45ba3d
--- /dev/null
+++ b/rfcs/00000-quantizer-bootstrap.md
@@ -0,0 +1,158 @@
+# Quantization Bootstrapping
+
+| | |
+|---|---|
+| **Authors** | Jack Moffitt |
+| **Contributors** | |
+| **Created** | 2026-03-10 |
+| **Updated** | 2026-03-10 |
+
+## Summary
+
+Indexes that use quantization must have a minimum number of vectors before they
+can build quantization tables and start inserting vectors. Bootstrapping is the
+process of incrementally building an index that starts empty, operates on
+non-quantized vectors until enough vectors are present to build quantization
+tables, and then transitions to normal operation.
+
+## Motivation
+
+### Background
+
+DiskANN's quantizers require some statistical information in order to build
+quantization tables. For PQ, 10,000 vectors are generally required to build good
+tables; for spherical, 100 are needed. In order to create an index, these
+vectors must be provided at creation time in order to build the quantization
+tables, at which point each vector is quantized as it is inserted.
+
+This requirement is easy to fulfill when building indexes from existing
+datasets, but when starting from scratch, there is no ability for DiskANN to
+build a quantized index since the quantization tables are a required part of the
+constructor.
+
+Current deployments of DiskANN work around this issue by not allowing index
+creation until a dataset is sufficient large (pg_diskann), or operating a
+separate flat index until sufficient vectors are collected at which point the
+quantization tables are calculated and a graph index is built with DiskANN.
+
+### Problem Statement
+
+This RFC proposes changing DiskANN to operate in a quantization bootstrap mode
+where it operates on full precision vectors until sufficient vectors exist to
+create quantization tables, and then seamlessly transitions to a quantized
+index.
+
+This means the index will operate in three different phases. In Phase 1, the
+index operates in full precision mode only until sufficient vectors exist to
+build quantization tables. During Phase 2, quantization tables will be built and
+vectors will be quantized on insert; pre-existing vectors be quantized in the
+background. Once all vectors are quantized, Phase Three begins the normal
+operation of the quantized index.
+
+### Goals
+
+1. Allow quantized indices to start empty and use full precision data only to
+   operate until sufficient vectors are inserted.
+2. Aside from allowing construction of `DiskANNIndex` without providing
+   quantization tables, there should user visible changes to using the index.
+3. Performance should remain as high as possible during the three phases.
+4. The quantization of previously inserted full vectors during Phase Two should
+   be controllable by the data provider.
+
+## Proposal
+
+Bootstrapping needs two changes to DiskANN.
+
+1. **Switching Strategies**: DiskANN needs to start by using full precision only
+   strategies during Phase One, and switching to a hybrid strategy for Phase
+   Two, and if the user's intent is to use quantized-only strategies, switching
+   to quantized-only for Phase Three.
+2. **Quantization Backfill**: During Phase Two, previously inserted vectors will
+   need to be quantized. As background jobs are a performance concern, DiskANN
+   will need hooks for customizing this behavior.
+
+### Switching Strategies
+
+DiskANN already has the ability to run multiple strategies including hybrid full
+precision and quantized ones. These should be sufficient for purposes of
+bootstrapping, but we will need to orchestrate seamless transitions between
+them.
+
+As the caller designates a strategy to use, we can implement new
+`BootstrappedQuantized` and `BootstrappedHybrid` strategies that layer over
+existing `FullPrecision`, `Quantized`, and `Hybrid` strategies. These new
+bootstrapped strategies will delegate operation to the existing strategies
+depending on the current phase.
+
+*Open question*: How exactly do we do this?
+
+### Quantization Backfill
+
+After quantization tables are built, newly inserted vectors will be quantized
+before insertion, but previously inserted vectors won't have quantized
+representations yet. During Phase Two, these previously inserted full precision
+vectors will need to be quantized before the index enters Phase Three.
+
+Since integrators of DiskANN are sensitive to background jobs, how the index
+manages backfilling quantized vectors should be controllable.
+
+The simplest way is to backfill all missing quantized vectors immediately during
+the insert that starts Phase Two. This will cause a latency spike on that single
+insert, but doesn't require any background processing.
+
+A more complicated solution would be to launch a background job that iterates
+over full-precision only vectors and quantizes them. DiskANN should provide such
+a job that integrators can use, but should also provide some callback that the
+hosting database can pump to make incremental progress under its own control.
+
+Both of these methods can be realized by having a new trait `QuantBackfill`:
+
+```rust
+
+pub enum QuantBackfillStatus {
+    Incomplete,
+    Complete,
+}
+
+pub trait QuantBackfill {
+    type BackfillError: AsyncFriendly;
+
+    /// Backfill quantization vectors for up to approximately `duration` amount of time.
+    fn backfill(duration: Duration) -> impl Future> + AsyncFriendly;
+}
+```
+
+This trait would be implemented on the type that implements the
+`BootstrappedQuantized` and `BootstrappedHybrid` strategies.
+
+*Open question*: How to implement the background task and make it overridable?
+
+## Trade-offs
+
+Currently the workarounds in use are either no index at all until sufficient
+vectors exist or operating a side index until sufficient vectors exist and then
+building a quantized graph.
+
+pg_diskann uses the former method, which means users are confused when they try
+to create indexes on empty tables or insufficiently populated tables and get an
+error. Cosmos DB uses the latter strategy and operates a flat index until an
+asynchronous graph build is complete enough to use the graph index. This
+requires the Cosmos DB team to maintain all their own infrastructure for the
+flat index and the code around transitioning to the graph index.
+
+This proposal mitigates the downsides while still allowing the integrator to
+retain control over key performance details.
+
+## Benchmark Results
+
+Since there is no way to build an index currently until quantization tables are
+built, there is no way to benchmark the first two phases. There should be no
+impact during Phase Three to performance.
+
+## Future Work
+
+None.
+
+## References
+
+None.
\ No newline at end of file

From e906cda82515111650e334fe818fcea4811630bc Mon Sep 17 00:00:00 2001
From: Jack Moffitt
Date: Thu, 19 Mar 2026 17:19:11 -0500
Subject: [PATCH 2/2] changes based on brainstorming with Mark

---
 rfcs/00000-quantizer-bootstrap.md | 140 +++++++++++++++++++-----------
 1 file changed, 90 insertions(+), 50 deletions(-)

diff --git a/rfcs/00000-quantizer-bootstrap.md b/rfcs/00000-quantizer-bootstrap.md
index 18f45ba3d..a74c60100 100644
--- a/rfcs/00000-quantizer-bootstrap.md
+++ b/rfcs/00000-quantizer-bootstrap.md
@@ -3,7 +3,7 @@
 | | |
 |---|---|
 | **Authors** | Jack Moffitt |
-| **Contributors** | |
+| **Contributors** | Mark Hildebrand |
 | **Created** | 2026-03-10 |
 | **Updated** | 2026-03-10 |
 
@@ -31,7 +31,7 @@ build a quantized index since the quantization tables are a required part of the
 constructor.
 
 Current deployments of DiskANN work around this issue by not allowing index
-creation until a dataset is sufficient large (pg_diskann), or operating a
+creation until a dataset is sufficiently large (pg_diskann), or operating a
 separate flat index until sufficient vectors are collected at which point the
 quantization tables are calculated and a graph index is built with DiskANN.
 
@@ -45,8 +45,8 @@ index.
 This means the index will operate in three different phases. In Phase 1, the
 index operates in full precision mode only until sufficient vectors exist to
 build quantization tables. During Phase 2, quantization tables will be built and
-vectors will be quantized on insert; pre-existing vectors be quantized in the
-background. Once all vectors are quantized, Phase Three begins the normal
+vectors will be quantized on insert; pre-existing vectors will be quantized in
+the background. Once all vectors are quantized, Phase Three begins the normal
 operation of the quantized index.
 
 ### Goals
@@ -54,78 +54,113 @@ operation of the quantized index.
 1. Allow quantized indices to start empty and use full precision data only to
     operate until sufficient vectors are inserted.
 2. Aside from allowing construction of `DiskANNIndex` without providing
-   quantization tables, there should user visible changes to using the index.
+   quantization tables, there should be no user-visible changes to using the index.
 3. Performance should remain as high as possible during the three phases.
 4. The quantization of previously inserted full vectors during Phase Two should
     be controllable by the data provider.
 
 ## Proposal
 
-Bootstrapping needs two changes to DiskANN.
+Bootstrapping needs two changes to a DiskANN data provider implementation.
 
 1. **Switching Strategies**: DiskANN needs to start by using full precision only
-   strategies during Phase One, and switching to a hybrid strategy for Phase
-   Two, and if the user's intent is to use quantized-only strategies, switching
-   to quantized-only for Phase Three.
+   strategies during Phase One, and switching to a quantized-only or hybrid
+   strategy for Phase Two.
 2. **Quantization Backfill**: During Phase Two, previously inserted vectors will
-   need to be quantized. As background jobs are a performance concern, DiskANN
-   will need hooks for customizing this behavior.
+   need to be quantized. As background jobs are a performance concern, how
+   exactly this is accomplished must be customizable by the data provider.
 
 ### Switching Strategies
 
 DiskANN already has the ability to run multiple strategies including hybrid full
-precision and quantized ones. These should be sufficient for purposes of
-bootstrapping, but we will need to orchestrate seamless transitions between
-them.
-
-As the caller designates a strategy to use, we can implement new
-`BootstrappedQuantized` and `BootstrappedHybrid` strategies that layer over
-existing `FullPrecision`, `Quantized`, and `Hybrid` strategies. These new
-bootstrapped strategies will delegate operation to the existing strategies
+precision and quantized ones. These strategies represent the high-level intent,
+but the data provider can choose alternate implementations depending on the data
+available during the current phase.
+
+#### Insertion and Deletion
+
+Insertion and deletion can remain largely the same in the data provider
+implementation. Inserts will need to write vectors, mappings, attributes, etc.
+into storage, and can track the current phase to gate writes to quantized
+vectors. The search portion of these operations will return different objects
 depending on the current phase.
 
-*Open question*: How exactly do we do this?
+For example, consider a `DataProvider::set_element()` implementation:
 
-### Quantization Backfill
+```rust
+struct ExampleProvider {
+    // other fields omitted
+    quantizer: Option,
+}
 
-After quantization tables are built, newly inserted vectors will be quantized
-before insertion, but previously inserted vectors won't have quantized
-representations yet. During Phase Two, these previously inserted full precision
-vectors will need to be quantized before the index enters Phase Three.
+impl SetElement<[f32]> for ExampleProvider {
+    // associated types omitted
+
+    async fn set_element(
+        &self,
+        context: &Self::Context,
+        id: &Self::ExternalId,
+        element: &[f32],
+    ) -> Result {
+        let internal_id = self.new_id()?;
+        self.write_vector(context, internal_id, element)?;
+        self.set_internal_map(internal_id, id)?;
+        self.set_external_map(id, internal_id)?;
+
+        // Quantize and store the quantized vector if we have a quantizer.
+        if let Some(quantizer) = &self.quantizer {
+            let qv = quantizer.quantize(element)?;
+            self.write_quant_vector(context, internal_id, &qv)?;
+        } else {
+            // This function will check if we are ready for Phase Two, and if so, do or schedule the quantizer initialization.
+            self.maybe_initialize_quantizer()?;
+        }
+
+        Ok(NoopGuard::new(internal_id))
+    }
+}
+```
 
-Since integrators of DiskANN are sensitive to background jobs, how the index
-manages backfilling quantized vectors should be controllable.
+Delete can similarly check the status of the quantizer, and delete quantized
+vectors if they exist.
 
-The simplest way is to backfill all missing quantized vectors immediately during
-the insert that starts Phase Two. This will cause a latency spike on that single
-insert, but doesn't require any background processing.
+#### Searching
 
-A more complicated solution would be to launch a background job that iterates
-over full-precision only vectors and quantizes them. DiskANN should provide such
-a job that integrators can use, but should also provide some callback that the
-hosting database can pump to make incremental progress under its own control.
+To avoid the complexity of hybrid distance calculations, either full precision
+distances will be used (Phases One and Two) or quantized distances will be used
+(Phase Three). If a hybrid strategy is in use, then the hybrid distances will
+not be used until Phase Three.
 
-Both of these methods can be realized by having a new trait `QuantBackfill`:
+Since vector data may be in one of two representations, the `Accessor::Element`
+type should be `Poly` (this should be over-aligned to the correct alignment
+for the primitive element type), and the data provider should interpret it based
+on its size. The distance and query computers will also need modifications to
+accept both vector representations, and in the case of the query computer the
+representation must match that of the query.
 
-```rust
 
-pub enum QuantBackfillStatus {
-    Incomplete,
-    Complete,
-}
+### Quantization Backfill
+
+After quantization tables are built, newly inserted vectors will be quantized
+before insertion, but previously inserted vectors won't have quantized
+representations yet. During Phase Two, these previously inserted full precision
+vectors will need to be quantized before the index enters Phase Three.
 
-pub trait QuantBackfill {
-    type BackfillError: AsyncFriendly;
+Since integrators of DiskANN are sensitive to background jobs, how the index
+manages backfilling quantized vectors is controlled by the data provider
+implementation. The data provider must have some way to track which vectors have
+missing quantized representations so that it can generate them.
 
-    /// Backfill quantization vectors for up to approximately `duration` amount of time.
-    fn backfill(duration: Duration) -> impl Future> + AsyncFriendly;
-}
-```
+Once Phase Two is reached, the data provider can either pause during insertion
+of the phase-changing vector, or schedule the work to happen asynchronously
+however it likes.
 
-This trait would be implemented on the type that implements the
-`BootstrappedQuantized` and `BootstrappedHybrid` strategies.
+One possibility is to piggyback on deletion tracking to track quantization
+status of vectors. For example, in diskann-garnet, a free space map is kept that
+tracks deletes. This could be expanded from 1 bit to 2 bits, and the second bit
+used to track whether the vector is quantized. Alternatively, metadata about the
+allocated range can be kept and used to iterate over the unquantized set.
 
-*Open question*: How to implement the background task and make it overridable?
 
 ## Trade-offs
 
@@ -134,7 +169,7 @@ vectors exist or operating a side index until sufficient vectors exist and then
 building a quantized graph.
 
 pg_diskann uses the former method, which means users are confused when they try
-to create indexes on empty tables or insufficiently populated tables and get an
+to create indexes on empty or insufficiently populated tables and get an
 error. Cosmos DB uses the latter strategy and operates a flat index until an
 asynchronous graph build is complete enough to use the graph index. This
 requires the Cosmos DB team to maintain all their own infrastructure for the
@@ -143,6 +178,11 @@ flat index and the code around transitioning to the graph index.
 This proposal mitigates the downsides while still allowing the integrator to
 retain control over key performance details.
 
+This proposal also entirely encapsulates this inside the `DataProvider`
+implementation. Alternatively, one could attempt to solve this with some kind of
+index or strategy layering, but the complexity this would introduce does not seem
+worth the cost.
+
 ## Benchmark Results
 
 Since there is no way to build an index currently until quantization tables are