feat: promote RankQuantFastscan to public API + .ovfs persistence#233
Conversation
…-compat Files written by the crate now use the ordvec magics OVR1/OVRQ/OVBM/OVSB (extensions .ovr/.ovrq/.ovbm/.ovsb), replacing the turbovec-era TV* magics. The read contract is unchanged: all loaders (rank_io.rs) AND the C ABI accept BOTH the current OV* and the legacy TV* magics, so every file the crate (or turbovec) ever wrote still loads. Only the write path changed. - src/rank_io.rs: OV* magic constants (written) + TV* retained read-only for back-compat; writers emit OV*; loaders + probe_index_metadata accept both. - ordvec-ffi: the C ABI sniff-magic dispatch accepts both OV* and TV*; probe path was already format-agnostic (uses IndexKind). - tests/persistence_compat.rs: forward fixtures now pin OV*; added tests proving legacy TV* files still load for all four index types. - Parity sweep (docs / extensions only, no logic): ordvec-manifest (+ python bindings), ordvec-python docstrings + tests, ordvec-go test, C header, fuzz targets, docs/*, README format line, SECURITY/THREAT_MODEL, CONTRIBUTING stable-surface statement (the read contract is never broken), .gitignore. Gate: fmt + clippy -D warnings (core/ffi/manifest) + full test suites (core exp+default, manifest, ffi) + ordvec-python check + rustdoc -D warnings. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
The shorter UNSUPPORTED_FORMAT message in info_for_metadata let rustfmt collapse
`info.kind = match {...}` onto one line. Formatting only, no logic change.
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Un-hides RankQuantFastscan (it was `#[doc(hidden)]` on both the re-export and the struct) and gives it persistence, building on the OV* format convention. - Un-hide: remove `#[doc(hidden)]` from the crate-root re-export and the struct; reword its docs as a stable-but-specialized b=2 minimum-latency scan path (not the headline retrieval surface). Add a discoverable crate-root mention. - Persistence (net-new): `RankQuantFastscan::write`/`load` via a new `.ovfs` format (magic `OVFS`; 13-byte header + the opaque block-32 packed payload). rank_io gains `write_fastscan`/`load_fastscan`/`fastscan_payload_bytes` (pub(crate)); the loader validates magic, `dim % 4 == 0`, n_vectors, exact payload length (no trailing bytes), with the MAX_PAYLOAD pre-File::create guard. OVFS is new in the ordvec format — no legacy TV* counterpart. - Tests: write/load round-trip scans byte-identically; empty-index round-trip; OVFS-magic-on-write; rejects wrong magic / trailing bytes / dim%4!=0. - Fuzz: new `load_fastscan` libFuzzer target (parity with the other 4 loaders). Kernels were already AVX-512/AVX2/scalar-optimized; this PR is promotion + persistence only, no kernel changes. Gate: fmt + clippy -D warnings + rustdoc -D warnings + full test suites + `cargo +nightly fuzz build load_fastscan`. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Code Review by Qodo
Context used 1.
|
PR Summary by QodoPromote RankQuantFastscan to public API and add .ovfs persistence WalkthroughsDescription• Make RankQuantFastscan a documented, stable public type for b=2 low-latency scans • Add .ovfs (OVFS) write/load persistence with strict header and payload validation • Extend test + fuzz coverage for OVFS round-trips and malformed-file rejection Diagramgraph TD
A["Public API: RankQuantFastscan"] --> B["rank_io fastscan I/O"] --> C[(".ovfs file (OVFS)")]
D["Tests: fastscan persistence"] --> A
E["Fuzz target: load_fastscan"] --> A
High-Level AssessmentThe following are alternative approaches to this PR: 1. Embed FastScan payload into an existing OV* container (e.g., OVRQ)
2. Add `.ovfs` probing support now (IndexKind::Fastscan)
Recommendation: Keep the dedicated File ChangesEnhancement (2)
Tests (2)
Documentation (1)
Other (1)
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Code Review
This pull request introduces persistence support for the RankQuantFastscan index, enabling saving and loading using the new .ovfs file format. It includes the implementation of the write and load methods, a new fuzz target, and comprehensive unit tests. The review feedback suggests validating dim and n_vectors prior to writing to prevent silent truncation and potential file corruption when casting n_vectors to a u32.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
`write_fastscan` backs the now-public `RankQuantFastscan` persistence API but could (a) silently truncate `dim`/`n_vectors` through the `as u32` casts, and (b) panic via `assert_eq!(packed_fs.len(), payload_bytes)` from a `Result`-returning fn (flagged by gemini + qodo). Validate `dim` (range + multiple-of-4 for FastScan b=2), `n_vectors`, the payload size, and the packed buffer length BEFORE `File::create`, returning a clean `io::Error` instead of panicking or truncating — and a rejected write never creates/truncates a file. Add a regression test: valid round-trip + Err-not-panic on bad dim / length. Regenerate `ordvec-ffi/include/ordvec.h` with cbindgen 0.29.3 (the CI version); the committed header had drifted from the loader doc-comment wording. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
Remediated in
Verified: |
…b doc fix #230 (OV* on-disk format) landed on main, conflicting with this branch's FastScan work in src/rank_io.rs. Resolution: - kept this branch's FastScan additions — the 5th `.ovfs`/`OVFS` format (doc bullet, `OVFS_MAGIC`, "Five formats"); - took main's canonicalized OV* loader magic labels (`read_magic(.., "OVR1")` etc.), matching the surrounding `OV*` field labels #230 introduced. Also rolled in the stop-gate follow-up that post-dates #230's merge (per request): the SignBitmap write/load/dim-check doc comments still described the CURRENT format as `.tvsb`; corrected to `.ovsb` (the written extension — `.tvsb` is legacy-only, accepted on load and documented as such). Verified: cbindgen --verify ok (no ffi drift), fmt + clippy clean, full suite green. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…t (PR3) (#243) * docs: propagate RankQuantFastscan public-API promotion + positioning lift #233 promoted RankQuantFastscan to a stable public type in lib.rs only; the README, RANK_MODES, compatibility-policy, determinism, THREAT_MODEL, the fastscan module-doc, the rank_io persistence-contract doc, c-api, CHANGELOG, and the Python binding docstring all still described it as #[doc(hidden)] / unstable and undercounted the on-disk formats (four -> five, adds .ovfs) and the cargo-fuzz targets (eight -> nine). This finishes the propagation so the public surface is self-consistent. Also: - fuzz.yml weekly sweep now actually runs load_fastscan. The .ovfs loader was a registered fuzz target but was missing from the weekly matrix, so the THREAT_MODEL "all nine targets" claim is now true rather than aspirational. - README "What's different" positioning lift: sharpen the training-free / data-oblivious no-fit wedge, credit the FAISS FastScan + binary-quant-rescore lineage for the two-stage / RankQuantFastscan path (no novelty overclaim), and make the HNSW params precise (ef_construction=200, ef_search=128). - Claims-discipline softening (no in-repo benchmark backs them): sign_bitmap.rs "competitive with / superior to learned hash codes" -> a mechanism statement; the RANK_MODES arXiv paper-harness numbers (under the real-corpus rerun guardrail) -> redirect to the reproducible BEIR harness; README + CHANGELOG "nothing hand-entered" -> accurate "tables transcribe the harness summary outputs". No code changes: doc comments, markdown, the Python module docstring, and the fuzz workflow matrix only. Gate: fmt + clippy (-D warnings) + cargo doc (-D warnings) + doctests all clean. Signed-off-by: Nelson Spence <nelson@projectnavi.ai> * fuzz: add load_fastscan to the manual full-campaign default targets The CI weekly sweep now runs all nine targets, but fuzz/run_full_fuzz.sh (the manual dev campaign helper) still defaulted to the eight pre-.ovfs targets, so a full manual run skipped load_fastscan — the untrusted .ovfs loader. Add it to the default TARGETS list (now nine) and fix the comment, so the manual campaign matches the THREAT_MODEL 'all nine targets' claim and the CI weekly matrix. Signed-off-by: Nelson Spence <nelson@projectnavi.ai> * fuzz: refresh run_full_fuzz duration guidance for nine targets Follow-up to adding load_fastscan: the header comment still quoted the eight-target campaign totals (~3h x 8 ~= 24h; the SECS_PER_TARGET=120 example ~16 min). Targets run sequentially, so the totals scale with the count — now ~3h x 9 ~= 27h and ~18 min. The runtime 'est. total' print is already computed from n_targets, so only the static guidance was stale. Signed-off-by: Nelson Spence <nelson@projectnavi.ai> --------- Signed-off-by: Nelson Spence <nelson@projectnavi.ai> Co-authored-by: Navi Bot <267427491+project-navi-bot@users.noreply.github.com>
Stacked on #230 (the
.ov*format). Promotes the FastScan b=2 path from#[doc(hidden)]to a documented public type and gives it persistence in theOV*convention #230 establishes.What
RankQuantFastscan— it was#[doc(hidden)]on both the crate-root re-export and the struct definition. Now a stable, documented type, framed as a specialized b=2 minimum-latency scan path (block-32 nibble-LUT; AVX-512 → AVX2 → scalar) — not the headline retrieval surface (that'sRankQuant/Bitmap/ two-stage). Added a discoverable crate-root mention.RankQuantFastscan::write/loadvia a new.ovfsformat (magicOVFS): a 13-byte header (OVFS| version | dim | n_vectors) + the opaque block-32 packed payload.rank_iogainswrite_fastscan/load_fastscan/fastscan_payload_bytes(pub(crate)). The loader validates magic,dim % 4 == 0, n_vectors, and exact payload length (no trailing bytes), with the MAX_PAYLOAD guard enforced beforeFile::create.OVFSis new in the ordvec format — no legacyTV*counterpart (FastScan never had persistence under turbovec).dim % 4 != 0.load_fastscanlibFuzzer target (parity with the four existing loader targets).Not in scope
.ovfsis deferred:probe_index_metadatareturnsIndexKind, and adding aFastscanvariant to the non-#[non_exhaustive]IndexKindenum would be a breaking change → tracked for the 0.8.0 API re-arch (0.8.0 staging: public API re-architecture & streamlining (breaking) #232). A.ovfsfile currently probes as 'unknown magic'.Validation
cargo fmt --all --check;clippy --all-targets --all-features -D warnings;RUSTDOCFLAGS=-D warnings cargo doc; full test suites (experimental + default, 18 fastscan tests incl. 6 new persistence);cargo +nightly fuzz build load_fastscan.Follow-up: PR3 docs refresh (surfaces FastScan +
.ovfsacross README / bindings / C-ABI / Go).