
Zero-copy ByteBuffer-backed vectors, no float[] materialization#1

Merged
eolivelli merged 5 commits into main from feature/bytebuffer-zero-copy-vectors
Apr 22, 2026
Conversation


eolivelli (Owner) commented Apr 18, 2026

The original problem

I'm using jvector from HerdDB (https://github.com/diennea/herddb). HerdDB already holds its
vectors as ByteBuffer (or byte[]) in memory and on disk — it's how the storage layer
naturally represents fixed-width columnar data. But to feed a vector to jvector today you
have to go through float[]:

ByteBuffer source = herdDBVector;              // native layout, already little-endian
float[] tmp = new float[dim];                  // ALLOCATION — dim × 4 bytes
source.asFloatBuffer().get(tmp);               // COPY
VectorFloat<?> v = vts.createFloatVector(tmp);
builder.addGraphNode(ordinal, v);

For a per-row insertion or a per-query vector, that's a wasted allocation + copy every
single time. Multiply by millions of rows or thousands of queries and it becomes a real
pressure point: GC churn, dirty cache lines, extra memory bandwidth.

The ask: let HerdDB (and any integrator in the same position) feed vectors to jvector as
ByteBuffer directly — no float[] materialization on the hot path — while preserving
SIMD everywhere jvector already uses it.

Why this fits jvector cleanly

jvector's public API is already VectorFloat<?>-based, not float[]-based.
GraphIndexBuilder.addGraphNode, GraphSearcher.search, RandomAccessVectorValues.getVector,
VectorSimilarityFunction.compare, and every VectorUtil distance kernel already speak
VectorFloat<?>. The work is therefore not a rewrite of the API — it's:

  1. Adding a zero-copy ByteBuffer-backed VectorFloat<?> implementation.
  2. Teaching the SIMD backends to operate on it without casting to ArrayVectorFloat.
  3. Adding ByteBuffer-accepting factories, a ByteBuffer-backed RAVV, and ByteBuffer query
    overloads on the high-level API.
  4. Fixing internal leaks that would have forced materialization anyway
    (MMapRandomAccessVectorValues used to allocate float[dim] per getVector; PQ training's
    ProductQuantization.getSubVector allocated a float[dim/M] per training-vector × subspace).
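The zero-copy idea behind steps 1–3 can be sketched in isolation: element reads go straight against a caller-owned ByteBuffer, so no float[] is ever materialized. BufferView below is an illustrative stand-in, not the fork's actual BufferVectorFloat.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZeroCopySketch {
    record BufferView(ByteBuffer bb, int length) {
        // Slice once so the caller's position()/limit() can't disturb the view.
        // Note slice() resets byte order, so it must be re-applied on the slice.
        static BufferView of(ByteBuffer src, int dim) {
            return new BufferView(src.slice(0, dim * Float.BYTES)
                                     .order(ByteOrder.LITTLE_ENDIAN), dim);
        }
        float get(int i) { return bb.getFloat(i * Float.BYTES); } // absolute read, no allocation
    }

    public static void main(String[] args) {
        int dim = 4;
        ByteBuffer source = ByteBuffer.allocateDirect(dim * Float.BYTES)
                                      .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < dim; i++) source.putFloat(i * Float.BYTES, i + 0.5f);

        BufferView v = BufferView.of(source, dim);
        source.putFloat(0, 42f);  // mutation stays visible: the view aliases, never copies
        System.out.println(v.get(0) + " " + v.get(3)); // 42.0 3.5
    }
}
```

The one-slice-at-construction trick is what lets the caller keep mutating cursor state on its own buffer without invalidating the wrapped vector.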

Baseline: JDK 22+

HerdDB runs on JDK 25+, so this fork is free to baseline on JDK 22 — the release where
java.lang.foreign.MemorySegment became stable, the API we need to make Panama SIMD work on
BufferVectorFloat views — which also matches jvector-native's existing target. CI now tests
JDK 22 exactly (plus JDK 24 for AVX-512); the old jdk11 / jdk20 Maven profiles are removed.
Version is bumped to 4.0.0-rc.9-herddb-SNAPSHOT.

Changes

Index build + search path (the original ByteBuffer zero-copy work)

  • BufferVectorFloat (new, jvector-base) — VectorFloat<ByteBuffer> view over a
    caller-owned buffer. Slices once at construction via ByteBuffer.slice(int, int) so
    subsequent element access and SIMD dispatch are allocation-free, and caller
    position()/limit() mutation doesn't disturb the view.
  • ByteBufferRandomAccessVectorValues (new, jvector-base) — bulk input RAVV over a single
    concatenated ByteBuffer of N × dimension × 4 bytes.
  • VectorTypeSupport.wrapFloatVector(ByteBuffer[, floatOffset, floatLength]) — typed
    zero-copy factory; MemorySegmentVectorProvider overrides it to return a MemorySegment-backed
    view directly so the native SIMD path stays zero-copy.
  • MemorySegmentVectorFloat.wrap(ByteBuffer) — zero-copy static factory, companion to the
    legacy copying ctor. Also fixes a latent bug in MemorySegmentVectorFloat.get(int) that threw
    on off-heap segments (now falls back to segment.getAtIndex when heapBase() is empty).
  • PanamaVectorUtilSupport — the four protected fromVectorFloat/intoVectorFloat
    helpers gain polymorphic dispatch: a BufferVectorFloat operand loads via
    FloatVector.fromMemorySegment(MemorySegment.ofBuffer(bb), byteOffset, order).
    Full SIMD, no float[] materialization.
  • NativeVectorUtilSupport — falls through to super for non-MemorySegment vectors, so
    BufferVectorFloat works under the native backend too.
  • DefaultVectorUtilSupport — scalar kernels made polymorphic: if both operands are
    ArrayVectorFloat, the existing unrolled float[] fast path runs; otherwise the generic
    .get(i) loop.
  • GraphSearcher.search(ByteBuffer, …) and GraphIndexBuilder.addGraphNode(int, ByteBuffer)
    overloads — thin wrappers that call wrapFloatVector internally.
  • MMapRandomAccessVectorValues — rewritten to delegate to ByteBufferRandomAccessVectorValues
    over a MappedByteBuffer. Drops the per-call float[dimension] scratch.
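The mechanism the Panama path leans on — MemorySegment.ofBuffer giving a zero-copy view of a direct ByteBuffer, which FloatVector.fromMemorySegment can then load lanes from — can be sketched with plain scalar reads (the lane load itself lives in jdk.incubator.vector and is omitted here; requires JDK 22+):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SegmentOfBufferSketch {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) bb.putFloat(i * Float.BYTES, i * 1.5f);

        // ofBuffer does not copy: segment and buffer share the same backing memory
        MemorySegment seg = MemorySegment.ofBuffer(bb);
        var layout = ValueLayout.JAVA_FLOAT.withOrder(ByteOrder.LITTLE_ENDIAN);

        bb.putFloat(0, 9f); // writes through the buffer are visible through the segment
        System.out.println(seg.get(layout, 0) + " " + seg.get(layout, 3 * Float.BYTES)); // 9.0 4.5
    }
}
```

This aliasing property is exactly why SIMD dispatch over the segment needs no float[] staging.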

PQ codebook training (follow-up)

Flamegraphs of HerdDB's indexing service showed float[] allocations in
ProductQuantization.getSubVector — one fresh float[dim/M] per training-vector × subspace
during codebook construction. For 100k training vectors × M=8 that's ~800k small allocations.

  • VectorFloat.subview(int floatOffset, int floatLength) — new default method on the
    interface; fallback materializes, concrete types override for zero-copy.
  • ArraySliceVectorFloat (new, jvector-base) — VectorFloat<float[]> that references a
    root float[] at arrayOffset with its own logical length. Companion to
    ArraySliceByteSequence.
  • Zero-copy subview overrides on ArrayVectorFloat (→ ArraySliceVectorFloat),
    BufferVectorFloat (→ another BufferVectorFloat over the same ByteBuffer), and
    MemorySegmentVectorFloat (→ segment.asSlice(...)).
  • Panama SIMD dispatch recognizes ArraySliceVectorFloat:
    FloatVector.fromArray(SPEC, asv.get(), asv.arrayOffset() + offset) — SIMD preserved.
  • ArrayVectorFloat.copyFrom fast-pathed for ArraySliceVectorFloat source.
  • ProductQuantization.getSubVector rewritten to vector.subview(offset, length) — one-liner.
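The subview idea behind ArraySliceVectorFloat can be sketched as a slice that references the root float[] at an offset instead of copying dim/M floats per call. The names below are illustrative stand-ins, not the fork's actual classes.

```java
public class SubviewSketch {
    record FloatSlice(float[] root, int offset, int length) {
        float get(int i) { return root[offset + i]; } // aliases root, no copy
    }

    // analogous in shape to ProductQuantization.getSubVector: subspace m of width subDim
    static FloatSlice getSubVector(float[] vector, int m, int subDim) {
        return new FloatSlice(vector, m * subDim, subDim);
    }

    public static void main(String[] args) {
        float[] v = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f}; // dim=8, M=4 -> subDim=2
        FloatSlice sub = getSubVector(v, 2, 2);       // covers indices 4..5
        float before = sub.get(0);                    // 4.0
        v[5] = 99f;                                   // mutation is visible through the view
        System.out.println(before + " " + sub.get(1)); // 4.0 99.0
    }
}
```

Because the slice carries (root, offset, length), SIMD dispatch can still load directly from the root array at an adjusted offset, which is what the Panama recognition above does.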

Measured on dim=64, M=8, K=16, N=2000 training vectors:
37 B allocated per training vector (down from ~448 B when subvectors were materialized).
KMeans centroid storage and distance arrays — the algorithmic remainder — are unchanged.

Verification

All tests green locally, on all relevant test targets:

  • mvn test -pl jvector-tests (SIMD profile): 225 tests, 0 failures, 2 skipped
  • mvn test -pl jvector-native: 6 tests green (incl. new MemorySegmentVectorFloatWrapTest)
  • mvn test -pl jvector-examples: 109 tests green (incl. new MMapRandomAccessVectorValuesTest)
  • mvn -B verify (full reactor): green

New test classes:

  • BufferVectorFloatTest — element access, endianness, zero-copy on direct buffers,
    position/limit independence, slice correctness (18 cases)
  • VectorTypeSupportByteBufferTest — typed factory across active + Default provider (6)
  • ByteBufferRandomAccessVectorValuesTest — parity with ListRAVV, concurrent
    threadLocalSupplier correctness (6)
  • BuildFromByteBufferEquivalenceTest — builds the same graph from float[] vs ByteBuffer,
    asserts structural + search equivalence across EUCLIDEAN / DOT_PRODUCT / COSINE (4)
  • TestSearchWithByteBufferQuery — ByteBuffer overloads produce same results as
    VectorFloat<?> overloads (2)
  • MMapRandomAccessVectorValuesTest — rewritten mmap RAVV round-trip (1)
  • MemorySegmentVectorFloatWrapTest — wrap vs legacy copying ctor, endianness, alignment (4)
  • SearchAllocationProfileTest — per-query allocation comparison float[] vs ByteBuffer (1)
  • VectorFloatSubviewTest — subview aliases source, nested subview, SIMD distance/dot/cosine
    equivalence with materialized copies (7)
  • PQTrainingAllocationTest — getSubVector returns a live view; PQ training
    per-vector allocation bounded (2)

Existing tests pass — notably TestProductQuantization (9 cases) confirms the view-based
subvector extraction produces codebooks identical to the prior materialization path.

HerdDB integration, before and after

Before:

ByteBuffer source = herdDBVector;
float[] tmp = new float[dim];
source.asFloatBuffer().get(tmp);
VectorFloat<?> v = vts.createFloatVector(tmp);
builder.addGraphNode(ordinal, v);

After:

ByteBuffer source = herdDBVector;
builder.addGraphNode(ordinal, source);            // zero-copy wrap inside
// — or for bulk build —
var ravv = new ByteBufferRandomAccessVectorValues(allVectors, N, dim);
builder.build();
// — for queries —
GraphSearcher.search(source, topK, ravv, vsf, graph, Bits.ALL);

The HerdDB-side upgrade instructions are tracked in
eolivelli/herddb#174.

🤖 Generated with Claude Code

eolivelli and others added 2 commits April 19, 2026 01:15
Add a zero-copy path from a caller-owned ByteBuffer to a jvector index
build or search, without the per-vector float[] allocation and copy that
integrators have to perform today. The public API already uses
VectorFloat<?>, so the changes are a targeted set of additions at the
abstraction boundary plus polymorphic dispatch in the SIMD backends so
they operate on ByteBuffer-backed vectors without materializing them
to float[].

New types
  - BufferVectorFloat (jvector-base): zero-copy VectorFloat<ByteBuffer>
    view over a caller-owned buffer. Slices once at construction so
    subsequent element access, Panama SIMD dispatch, and mutation of
    the caller's buffer position/limit never need to allocate.
  - ByteBufferRandomAccessVectorValues (jvector-base): RAVV over a
    single concatenated ByteBuffer of N×dimension floats.
  - VectorTypeSupport.wrapFloatVector(ByteBuffer[, floatOffset, floatLength]):
    typed factory producing the zero-copy view.
  - MemorySegmentVectorFloat.wrap(ByteBuffer): zero-copy static factory
    that complements the legacy copying constructor.

SIMD preservation
  - PanamaVectorUtilSupport detects BufferVectorFloat and dispatches to
    FloatVector.fromMemorySegment(MemorySegment.ofBuffer(bb), ...) —
    full SIMD with no float[] materialization. ArrayVectorFloat still
    uses fromArray.
  - NativeVectorUtilSupport's four protected helpers now fall through
    to super for non-MemorySegment vectors, so BufferVectorFloat works
    under native dispatch too.
  - DefaultVectorUtilSupport's scalar kernels gain a polymorphic entry
    that uses VectorFloat.get(i) for non-ArrayVectorFloat inputs.
  - jvector-twenty release bumped 20 -> 22 so MemorySegment is stable
    (matches jvector-native). Preview-locked class files were already
    being avoided by the project; this removes the last blocker.

High-level API
  - GraphSearcher.search(ByteBuffer, ...) and
    GraphIndexBuilder.addGraphNode(int, ByteBuffer) overloads — thin
    wrappers that call wrapFloatVector internally.
  - MMapRandomAccessVectorValues rewritten to delegate to
    ByteBufferRandomAccessVectorValues over a MappedByteBuffer. Drops
    the per-getVector float[dimension] scratch allocation that the
    old implementation performed.
  - MemorySegmentVectorFloat.get/set gain an off-heap fallback via
    segment.getAtIndex/setAtIndex, so wrap(direct ByteBuffer) works
    correctly (the on-heap fast path remains).

Polymorphic copyFrom
  - ArrayVectorFloat.copyFrom, MemorySegmentVectorFloat.copyFrom, and
    BufferVectorFloat.copyFrom handle any VectorFloat source instead
    of requiring the class-strict cast that was there before.
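The "polymorphic copyFrom" shape described above — a bulk fast path when the source is the same concrete type, a generic element loop otherwise — can be sketched with stand-in interfaces (these are not the fork's actual types):

```java
public class CopyFromSketch {
    interface Vec { int length(); float get(int i); void set(int i, float v); }

    static final class ArrayVec implements Vec {
        final float[] data;
        ArrayVec(int n) { data = new float[n]; }
        public int length() { return data.length; }
        public float get(int i) { return data[i]; }
        public void set(int i, float v) { data[i] = v; }

        void copyFrom(Vec src, int srcOff, int dstOff, int len) {
            if (src instanceof ArrayVec a) {       // pattern-matching fast path: bulk copy
                System.arraycopy(a.data, srcOff, data, dstOff, len);
            } else {                               // generic path: works for any Vec source
                for (int i = 0; i < len; i++) set(dstOff + i, src.get(srcOff + i));
            }
        }
    }

    public static void main(String[] args) {
        ArrayVec a = new ArrayVec(4), b = new ArrayVec(4);
        for (int i = 0; i < 4; i++) a.set(i, i + 1);
        b.copyFrom(a, 1, 0, 3);
        System.out.println(b.get(0) + " " + b.get(2)); // 2.0 4.0
    }
}
```

The generic branch is what removes the class-strict cast: any VectorFloat implementation becomes a valid copy source, at worst at scalar speed.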

Tests (all green under SIMD and scalar profiles)
  - BufferVectorFloatTest: element access, endianness, zero-copy on
    direct buffers, position/limit independence, slice correctness.
  - VectorTypeSupportByteBufferTest: typed factory across active +
    Default provider, zero-copy proof, subrange views.
  - ByteBufferRandomAccessVectorValuesTest: parity with ListRAVV,
    concurrent threadLocalSupplier correctness, bounds.
  - BuildFromByteBufferEquivalenceTest: builds the same graph from
    float[] and from ByteBuffer and asserts structural + search
    equivalence across EUCLIDEAN / DOT_PRODUCT / COSINE at multiple
    dimensions.
  - TestSearchWithByteBufferQuery: ByteBuffer overloads produce same
    results as VectorFloat<?> overloads.
  - MMapRandomAccessVectorValuesTest: round-trip across rewritten mmap
    RAVV.
  - MemorySegmentVectorFloatWrapTest: wrap vs legacy copying ctor,
    big-endian rejection, alignment validation.
  - SearchAllocationProfileTest: per-query allocation stays within a
    small multiple of the float[] baseline (guard against accidental
    regression to a naive float[] materialization).

HerdDB integration becomes, end-to-end:
    ByteBuffer source = herdDBVector;
    builder.addGraphNode(ordinal, source);            // zero copy
    GraphSearcher.search(source, topK, ravv, vsf, graph, Bits.ALL);

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tags the build with a -herddb suffix so HerdDB can pin to this
jvector fork's artifacts without colliding with upstream
4.0.0-rc.9-SNAPSHOT in a shared local/remote Maven repository.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eolivelli and others added 3 commits April 19, 2026 12:32
…loat

CI on JDK 20 failed with "release version 22 not supported" — the
earlier release bump of jvector-twenty from 20 -> 22 assumed a higher
minimum JDK than the project's CI matrix supports. Revert to release
20 and teach PanamaVectorUtilSupport's three protected SIMD helpers
to handle BufferVectorFloat via a small SPEC-length float[] scratch
instead of FloatVector.fromMemorySegment (which needs java.lang.foreign,
still preview in Java 20). Functional behavior and SIMD on
ArrayVectorFloat unchanged. Full SIMD on BufferVectorFloat still
available via jvector-native, which targets Java 22 and has stable
MemorySegment.

Scratch is <= SPEC.length() floats (typically 8 or 16 -> 32-64 B),
allocated inside the hot helper so escape analysis can usually elide
it. The native backend (jvector-native) remains fully zero-copy and
full-SIMD for BufferVectorFloat via FloatVector.fromMemorySegment.
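The interim scalar bridge described above can be sketched as follows: copy one SIMD lane's worth of floats from the ByteBuffer into a tiny scratch array, which the real helper would then hand to FloatVector.fromArray. LANES stands in for SPEC.length(); the vector-API call itself is left as a comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ScratchBridgeSketch {
    static final int LANES = 8; // e.g. an 8-lane (256-bit) species

    static float[] loadLane(ByteBuffer bb, int floatOffset) {
        float[] scratch = new float[LANES];     // small and hot; escape analysis can often elide it
        FloatBuffer fb = bb.asFloatBuffer();
        fb.get(floatOffset, scratch, 0, LANES); // absolute bulk get (JDK 13+), no cursor mutation
        return scratch;                         // real helper: FloatVector.fromArray(SPEC, scratch, 0)
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 16; i++) bb.putFloat(i * Float.BYTES, i);
        float[] lane = loadLane(bb, 8);
        System.out.println(lane[0] + " " + lane[7]); // 8.0 15.0
    }
}
```
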

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HerdDB runs on JDK 25+ and only JDK 22+ can use the stable
java.lang.foreign.MemorySegment APIs this branch depends on. Drop
the old-JDK scaffolding so the code expresses its real minimum.

- parent pom release 11 -> 22; jvector-twenty restored to release 22
  with the FloatVector.fromMemorySegment path for BufferVectorFloat
  (reverts the scalar-fallback workaround from the prior commit)
- GitHub Actions unit-tests.yaml: build matrix [11,20,22] -> [22];
  build-avx512 matrix [20,24] -> [24]; remove JDK-20-specific
  "Verify Panama Vector Support" / "Test Panama Support" steps
- Drop now-unused jdk11 / jdk20 Maven profiles in jvector-tests
  and jvector-examples poms (jdk21 in tests and jdk22 in examples
  remain the active-by-default profiles)

Modernize to JDK 22 pattern matching + ByteBuffer.slice(int,int):
- BufferVectorFloat constructor uses ByteBuffer.slice(start, length)
  directly (one allocation instead of duplicate+position+limit+slice)
- BufferVectorFloat.copy / copyFrom use slice(int,int) and
  pattern-matching instanceof
- ArrayVectorFloat.copyFrom, MemorySegmentVectorFloat.copyFrom,
  PanamaVectorUtilSupport helpers, NativeVectorUtilSupport helpers,
  MemorySegmentVectorProvider.wrapFloatVector, ByteBufferRandomAccess-
  VectorValues constructor — all simplified with `instanceof T x`
  and slice(int,int)
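The slice(int,int) modernization can be sketched side by side with the classic shape: one absolute call replaces the duplicate()+position()+limit()+slice() dance and never touches the caller's cursor. One caveat worth noting in both shapes: slice() resets byte order, so it must be re-applied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SliceSketch {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(8 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 8; i++) bb.putFloat(i * Float.BYTES, i);

        // old shape: mutate a duplicate's cursor, then slice from it
        ByteBuffer old = bb.duplicate().position(2 * Float.BYTES).limit(6 * Float.BYTES)
                           .slice().order(ByteOrder.LITTLE_ENDIAN);
        // new shape (JDK 13+): absolute slice, no cursor games, one allocation
        ByteBuffer neu = bb.slice(2 * Float.BYTES, 4 * Float.BYTES)
                           .order(ByteOrder.LITTLE_ENDIAN);

        System.out.println(old.getFloat(0) + " " + neu.getFloat(0)); // 2.0 2.0
        System.out.println(neu.remaining());                         // 16
    }
}
```
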

Local verification: mvn verify (full reactor) green; jvector-tests,
jvector-native, jvector-examples test suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
async-profiler flamegraphs of HerdDB's indexing service showed
float[] allocations traceable to ProductQuantization.getSubVector —
one fresh float[dim/M] per (training_vector × subspace) during
PQ codebook training. For 100k training vectors × M=8 that's
~800k small allocations and tens of MB of GC pressure at every
index rebuild.

Eliminate them with zero-copy views:

- New VectorFloat.subview(int floatOffset, int floatLength) default
  method (materializes via VectorTypeSupport.createFloatVector +
  copyFrom as a fallback).
- Zero-copy overrides:
  * ArrayVectorFloat   -> ArraySliceVectorFloat (new)
  * BufferVectorFloat  -> another BufferVectorFloat over the same buffer
  * MemorySegmentVectorFloat -> a MemorySegmentVectorFloat over
    segment.asSlice(...)
- New ArraySliceVectorFloat — a VectorFloat<float[]> that references
  an underlying float[] at arrayOffset with its own logical length.
  Companion to the existing ArraySliceByteSequence.
- SIMD dispatch awareness:
  * PanamaVectorUtilSupport — FloatVector.fromArray(SPEC, asv.get(),
    asv.arrayOffset() + offset) and the same for intoArray + the
    gather variant. SIMD performance is preserved.
  * NativeVectorUtilSupport — falls through to super for
    ArraySliceVectorFloat (already did for non-MemorySegment types).
  * DefaultVectorUtilSupport — the generic .get(i)-based fallback
    path already handles arbitrary VectorFloat<?>.
- ArrayVectorFloat.copyFrom fast-pathed for ArraySliceVectorFloat
  source (System.arraycopy with adjusted offset).
- ProductQuantization.getSubVector rewritten to
  vector.subview(offset, length).

Tests:
- VectorFloatSubviewTest (7 cases) — subview aliases source,
  nested subview, SIMD distance/dot/cosine equivalence with
  materialized copies.
- PQTrainingAllocationTest — asserts that getSubVector returns a
  live view (mutation visible through subview) and that per-
  training-vector allocation during ProductQuantization.compute
  stays under a small bound (measured: 37 B/vector on dim=64 M=8
  K=16, down from the ~448 B/vector the materialization path cost).
- TestProductQuantization (9 existing cases) all green — codebooks
  produced through the view path match the previous materializing
  path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eolivelli eolivelli merged commit cec00dc into main Apr 22, 2026
4 of 7 checks passed
