Zero-copy ByteBuffer-backed vectors, no float[] materialization #1
Merged
Add a zero-copy path from a caller-owned ByteBuffer to a jvector index
build or search, without the per-vector float[] allocation and copy that
integrators have to perform today. The public API already uses
VectorFloat<?>, so the changes are a targeted set of additions at the
abstraction boundary plus polymorphic dispatch in the SIMD backends so
they operate on ByteBuffer-backed vectors without materializing them
to float[].
New types
- BufferVectorFloat (jvector-base): zero-copy VectorFloat<ByteBuffer>
view over a caller-owned buffer. Slices once at construction so
subsequent element access, Panama SIMD dispatch, and mutation of
the caller's buffer position/limit never need to allocate.
- ByteBufferRandomAccessVectorValues (jvector-base): RAVV over a
single concatenated ByteBuffer of N×dimension floats.
- VectorTypeSupport.wrapFloatVector(ByteBuffer[, floatOffset, floatLength]):
typed factory producing the zero-copy view.
- MemorySegmentVectorFloat.wrap(ByteBuffer): zero-copy static factory
that complements the legacy copying constructor.
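The zero-copy idea behind these types can be sketched in plain java.nio. This is an illustrative miniature, not jvector's actual BufferVectorFloat: the class name, little-endian assumption, and method shapes here are assumptions; the real class implements VectorFloat<ByteBuffer>.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a zero-copy float view over a caller-owned buffer:
// slice once at construction, then use absolute getFloat/putFloat so later
// position()/limit() changes on the caller's buffer cannot disturb the view.
public final class FloatBufferView {
    private final ByteBuffer slice;   // aliases the caller's storage, owns none
    private final int length;         // logical length in floats

    public FloatBufferView(ByteBuffer source, int floatOffset, int floatLength) {
        // One slice at construction; the absolute accessors below never allocate.
        this.slice = source.slice(floatOffset * Float.BYTES, floatLength * Float.BYTES)
                           .order(ByteOrder.LITTLE_ENDIAN);
        this.length = floatLength;
    }

    public float get(int i)          { return slice.getFloat(i * Float.BYTES); }
    public void  set(int i, float v) { slice.putFloat(i * Float.BYTES, v); }
    public int   length()            { return length; }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(4 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) bb.putFloat(i * Float.BYTES, i + 0.5f);
        FloatBufferView view = new FloatBufferView(bb, 1, 2);  // views floats 1..2
        // Mutation through the view is visible in the caller's buffer: zero copy.
        view.set(0, 42f);
        System.out.println(view.get(0) + " " + bb.getFloat(1 * Float.BYTES)); // 42.0 42.0
    }
}
```

Because the view holds only a slice, wrapping costs one small object regardless of vector dimension, which is the per-vector allocation the PR is eliminating.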
SIMD preservation
- PanamaVectorUtilSupport detects BufferVectorFloat and dispatches to
FloatVector.fromMemorySegment(MemorySegment.ofBuffer(bb), ...) —
full SIMD with no float[] materialization. ArrayVectorFloat still
uses fromArray.
- NativeVectorUtilSupport's four protected helpers now fall through
to super for non-MemorySegment vectors, so BufferVectorFloat works
under native dispatch too.
- DefaultVectorUtilSupport's scalar kernels gain a polymorphic entry
that uses VectorFloat.get(i) for non-ArrayVectorFloat inputs.
- jvector-twenty release bumped 20 -> 22 so MemorySegment is stable
(matches jvector-native). Preview-locked class files were already
being avoided by the project; this removes the last blocker.
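The dispatch pattern described above, a fast path when both operands are array-backed and a generic element-access fallback otherwise, can be sketched without the Vector API. All names here are illustrative stand-ins (not jvector's ArrayVectorFloat or DefaultVectorUtilSupport), and the scalar loops stand in for the SIMD kernels:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of polymorphic kernel dispatch: the unrolled float[] path runs when
// both operands are array-backed; otherwise a generic get(i) loop handles
// buffer-backed vectors without materializing them to float[].
public final class DispatchSketch {
    interface Vec { float get(int i); int length(); }

    record ArrayVec(float[] a) implements Vec {
        public float get(int i) { return a[i]; }
        public int length()     { return a.length; }
    }

    record BufferVec(ByteBuffer bb, int len) implements Vec {
        public float get(int i) { return bb.getFloat(i * Float.BYTES); }
        public int length()     { return len; }
    }

    static float dot(Vec x, Vec y) {
        // Fast path: both operands expose a raw float[] (pattern-matching instanceof).
        if (x instanceof ArrayVec ax && y instanceof ArrayVec ay) {
            float sum = 0f;
            float[] a = ax.a(), b = ay.a();
            for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
            return sum;
        }
        // Generic path: element access through the interface, no copying.
        float sum = 0f;
        for (int i = 0; i < x.length(); i++) sum += x.get(i) * y.get(i);
        return sum;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f};
        ByteBuffer bb = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 3; i++) bb.putFloat(i * 4, data[i]);
        System.out.println(dot(new ArrayVec(data), new ArrayVec(data)));   // 14.0
        System.out.println(dot(new ArrayVec(data), new BufferVec(bb, 3))); // 14.0
    }
}
```

In the real PanamaVectorUtilSupport the buffer branch loads lanes via FloatVector.fromMemorySegment(MemorySegment.ofBuffer(bb), ...) instead of a scalar loop; the dispatch shape is the same.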
High-level API
- GraphSearcher.search(ByteBuffer, ...) and
GraphIndexBuilder.addGraphNode(int, ByteBuffer) overloads — thin
wrappers that call wrapFloatVector internally.
- MMapRandomAccessVectorValues rewritten to delegate to
ByteBufferRandomAccessVectorValues over a MappedByteBuffer. Drops
the per-getVector float[dimension] scratch allocation that the
old implementation performed.
- MemorySegmentVectorFloat.get/set gain an off-heap fallback via
segment.getAtIndex/setAtIndex, so wrap(direct ByteBuffer) works
correctly (the on-heap fast path remains).
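The heap/off-heap split that motivates the get/set fallback can be shown with ByteBuffer alone: a heap buffer exposes a backing array, a direct buffer does not, so element access must go through the buffer itself (in jvector's case, through segment.getAtIndex). This is a minimal sketch with assumed names, not the MemorySegmentVectorFloat implementation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: branch on whether a backing array exists, mirroring the
// on-heap fast path vs the off-heap fallback described above.
public final class HeapOrDirect {
    static float read(ByteBuffer bb, int floatIndex) {
        if (bb.hasArray()) {
            // On-heap fast path: absolute byte offset into the backing array.
            int base = bb.arrayOffset() + floatIndex * Float.BYTES;
            return ByteBuffer.wrap(bb.array()).order(bb.order()).getFloat(base);
        }
        // Off-heap fallback: absolute read through the buffer (no heap array).
        return bb.getFloat(floatIndex * Float.BYTES);
    }

    public static void main(String[] args) {
        for (ByteBuffer bb : new ByteBuffer[]{
                ByteBuffer.allocate(8), ByteBuffer.allocateDirect(8)}) {
            bb.order(ByteOrder.LITTLE_ENDIAN);
            bb.putFloat(4, 7.25f);
            System.out.println(bb.isDirect() + " -> " + read(bb, 1)); // both read 7.25
        }
    }
}
```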
Polymorphic copyFrom
- ArrayVectorFloat.copyFrom, MemorySegmentVectorFloat.copyFrom, and
BufferVectorFloat.copyFrom handle any VectorFloat source instead
of requiring the class-strict cast that was there before.
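The shape of such a polymorphic copyFrom, a System.arraycopy fast path when the source type matches and a get/set element loop otherwise, can be sketched as follows. Class and method names are illustrative, not jvector's:

```java
// Sketch of a copyFrom that accepts any source vector type instead of
// requiring a class-strict cast: arraycopy when both sides are
// array-backed, else a generic element loop.
public final class CopyFromSketch {
    interface Vec { float get(int i); void set(int i, float v); int length(); }

    static final class ArrayVec implements Vec {
        final float[] a;
        ArrayVec(float[] a) { this.a = a; }
        public float get(int i)          { return a[i]; }
        public void  set(int i, float v) { a[i] = v; }
        public int   length()            { return a.length; }

        void copyFrom(Vec src, int srcOff, int dstOff, int n) {
            if (src instanceof ArrayVec s) {          // pattern-matching instanceof
                System.arraycopy(s.a, srcOff, a, dstOff, n);
            } else {                                  // works for any Vec source
                for (int i = 0; i < n; i++) a[dstOff + i] = src.get(srcOff + i);
            }
        }
    }

    public static void main(String[] args) {
        ArrayVec dst = new ArrayVec(new float[4]);
        dst.copyFrom(new ArrayVec(new float[]{1, 2, 3, 4}), 1, 0, 2);
        System.out.println(dst.get(0) + " " + dst.get(1)); // 2.0 3.0
    }
}
```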
Tests (all green under SIMD and scalar profiles)
- BufferVectorFloatTest: element access, endianness, zero-copy on
direct buffers, position/limit independence, slice correctness.
- VectorTypeSupportByteBufferTest: typed factory across active +
Default provider, zero-copy proof, subrange views.
- ByteBufferRandomAccessVectorValuesTest: parity with ListRAVV,
concurrent threadLocalSupplier correctness, bounds.
- BuildFromByteBufferEquivalenceTest: builds the same graph from
float[] and from ByteBuffer and asserts structural + search
equivalence across EUCLIDEAN / DOT_PRODUCT / COSINE at multiple
dimensions.
- TestSearchWithByteBufferQuery: ByteBuffer overloads produce same
results as VectorFloat<?> overloads.
- MMapRandomAccessVectorValuesTest: round-trip across rewritten mmap
RAVV.
- MemorySegmentVectorFloatWrapTest: wrap vs legacy copying ctor,
big-endian rejection, alignment validation.
- SearchAllocationProfileTest: per-query allocation stays within a
small multiple of the float[] baseline (guard against accidental
regression to a naive float[] materialization).
HerdDB integration becomes, end-to-end:
ByteBuffer source = herdDBVector;
builder.addGraphNode(ordinal, source); // zero copy
GraphSearcher.search(source, topK, ravv, vsf, graph, Bits.ALL);
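The concatenated-buffer layout behind ByteBufferRandomAccessVectorValues can be sketched in isolation: N vectors of `dim` floats stored back to back, with getVector returning a zero-copy slice. This miniature uses an assumed class name and returns a raw ByteBuffer slice rather than a VectorFloat:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of a RAVV-style accessor over one concatenated buffer of
// N x dimension floats; getVector(ord) aliases the backing storage.
public final class ConcatenatedVectors {
    private final ByteBuffer data;
    private final int dim;

    ConcatenatedVectors(ByteBuffer data, int dim) {
        this.data = data;
        this.dim = dim;
    }

    int size() { return data.capacity() / (dim * Float.BYTES); }

    // Zero-copy: slice(ord * dim * 4, dim * 4) shares the underlying bytes.
    ByteBuffer getVector(int ord) {
        return data.slice(ord * dim * Float.BYTES, dim * Float.BYTES).order(data.order());
    }

    public static void main(String[] args) {
        int dim = 2, n = 3;
        ByteBuffer bb = ByteBuffer.allocate(n * dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n * dim; i++) bb.putFloat(i * Float.BYTES, i);
        ConcatenatedVectors ravv = new ConcatenatedVectors(bb, dim);
        ByteBuffer v1 = ravv.getVector(1);
        System.out.println(v1.getFloat(0) + " " + v1.getFloat(4)); // 2.0 3.0
    }
}
```

The same layout over a MappedByteBuffer is what lets the rewritten MMapRandomAccessVectorValues drop its per-call scratch array.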
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tags the build with a -herddb suffix so HerdDB can pin to this jvector fork's artifacts without colliding with upstream 4.0.0-rc.9-SNAPSHOT in a shared local/remote Maven repository.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…loat CI on JDK 20 failed with "release version 22 not supported": the earlier release bump of jvector-twenty from 20 -> 22 assumed a higher minimum JDK than the project's CI matrix supports. Revert to release 20 and teach PanamaVectorUtilSupport's three protected SIMD helpers to handle BufferVectorFloat via a small SPEC-length float[] scratch instead of FloatVector.fromMemorySegment (which needs java.lang.foreign, still preview in Java 20).
Functional behavior and SIMD on ArrayVectorFloat are unchanged. The scratch is <= SPEC.length() floats (typically 8 or 16, i.e. 32-64 B), allocated inside the hot helper so escape analysis can usually elide it. The native backend (jvector-native), which targets Java 22 and has stable MemorySegment, remains fully zero-copy and full-SIMD for BufferVectorFloat via FloatVector.fromMemorySegment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HerdDB runs on JDK 25+ and only JDK 22+ can use the stable java.lang.foreign.MemorySegment APIs this branch depends on. Drop the old-JDK scaffolding so the code expresses its real minimum.
- parent pom release 11 -> 22; jvector-twenty restored to release 22 with the FloatVector.fromMemorySegment path for BufferVectorFloat (reverts the scalar-fallback workaround from the prior commit)
- GitHub Actions unit-tests.yaml: build matrix [11,20,22] -> [22]; build-avx512 matrix [20,24] -> [24]; remove JDK-20-specific "Verify Panama Vector Support" / "Test Panama Support" steps
- Drop now-unused jdk11 / jdk20 Maven profiles in jvector-tests and jvector-examples poms (jdk21 in tests and jdk22 in examples remain the active-by-default profiles)
Modernize to JDK 22 pattern matching + ByteBuffer.slice(int, int):
- BufferVectorFloat constructor uses ByteBuffer.slice(start, length) directly (one allocation instead of duplicate+position+limit+slice)
- BufferVectorFloat.copy / copyFrom use slice(int, int) and pattern-matching instanceof
- ArrayVectorFloat.copyFrom, MemorySegmentVectorFloat.copyFrom, PanamaVectorUtilSupport helpers, NativeVectorUtilSupport helpers, MemorySegmentVectorProvider.wrapFloatVector, ByteBufferRandomAccessVectorValues constructor: all simplified with `instanceof T x` and slice(int, int)
Local verification: mvn verify (full reactor) green; jvector-tests, jvector-native, jvector-examples test suites green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
async-profiler flamegraphs of HerdDB's indexing service showed
float[] allocations traceable to ProductQuantization.getSubVector —
one fresh float[dim/M] per (training_vector × subspace) during
PQ codebook training. For 100k training vectors × M=8 that's
~800k small allocations and tens of MB of GC pressure at every
index rebuild.
Eliminate them with zero-copy views:
- New VectorFloat.subview(int floatOffset, int floatLength) default
method (materializes via VectorTypeSupport.createFloatVector +
copyFrom as a fallback).
- Zero-copy overrides:
* ArrayVectorFloat -> ArraySliceVectorFloat (new)
* BufferVectorFloat -> another BufferVectorFloat over the same buffer
* MemorySegmentVectorFloat -> a MemorySegmentVectorFloat over
segment.asSlice(...)
- New ArraySliceVectorFloat — a VectorFloat<float[]> that references
an underlying float[] at arrayOffset with its own logical length.
Companion to the existing ArraySliceByteSequence.
- SIMD dispatch awareness:
* PanamaVectorUtilSupport — FloatVector.fromArray(SPEC, asv.get(),
asv.arrayOffset() + offset) and the same for intoArray + the
gather variant. SIMD performance is preserved.
* NativeVectorUtilSupport — falls through to super for
ArraySliceVectorFloat (already did for non-MemorySegment types).
* DefaultVectorUtilSupport — the generic .get(i)-based fallback
path already handles arbitrary VectorFloat<?>.
- ArrayVectorFloat.copyFrom fast-pathed for ArraySliceVectorFloat
source (System.arraycopy with adjusted offset).
- ProductQuantization.getSubVector rewritten to
vector.subview(offset, length).
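The subview mechanism, a default method that materializes a copy plus zero-copy overrides in concrete types, can be sketched as below. All names are illustrative stand-ins for VectorFloat.subview and ArraySliceVectorFloat, not the actual jvector classes:

```java
// Sketch of subview: the interface default materializes via an element
// loop; an array-backed type overrides it to return an aliasing slice
// (the role ArraySliceVectorFloat plays for ArrayVectorFloat).
public final class SubviewSketch {
    interface Vec {
        float get(int i);
        int length();
        default Vec subview(int off, int len) {        // fallback: materialize a copy
            float[] out = new float[len];
            for (int i = 0; i < len; i++) out[i] = get(off + i);
            return new ArrayVec(out, 0, len);
        }
    }

    static final class ArrayVec implements Vec {
        final float[] a; final int off; final int len;
        ArrayVec(float[] a, int off, int len) { this.a = a; this.off = off; this.len = len; }
        public float get(int i) { return a[off + i]; }
        public int length()     { return len; }
        @Override public Vec subview(int o, int l) {   // zero-copy: alias the same array
            return new ArrayVec(a, off + o, l);
        }
    }

    public static void main(String[] args) {
        float[] backing = {0, 1, 2, 3, 4, 5};
        Vec whole = new ArrayVec(backing, 0, 6);
        Vec sub = whole.subview(2, 2);                 // views floats 2..3
        backing[2] = 99f;                              // mutation visible through the view
        System.out.println(sub.get(0) + " " + sub.get(1)); // 99.0 3.0
    }
}
```

Nesting composes naturally: a subview of a subview just adds offsets, which is why the per-subspace extraction in PQ training becomes allocation-light.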
Tests:
- VectorFloatSubviewTest (7 cases) — subview aliases source,
nested subview, SIMD distance/dot/cosine equivalence with
materialized copies.
- PQTrainingAllocationTest — asserts that getSubVector returns a
live view (mutation visible through subview) and that per-
training-vector allocation during ProductQuantization.compute
stays under a small bound (measured: 37 B/vector on dim=64 M=8
K=16, down from the ~448 B/vector the materialization path cost).
- TestProductQuantization (9 existing cases) all green — codebooks
produced through the view path match the previous materializing
path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original problem
I'm using jvector from HerdDB (https://github.com/diennea/herddb). HerdDB already holds its vectors as ByteBuffer (or byte[]) in memory and on disk: it's how the storage layer naturally represents fixed-width columnar data. But to feed a vector to jvector today you have to go through float[]. For a per-row insertion or a per-query vector, that's a wasted allocation + copy every single time. Multiply by millions of rows or thousands of queries and it becomes a real pressure point: GC churn, dirty cache lines, extra memory bandwidth.
The ask: let HerdDB (and any integrator in the same position) feed vectors to jvector as ByteBuffer directly, with no float[] materialization on the hot path, while preserving SIMD everywhere jvector already uses it.
Why this fits jvector cleanly
jvector's public API is already VectorFloat<?>-based, not float[]-based. GraphIndexBuilder.addGraphNode, GraphSearcher.search, RandomAccessVectorValues.getVector, VectorSimilarityFunction.compare, and every VectorUtil distance kernel already speak VectorFloat<?>. The work is therefore not a rewrite of the API; it's a ByteBuffer-backed VectorFloat<?> implementation alongside ArrayVectorFloat, plus overloads on the high-level API.
(MMapRandomAccessVectorValues used to allocate a float[dim] per getVector; PQ training's ProductQuantization.getSubVector allocated a float[dim/M] per training-vector × subspace.)
Baseline: JDK 22+
HerdDB runs on JDK 25+, so this fork baselines on JDK 22. That's where java.lang.foreign.MemorySegment became stable (the API we need to make Panama SIMD work on BufferVectorFloat views), and it matches jvector-native's existing target. CI now tests exactly JDK 22 (and JDK 24 for AVX-512); the old jdk11/jdk20 Maven profiles are removed. Version is bumped to 4.0.0-rc.9-herddb-SNAPSHOT.
Changes
Index build + search path (the original ByteBuffer zero-copy work)
- BufferVectorFloat (new, jvector-base): VectorFloat<ByteBuffer> view over a caller-owned buffer. Slices once at construction via ByteBuffer.slice(int, int) so subsequent element access and SIMD dispatch are allocation-free, and caller position()/limit() mutation doesn't disturb the view.
- ByteBufferRandomAccessVectorValues (new, jvector-base): bulk-input RAVV over a single concatenated ByteBuffer of N × dimension × 4 bytes.
- VectorTypeSupport.wrapFloatVector(ByteBuffer[, floatOffset, floatLength]): typed zero-copy factory; MemorySegmentVectorProvider overrides it to return a MemorySegment-backed view directly so the native SIMD path stays zero-copy.
- MemorySegmentVectorFloat.wrap(ByteBuffer): zero-copy static factory, companion to the legacy copying ctor. Also fixes a latent bug in MemorySegmentVectorFloat.get(int) that threw on off-heap segments (now falls back to segment.getAtIndex when heapBase() is empty).
- PanamaVectorUtilSupport: the four protected fromVectorFloat/intoVectorFloat helpers gain polymorphic dispatch. BufferVectorFloat goes through FloatVector.fromMemorySegment(MemorySegment.ofBuffer(bb), byteOffset, order): full SIMD, no float[] materialization.
- NativeVectorUtilSupport: falls through to super for non-MemorySegment vectors, so BufferVectorFloat works under the native backend too.
- DefaultVectorUtilSupport: scalar kernels made polymorphic. If both operands are ArrayVectorFloat, the existing unrolled float[] fast path runs; otherwise the generic .get(i) loop.
- GraphSearcher.search(ByteBuffer, …) and GraphIndexBuilder.addGraphNode(int, ByteBuffer) overloads: thin wrappers that call wrapFloatVector internally.
- MMapRandomAccessVectorValues: rewritten to delegate to ByteBufferRandomAccessVectorValues over a MappedByteBuffer. Drops the per-call float[dimension] scratch.
PQ codebook training (follow-up)
Flamegraphs of HerdDB's indexing service showed float[] allocations in ProductQuantization.getSubVector: one fresh float[dim/M] per training-vector × subspace during codebook construction. For 100k training vectors × M=8 that's ~800k small allocations.
- VectorFloat.subview(int floatOffset, int floatLength): new default method on the interface; the fallback materializes, concrete types override for zero-copy.
- ArraySliceVectorFloat (new, jvector-base): VectorFloat<float[]> that references a root float[] at arrayOffset with its own logical length. Companion to ArraySliceByteSequence.
- subview overrides on ArrayVectorFloat (→ ArraySliceVectorFloat), BufferVectorFloat (→ another BufferVectorFloat over the same ByteBuffer), and MemorySegmentVectorFloat (→ segment.asSlice(...)).
- PanamaVectorUtilSupport handles ArraySliceVectorFloat via FloatVector.fromArray(SPEC, asv.get(), asv.arrayOffset() + offset); SIMD preserved.
- ArrayVectorFloat.copyFrom fast-pathed for ArraySliceVectorFloat source.
- ProductQuantization.getSubVector rewritten to vector.subview(offset, length): a one-liner.
Measured on dim=64, M=8, K=16, N=2000 training vectors: 37 B allocated per training vector (down from ~448 B when subvectors were materialized). KMeans centroid storage and distance arrays, the algorithmic remainder, are unchanged.
Verification
All tests green locally, on all relevant test targets:
- mvn test -pl jvector-tests (SIMD profile): 225 tests, 0 failures, 2 skipped
- mvn test -pl jvector-native: 6 tests green (incl. new MemorySegmentVectorFloatWrapTest)
- mvn test -pl jvector-examples: 109 tests green (incl. new MMapRandomAccessVectorValuesTest)
- mvn -B verify (full reactor): green
New test classes:
- BufferVectorFloatTest: element access, endianness, zero-copy on direct buffers, position/limit independence, slice correctness (18 cases)
- VectorTypeSupportByteBufferTest: typed factory across active + Default provider (6)
- ByteBufferRandomAccessVectorValuesTest: parity with ListRAVV, concurrent threadLocalSupplier correctness (6)
- BuildFromByteBufferEquivalenceTest: builds the same graph from float[] vs ByteBuffer, asserts structural + search equivalence across EUCLIDEAN / DOT_PRODUCT / COSINE (4)
- TestSearchWithByteBufferQuery: ByteBuffer overloads produce same results as VectorFloat<?> overloads (2)
- MMapRandomAccessVectorValuesTest: rewritten mmap RAVV round-trip (1)
- MemorySegmentVectorFloatWrapTest: wrap vs legacy copying ctor, endianness, alignment (4)
- SearchAllocationProfileTest: per-query allocation comparison float[] vs ByteBuffer (1)
- VectorFloatSubviewTest: subview aliases source, nested subview, SIMD distance/dot/cosine equivalence with materialized copies (7)
- PQTrainingAllocationTest: getSubVector returns a live view; PQ training per-vector allocation bounded (2)
Existing tests pass; notably TestProductQuantization (9 cases) confirms the view-based subvector extraction produces codebooks identical to the prior materialization path.
HerdDB integration, before and after
Before:
After:
The HerdDB-side upgrade instructions are tracked in
eolivelli/herddb#174.
🤖 Generated with Claude Code