Cut SparseIntMap heap overhead and add static graph-overhead estimator (#3) by eolivelli · Pull Request #4 · eolivelli/jvector

eolivelli · 2026-04-23T05:12:49Z

Summary

Addresses #3. The issue reports that
ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per key plus a
ConcurrentHashMap.Node per entry, accounting for ~3.5 GiB of heap per HerdDB IndexingService
pod at 50M vectors and contributing to OOM crashes during 100M-vector ingest. This PR acts on
all three suggestions from the issue; the third is applied in spirit only (see Change C).

Change A — replace `SparseIntMap` internals

Replaces the ConcurrentHashMap<Integer, T> backing store with 32 striped shards of Agrona's
primitive Int2ObjectHashMap, each guarded by a StampedLock. Optimistic reads keep the get path
competitive with CHM's volatile read; writes are serialised per shard. Per-entry footprint
drops from ~136 B to ~42 B (a 69% reduction), as measured by the new SparseIntMapMemoryBenchmark.
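
The striping and optimistic-read pattern described above can be sketched roughly as follows. This is an illustration, not the PR's code: `StripedIntMapSketch` and its helpers are hypothetical names, a small hand-rolled open-addressing table stands in for Agrona's Int2ObjectHashMap so the example has no dependencies, and the murmur3-style mixer is an assumption about how keys might be spread over shards.

```java
import java.util.concurrent.locks.StampedLock;

/** Hypothetical sketch of a striped int-keyed map; not the PR's SparseIntMap. */
final class StripedIntMapSketch<T> {
    private static final int SHARDS = 32; // shard count taken from the PR description

    /** Immutable holder so an optimistic reader always sees two arrays of equal length. */
    private static final class Table {
        final int[] keys;
        final Object[] vals; // null marks an empty slot, so values must be non-null
        Table(int capacity) { keys = new int[capacity]; vals = new Object[capacity]; }
    }

    private static final class Shard {
        final StampedLock lock = new StampedLock();
        Table table = new Table(16); // capacity is always a power of two
        int size;
    }

    private final Shard[] shards = new Shard[SHARDS];

    StripedIntMapSketch() {
        for (int i = 0; i < SHARDS; i++) shards[i] = new Shard();
    }

    /** Assumed bit mixer (murmur3 finalizer) so sequential keys spread over shards and slots. */
    static int mix(int k) {
        k ^= k >>> 16; k *= 0x85ebca6b;
        k ^= k >>> 13; k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    private Shard shardFor(int key) { return shards[mix(key) & (SHARDS - 1)]; }

    @SuppressWarnings("unchecked")
    private static <V> V probe(Table t, int key) {
        int mask = t.keys.length - 1;
        int i = (mix(key) >>> 5) & mask; // skip the 5 bits already used to pick the shard
        for (int n = 0; n <= mask; n++, i = (i + 1) & mask) { // bounded, so safe on racy reads
            Object v = t.vals[i];
            if (v == null) return null;
            if (t.keys[i] == key) return (V) v;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private static <V> V insert(Table t, int key, Object value) {
        int mask = t.keys.length - 1;
        int i = (mix(key) >>> 5) & mask;
        while (t.vals[i] != null && t.keys[i] != key) i = (i + 1) & mask;
        V prev = (V) t.vals[i];
        t.keys[i] = key;
        t.vals[i] = value;
        return prev;
    }

    /** Lock-free attempt first; falls back to a shared read lock if a writer interfered. */
    T get(int key) {
        Shard s = shardFor(key);
        long stamp = s.lock.tryOptimisticRead();
        T result = probe(s.table, key);
        if (s.lock.validate(stamp)) return result;
        stamp = s.lock.readLock();
        try { return probe(s.table, key); } finally { s.lock.unlockRead(stamp); }
    }

    /** Inserts or overwrites under the shard's write lock; returns the previous value or null. */
    T put(int key, T value) {
        if (value == null) throw new NullPointerException("null values are not supported");
        Shard s = shardFor(key);
        long stamp = s.lock.writeLock();
        try {
            if ((s.size + 1) * 2 > s.table.keys.length) { // keep load factor at or below 0.5
                Table next = new Table(s.table.keys.length * 2);
                for (int i = 0; i < s.table.keys.length; i++)
                    if (s.table.vals[i] != null) insert(next, s.table.keys[i], s.table.vals[i]);
                s.table = next;
            }
            T prev = insert(s.table, key, value);
            if (prev == null) s.size++;
            return prev;
        } finally { s.lock.unlockWrite(stamp); }
    }
}
```

The reader takes a single snapshot of the shard's table reference and probes a bounded number of slots, so a racing writer can make the optimistic result stale but cannot make the probe loop forever or index out of bounds; a failed `validate` then sends the reader to the locked path.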

Change B — static memory estimator

Adds a static OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float overflowRatio).
External indexing services can now pre-size their memory budget (e.g. HerdDB's
PROPERTY_MEMORY_VECTOR_LIMIT) to include graph overhead without needing a built graph instance.
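
As a rough sketch of how a caller might fold a per-node estimate into a budget: the class below is a hypothetical stand-in, and its formula (one int[] of neighbour ids plus one float[] of scores, sized by `maxDegree * overflowRatio`, with assumed 16-byte headers) is invented for illustration. It is not the formula used by OnHeapGraphIndex.estimatedBytesPerNode, and `budgetBytes` and `dimension` are names introduced only for this example.

```java
/** Hypothetical stand-in for a static per-node estimator; the formula is illustrative only. */
final class GraphOverheadSketch {
    // Assumed JVM layout constants (64-bit HotSpot with compressed oops).
    private static final long OBJECT_HEADER_BYTES = 16;
    private static final long ARRAY_HEADER_BYTES = 16;

    /** Assumption: a node reserves ceil(maxDegree * overflowRatio) neighbour slots. */
    static long estimatedBytesPerNode(int maxDegree, float overflowRatio) {
        long capacity = (long) Math.ceil(maxDegree * overflowRatio);
        long idsBytes = ARRAY_HEADER_BYTES + (long) Integer.BYTES * capacity;    // int[] node ids
        long scoresBytes = ARRAY_HEADER_BYTES + (long) Float.BYTES * capacity;   // float[] scores
        return OBJECT_HEADER_BYTES + idsBytes + scoresBytes;
    }

    /** Budget = nodes * (raw float vector bytes + estimated graph bytes per node). */
    static long budgetBytes(long nodeCount, int dimension, int maxDegree, float overflowRatio) {
        long vectorBytes = (long) Float.BYTES * dimension;
        long perNode = vectorBytes + estimatedBytesPerNode(maxDegree, overflowRatio);
        return Math.multiplyExact(nodeCount, perNode); // fail loudly instead of overflowing
    }
}
```

Because the estimate depends only on configuration values, a service can compute it at start-up, before any vector has been indexed.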

Change C — lock the no-boxing guarantee at the interface

Suggestion #3 in the issue is based on a misreading of the codebase: Neighbors already uses
primitive int[], not boxed Maps. To apply the spirit of the suggestion, this PR promotes
forEachKey(IntConsumer) to the IntMap interface and removes the (SparseIntMap<Neighbors>)
downcast in OnHeapGraphIndex.nodeStream(). DenseIntMap and SparseIntMap both override it for
zero allocation; the default implementation delegates to forEach for backward compatibility
with the legacy benchmark impls.
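
The shape of that interface change can be sketched as below. `IntMapSketch`, `IntBiConsumer`, and `ArrayIntMapSketch` are hypothetical names, and the `forEach` signature is an assumption; only the idea of a default `forEachKey(IntConsumer)` that delegates to `forEach`, with implementations overriding it to avoid the adapter lambda, comes from the description above.

```java
import java.util.function.IntConsumer;

/** Hypothetical, trimmed-down stand-in for the IntMap interface. */
interface IntMapSketch<T> {
    @FunctionalInterface
    interface IntBiConsumer<V> { void consume(int key, V value); }

    void forEach(IntBiConsumer<T> consumer);

    /** Default: adapt forEach. Works for every impl, but allocates a capturing lambda. */
    default void forEachKey(IntConsumer consumer) {
        forEach((key, value) -> consumer.accept(key));
    }
}

/** Array-backed impl that overrides forEachKey to visit keys directly. */
final class ArrayIntMapSketch<T> implements IntMapSketch<T> {
    private final T[] slots; // index is the key; null means "absent"

    ArrayIntMapSketch(T[] slots) { this.slots = slots; }

    @Override
    public void forEach(IntMapSketch.IntBiConsumer<T> consumer) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) consumer.consume(i, slots[i]);
        }
    }

    @Override
    public void forEachKey(IntConsumer consumer) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) consumer.accept(i); // no adapter lambda, no boxing
        }
    }
}
```

With the method on the interface, a caller iterating keys needs only the interface type and no longer has to know which implementation it holds.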

Benchmarks (apples-to-apples vs. legacy CHM impl)

LegacySparseIntMap is kept in the benchmarks-jmh module — same pattern as the existing
LegacyDenseIntMap — so old vs. new can be measured in the same JVM.

SparseIntMapConcurrentBenchmark, dense keys, totalKeys=100k, 8 threads (smoke run, 2 iterations):

| Benchmark | striped (new) | legacy (CHM) | Notes |
|---|---|---|---|
| `getHot1` | 45.1 M ops/s | 36.0 M ops/s | striped wins via StampedLock optimistic |
| `getHot8` | 279.6 M ops/s | 313.4 M ops/s | ~10 % off CHM |
| `casChurn1` | 17.4 M ops/s | 6.1 M ops/s | striped wins (single-shard fast path) |
| `casChurn8` | 53.1 M ops/s | 63.7 M ops/s | ~17 % off CHM |
| `forEachKey` | 1382 ops/s | 3395 ops/s | snapshot cost; called once per layer |

SparseIntMapMemoryBenchmark, totalKeys=100k:

| Impl | bytesUsed | bytesPerEntry |
|---|---|---|
| striped | 4 267 696 | 42 |
| legacy | 13 797 408 | 136 |

Test plan

- mvn -pl jvector-tests test: all 257 tests pass (2 pre-existing skips)
- TestIntMap extended with forEach/forEachKey/keysStream/CAS-contract cases (parameterised over both impls)
- new TestIntMapConcurrency covering CAS linearizability on a hot key, forEachKey-during-mutation, reentrant forEachKey, size-under-contention, happens-before across successful CAS, stale-expected, 16-thread mixed stress
- new TestSparseIntMapShards validates the avalanche-mix shard distribution
- new TestOnHeapGraphIndexEstimator cross-checks the static estimator against the instance method
- mvn -pl benchmarks-jmh package: new benchmarks compile and register correctly
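
A shard-distribution check of the kind the test plan mentions could look roughly like this. The mixer shown (the murmur3 32-bit finalizer) is an assumption, since the PR's actual avalanche mix is not quoted here, and `ShardSpreadSketch` is a hypothetical name.

```java
/** Hypothetical sketch: count how sequential keys land on 32 shards after bit mixing. */
final class ShardSpreadSketch {
    static final int SHARDS = 32; // shard count taken from the PR description

    /** Assumed mixer: murmur3 32-bit finalizer, so nearby keys differ in their low bits. */
    static int mix(int k) {
        k ^= k >>> 16; k *= 0x85ebca6b;
        k ^= k >>> 13; k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    static int shardOf(int key) { return mix(key) & (SHARDS - 1); }

    /** Returns per-shard counts for the sequential keys 0 .. keyCount-1. */
    static int[] histogram(int keyCount) {
        int[] counts = new int[SHARDS];
        for (int key = 0; key < keyCount; key++) counts[shardOf(key)]++;
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(320_000);
        int min = Integer.MAX_VALUE, max = 0;
        for (int c : counts) { min = Math.min(min, c); max = Math.max(max, c); }
        System.out.println("min=" + min + " max=" + max + " ideal=" + 320_000 / SHARDS);
    }
}
```

A test built on this would assert that every shard's count stays within some tolerance of the ideal share, for both sequential and random key sets.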

🤖 Generated with Claude Code

Issue #3: HNSW graph build OOMs on large datasets (100M+ vectors). The ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per key (~16 B) and a ConcurrentHashMap.Node per entry (~32 B), accounting for ~3.5 GiB of heap per IndexingService pod at 50M vectors.

Three changes:

1. Replace SparseIntMap's ConcurrentHashMap<Integer,T> with 32 striped shards of Agrona's primitive Int2ObjectHashMap, each guarded by its own StampedLock. Optimistic-read fast path keeps reads competitive with CHM's volatile-read; writes are serialised per-shard. Per-entry footprint drops from ~136 B to ~42 B (69% reduction, measured by the new SparseIntMapMemoryBenchmark).
2. Expose OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float overflowRatio), a static helper external indexing services can use to pre-size their memory budget (e.g. HerdDB's PROPERTY_MEMORY_VECTOR_LIMIT) without a built graph instance.
3. Promote forEachKey(IntConsumer) to the IntMap interface so iteration over an upper layer no longer requires a SparseIntMap downcast in OnHeapGraphIndex.nodeStream(). Default delegates to forEach for backward compatibility; DenseIntMap and SparseIntMap both override for zero allocation.

Tests:

- TestIntMap extended with forEach/forEachKey/keysStream/contract cases (parameterised over both impls).
- New TestIntMapConcurrency: linearizability of compareAndPut on a hot key, forEachKey-during-mutation, reentrant forEachKey, size accuracy under concurrent insert/remove, happens-before via successful CAS, stale-expected handling, 16-thread mixed-workload stress.
- New TestSparseIntMapShards: white-box check that the avalanche-mix spreads sequential and random keys evenly across shards.
- New TestOnHeapGraphIndexEstimator: validates the static estimator matches the instance ramBytesUsedOneNode, scales monotonically with M and overflow.

Benchmarks (mirror the existing LegacyDenseIntMap pattern):

- LegacySparseIntMap kept in benchmarks-jmh for apples-to-apples comparison against the legacy CHM-backed impl.
- SparseIntMapConcurrentBenchmark covers getHot/casChurn/forEachKey/insertSequential/mixed90r10w at 1 and 8 threads, dense and sparse keys, both impls.
- SparseIntMapMemoryBenchmark surfaces per-entry byte cost via JMH AuxCounters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

eolivelli mentioned this pull request Apr 23, 2026

Pickup jvector improvements on locking and memory accounting eolivelli/herddb#232

Closed

eolivelli merged commit 7e788a6 into main Apr 23, 2026
3 of 7 checks passed

eolivelli merged 1 commit into main from fix/sparse-int-map-overhead-issue-3
