
Cut SparseIntMap heap overhead and add static graph-overhead estimator (#3) #4

Merged

eolivelli merged 1 commit into main from fix/sparse-int-map-overhead-issue-3 on Apr 23, 2026

Conversation

@eolivelli
Owner

Summary

Addresses #3 — the issue reports that
ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per key plus a
ConcurrentHashMap.Node per entry, accounting for ~3.5 GiB of heap per HerdDB IndexingService
pod at 50M vectors and contributing to OOM crashes during 100M-vector ingest. This PR applies
all three suggestions from the issue.

Change A — replace SparseIntMap internals

ConcurrentHashMap<Integer, T> → 32 striped shards of Agrona's primitive Int2ObjectHashMap,
each guarded by a StampedLock. Optimistic reads keep the get path competitive with CHM's
volatile reads; writes are serialised per shard. Per-entry footprint drops from ~136 B to
~42 B (a 69% reduction), as measured by the new SparseIntMapMemoryBenchmark.
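A minimal sketch of the striping scheme described above, assuming nothing beyond the JDK: java.util.HashMap stands in for Agrona's primitive Int2ObjectHashMap (at the cost of boxing, which the real implementation avoids), and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Illustrative sketch only; the real SparseIntMap shards over Agrona's
// primitive Int2ObjectHashMap and keeps keys unboxed.
class StripedIntMapSketch<T> {
    private static final int SHARDS = 32;              // power of two
    private final Map<Integer, T>[] shards;
    private final StampedLock[] locks;

    @SuppressWarnings("unchecked")
    StripedIntMapSketch() {
        shards = new Map[SHARDS];
        locks = new StampedLock[SHARDS];
        for (int i = 0; i < SHARDS; i++) {
            shards[i] = new HashMap<>();
            locks[i] = new StampedLock();
        }
    }

    // Cheap avalanche-style mix so sequential keys spread across shards.
    private static int shardIndex(int key) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & (SHARDS - 1);
    }

    public T get(int key) {
        int i = shardIndex(key);
        long stamp = locks[i].tryOptimisticRead();     // lock-free fast path
        T value = shards[i].get(key);
        if (!locks[i].validate(stamp)) {               // raced a writer: retry pessimistically
            stamp = locks[i].readLock();
            try {
                value = shards[i].get(key);
            } finally {
                locks[i].unlockRead(stamp);
            }
        }
        return value;
    }

    public void put(int key, T value) {
        int i = shardIndex(key);
        long stamp = locks[i].writeLock();             // writes serialised per shard
        try {
            shards[i].put(key, value);
        } finally {
            locks[i].unlockWrite(stamp);
        }
    }
}
```

The optimistic read races writers by design; `validate()` detects the race and falls back to the read lock, which is what keeps the single-threaded get path close to CHM's volatile-read cost.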

Change B — static memory estimator

New OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float overflowRatio). External
indexing services (e.g. HerdDB's PROPERTY_MEMORY_VECTOR_LIMIT) can now pre-size their
memory budget to include graph overhead without needing a built graph instance.
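For illustration, a hedged sketch of how a caller might use such an estimator to pre-size a budget. The formula below is invented for this sketch (rough HotSpot compressed-oops figures); the real estimatedBytesPerNode reflects JVector's actual node layout.

```java
// Hypothetical stand-in for OnHeapGraphIndex.estimatedBytesPerNode.
// Constants are rough HotSpot (compressed oops) figures chosen for the
// sketch, not the values the real estimator uses.
final class GraphOverheadEstimate {
    static long estimatedBytesPerNode(int maxDegree, float overflowRatio) {
        long slots = (long) Math.ceil(maxDegree * (1 + overflowRatio));
        long neighborIds    = 16 + slots * Integer.BYTES;  // int[] header + ids
        long neighborScores = 16 + slots * Float.BYTES;    // float[] header + scores
        long nodeObject     = 32;                          // Neighbors object + fields
        long mapEntry       = 42;                          // striped SparseIntMap per-entry cost
        return neighborIds + neighborScores + nodeObject + mapEntry;
    }

    // e.g. a HerdDB-style indexing service sizing its vector limit up front,
    // with no graph instance in hand:
    static long budgetBytes(long vectorCount, int dimension,
                            int maxDegree, float overflowRatio) {
        long vectorBytes = 16 + (long) dimension * Float.BYTES;  // float[] per vector
        return vectorCount * (vectorBytes + estimatedBytesPerNode(maxDegree, overflowRatio));
    }
}
```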

Change C — lock the no-boxing guarantee at the interface

The issue's third suggestion rests on a misreading of the codebase (Neighbors already uses
primitive int[], not boxed Maps). To honour its intent: promote forEachKey(IntConsumer)
to the IntMap interface and remove the (SparseIntMap<Neighbors>) downcast at
OnHeapGraphIndex.nodeStream(). DenseIntMap and SparseIntMap both override for zero
allocation; the default delegates to forEach for backward compatibility with the legacy
benchmark impls.
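The interface change can be sketched as follows; signatures are illustrative, not copied from JVector, and the toy dense impl exists only to show the zero-allocation override.

```java
import java.util.function.IntConsumer;

// Sketch of the interface change described above.
interface IntMapSketch<T> {
    interface IntBiConsumer<U> { void accept(int key, U value); }

    void forEach(IntBiConsumer<T> action);

    // Promoted to the interface. The default delegates to forEach so legacy
    // impls keep working; Dense/SparseIntMap would override it with a
    // primitive loop for zero allocation.
    default void forEachKey(IntConsumer action) {
        forEach((key, value) -> action.accept(key));
    }
}

// Toy array-backed impl demonstrating the override.
class DenseSketch<T> implements IntMapSketch<T> {
    private final Object[] slots;
    DenseSketch(int capacity) { slots = new Object[capacity]; }
    void put(int key, T value) { slots[key] = value; }

    @SuppressWarnings("unchecked")
    @Override
    public void forEach(IntBiConsumer<T> action) {
        for (int i = 0; i < slots.length; i++)
            if (slots[i] != null) action.accept(i, (T) slots[i]);
    }

    @Override
    public void forEachKey(IntConsumer action) {   // no boxing, no downcast needed
        for (int i = 0; i < slots.length; i++)
            if (slots[i] != null) action.accept(i);
    }
}
```

With forEachKey on the interface, a caller like nodeStream() can iterate any IntMap without knowing the concrete implementation.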

Benchmarks (apples-to-apples vs. legacy CHM impl)

LegacySparseIntMap is kept in the benchmarks-jmh module — same pattern as the existing
LegacyDenseIntMap — so old vs. new can be measured in the same JVM.

SparseIntMapConcurrentBenchmark, dense keys, totalKeys=100k, 8 threads (smoke run, 2 iterations):

| Benchmark  | striped (new) | legacy (CHM)  | Notes                                  |
|------------|---------------|---------------|----------------------------------------|
| getHot1    | 45.1 M ops/s  | 36.0 M ops/s  | striped wins via StampedLock optimistic |
| getHot8    | 279.6 M ops/s | 313.4 M ops/s | ~10 % off CHM                          |
| casChurn1  | 17.4 M ops/s  | 6.1 M ops/s   | striped wins (single-shard fast path)  |
| casChurn8  | 53.1 M ops/s  | 63.7 M ops/s  | ~17 % off CHM                          |
| forEachKey | 1382 ops/s    | 3395 ops/s    | snapshot cost; called once per layer   |

SparseIntMapMemoryBenchmark, totalKeys=100k:

| Impl    | bytesUsed  | bytesPerEntry |
|---------|------------|---------------|
| striped | 4 267 696  | 42            |
| legacy  | 13 797 408 | 136           |

Test plan

  • mvn -pl jvector-tests test — all 257 tests pass (2 pre-existing skips)
  • TestIntMap extended with forEach/forEachKey/keysStream/CAS-contract cases (parameterised over both impls)
  • new TestIntMapConcurrency covering CAS linearizability on a hot key, forEachKey-during-mutation, reentrant forEachKey, size-under-contention, happens-before across successful CAS, stale-expected, 16-thread mixed stress
  • new TestSparseIntMapShards validates the avalanche-mix shard distribution
  • new TestOnHeapGraphIndexEstimator cross-checks the static estimator against the instance method
  • mvn -pl benchmarks-jmh package — new benchmarks compile and register correctly

🤖 Generated with Claude Code

Issue #3: HNSW graph build OOMs on large datasets (100M+ vectors) — the
ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per
key (~16 B) and a ConcurrentHashMap.Node per entry (~32 B), accounting for
~3.5 GiB of heap per IndexingService pod at 50M vectors.

Three changes:

1. Replace SparseIntMap's ConcurrentHashMap<Integer,T> with 32 striped
   shards of Agrona's primitive Int2ObjectHashMap, each guarded by its own
   StampedLock. Optimistic-read fast path keeps reads competitive with
   CHM's volatile-read; writes are serialised per-shard. Per-entry
   footprint drops from ~136 B to ~42 B (69% reduction, measured by the
   new SparseIntMapMemoryBenchmark).

2. Expose OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float
   overflowRatio) — a static helper external indexing services can use to
   pre-size their memory budget (e.g. HerdDB's PROPERTY_MEMORY_VECTOR_LIMIT)
   without a built graph instance.

3. Promote forEachKey(IntConsumer) to the IntMap interface so iteration
   over an upper layer no longer requires a SparseIntMap downcast in
   OnHeapGraphIndex.nodeStream(). Default delegates to forEach for
   backward compatibility; DenseIntMap and SparseIntMap both override for
   zero allocation.

Tests:
 - TestIntMap extended with forEach/forEachKey/keysStream/contract cases
   (parameterised over both impls).
 - New TestIntMapConcurrency: linearizability of compareAndPut on a hot
   key, forEachKey-during-mutation, reentrant forEachKey, size accuracy
   under concurrent insert/remove, happens-before via successful CAS,
   stale-expected handling, 16-thread mixed-workload stress.
 - New TestSparseIntMapShards: white-box check that the avalanche-mix
   spreads sequential and random keys evenly across shards.
 - New TestOnHeapGraphIndexEstimator: validates that the static estimator
   matches the instance ramBytesUsedOneNode and scales monotonically with
   M and overflow.

Benchmarks (mirror the existing LegacyDenseIntMap pattern):
 - LegacySparseIntMap kept in benchmarks-jmh for apples-to-apples
   comparison against the legacy CHM-backed impl.
 - SparseIntMapConcurrentBenchmark covers getHot/casChurn/forEachKey/
   insertSequential/mixed90r10w at 1 and 8 threads, dense and sparse
   keys, both impls.
 - SparseIntMapMemoryBenchmark surfaces per-entry byte cost via JMH
   AuxCounters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eolivelli eolivelli merged commit 7e788a6 into main Apr 23, 2026
3 of 7 checks passed