Cut SparseIntMap heap overhead and add static graph-overhead estimator (#3) #4
Merged
Issue #3: HNSW graph build OOMs on large datasets (100M+ vectors). The ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per key (~16 B) and a ConcurrentHashMap.Node per entry (~32 B), accounting for ~3.5 GiB of heap per IndexingService pod at 50M vectors.

Three changes:

1. Replace SparseIntMap's ConcurrentHashMap<Integer,T> with 32 striped shards of Agrona's primitive Int2ObjectHashMap, each guarded by its own StampedLock. Optimistic-read fast path keeps reads competitive with CHM's volatile-read; writes are serialised per-shard. Per-entry footprint drops from ~136 B to ~42 B (69% reduction, measured by the new SparseIntMapMemoryBenchmark).
2. Expose OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float overflowRatio), a static helper external indexing services can use to pre-size their memory budget (e.g. HerdDB's PROPERTY_MEMORY_VECTOR_LIMIT) without a built graph instance.
3. Promote forEachKey(IntConsumer) to the IntMap interface so iteration over an upper layer no longer requires a SparseIntMap downcast in OnHeapGraphIndex.nodeStream(). Default delegates to forEach for backward compatibility; DenseIntMap and SparseIntMap both override for zero allocation.

Tests:
- TestIntMap extended with forEach/forEachKey/keysStream/contract cases (parameterised over both impls).
- New TestIntMapConcurrency: linearizability of compareAndPut on a hot key, forEachKey-during-mutation, reentrant forEachKey, size accuracy under concurrent insert/remove, happens-before via successful CAS, stale-expected handling, 16-thread mixed-workload stress.
- New TestSparseIntMapShards: white-box check that the avalanche-mix spreads sequential and random keys evenly across shards.
- New TestOnHeapGraphIndexEstimator: validates that the static estimator matches the instance ramBytesUsedOneNode and scales monotonically with M and overflow.

Benchmarks (mirror the existing LegacyDenseIntMap pattern):
- LegacySparseIntMap kept in benchmarks-jmh for apples-to-apples comparison against the legacy CHM-backed impl.
- SparseIntMapConcurrentBenchmark covers getHot/casChurn/forEachKey/insertSequential/mixed90r10w at 1 and 8 threads, dense and sparse keys, both impls.
- SparseIntMapMemoryBenchmark surfaces per-entry byte cost via JMH AuxCounters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
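The sharding scheme in change 1 of the commit message can be illustrated with a minimal, JDK-only sketch. It is not the PR's SparseIntMap: the class name, the tiny open-addressing table (a stand-in for Agrona's Int2ObjectHashMap), the murmur3-style mix used to pick a shard, and the Integer.MIN_VALUE sentinel are all assumptions made for illustration. What it aims to show is the shape of the optimistic-read fast path: read without locking, validate the stamp, and fall back to a read lock only if a writer interfered.

```java
import java.util.Arrays;
import java.util.concurrent.locks.StampedLock;

/** Illustrative sketch only; names and layout are hypothetical, not the PR's code. */
final class StripedIntMapSketch<T> {
    private static final int SHARDS = 32;               // power of two, as in the PR text
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; this sketch cannot store that key

    /** Arrays are held in final fields so a racing reader always sees matching lengths. */
    private static final class Table {
        final int[] keys;
        final Object[] vals;
        Table(int capacity) {
            keys = new int[capacity];
            vals = new Object[capacity];
            Arrays.fill(keys, EMPTY);
        }
    }

    private static final class Shard {
        final StampedLock lock = new StampedLock();
        Table table = new Table(16);
        int size;
    }

    private final Shard[] shards = new Shard[SHARDS];

    StripedIntMapSketch() {
        for (int i = 0; i < SHARDS; i++) shards[i] = new Shard();
    }

    /** murmur3 32-bit finalizer; assumed here as the avalanche mix. */
    static int mix(int h) {
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    static int shardIndex(int key) { return mix(key) & (SHARDS - 1); }

    /** Bounded linear probe; safe to run racily because it never leaves the arrays. */
    private static Object probe(Table t, int key) {
        int mask = t.keys.length - 1;
        int i = (mix(key) >>> 5) & mask; // skip the 5 bits already used for the shard
        for (int n = 0; n <= mask; n++) {
            int k = t.keys[i];
            if (k == key) return t.vals[i];
            if (k == EMPTY) return null;
            i = (i + 1) & mask;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    T get(int key) {
        Shard s = shards[shardIndex(key)];
        long stamp = s.lock.tryOptimisticRead();
        Object v = probe(s.table, key);          // unlocked read
        if (!s.lock.validate(stamp)) {           // a writer was active: retry under a read lock
            stamp = s.lock.readLock();
            try { v = probe(s.table, key); }
            finally { s.lock.unlockRead(stamp); }
        }
        return (T) v;
    }

    void put(int key, T value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        Shard s = shards[shardIndex(key)];
        long stamp = s.lock.writeLock();         // writes are serialised per shard
        try {
            if ((s.size + 1) * 2 > s.table.keys.length) grow(s); // keep load factor <= 0.5
            if (insert(s.table, key, value)) s.size++;
        } finally {
            s.lock.unlockWrite(stamp);
        }
    }

    /** Returns true if the key was newly added. Caller holds the write lock. */
    private static boolean insert(Table t, int key, Object value) {
        int mask = t.keys.length - 1;
        int i = (mix(key) >>> 5) & mask;
        while (t.keys[i] != EMPTY && t.keys[i] != key) i = (i + 1) & mask;
        boolean added = t.keys[i] == EMPTY;
        t.vals[i] = value;
        t.keys[i] = key;
        return added;
    }

    private static void grow(Shard s) {
        Table old = s.table;
        Table bigger = new Table(old.keys.length * 2);
        for (int i = 0; i < old.keys.length; i++)
            if (old.keys[i] != EMPTY) insert(bigger, old.keys[i], old.vals[i]);
        s.table = bigger;
    }
}
```

The sketch omits remove, compareAndPut and iteration, which the real class needs; it is meant only to make the lock-per-shard and validate-then-fallback structure concrete.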
Summary
Addresses #3. The issue reports that ConcurrentNeighborMap's upper-layer SparseIntMap pays a boxed Integer per key plus a ConcurrentHashMap.Node per entry, accounting for ~3.5 GiB of heap per HerdDB IndexingService pod at 50M vectors and contributing to OOM crashes during 100M-vector ingest. This PR acts on all three suggestions from the issue; the third is applied in adapted form (see Change C).
Change A: replace SparseIntMap internals
ConcurrentHashMap<Integer, T> → 32 striped shards of Agrona's primitive Int2ObjectHashMap, each guarded by a StampedLock. Optimistic reads keep the get path competitive with CHM's volatile read; writes are serialised per shard. Per-entry footprint drops from ~136 B to ~42 B (69% reduction), measured by the new SparseIntMapMemoryBenchmark.

Change B: static memory estimator
New OnHeapGraphIndex.estimatedBytesPerNode(int maxDegree, float overflowRatio). External indexing services can now pre-size their memory budget (e.g. HerdDB's PROPERTY_MEMORY_VECTOR_LIMIT) to include graph overhead without needing a built graph instance.
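As a rough illustration of how a caller might fold a per-node estimate into a memory budget, the sketch below computes how many vectors fit in a given number of bytes. Everything in it is hypothetical: the class and method names, the assumption that vectors are stored as float32, and the placeholder estimator, which is not the formula used by OnHeapGraphIndex.estimatedBytesPerNode. The estimator is passed in as a function so that the real static method could be substituted, assuming it returns a byte count that fits this interface.

```java
/** Illustrative sketch only; not part of the PR. */
final class GraphBudgetSketch {

    /** Stand-in for a per-node estimator such as OnHeapGraphIndex.estimatedBytesPerNode. */
    @FunctionalInterface
    interface BytesPerNodeEstimator {
        long estimate(int maxDegree, float overflowRatio);
    }

    /**
     * Number of vectors that fit in budgetBytes when each vector costs its raw
     * float32 payload plus the estimated graph overhead per node.
     */
    static long maxVectorsForBudget(long budgetBytes, int dimension, int maxDegree,
                                    float overflowRatio, BytesPerNodeEstimator estimator) {
        if (budgetBytes < 0 || dimension <= 0) {
            throw new IllegalArgumentException("budget must be >= 0 and dimension > 0");
        }
        long vectorBytes = (long) dimension * Float.BYTES;   // assumes float32 vectors
        long graphBytes = estimator.estimate(maxDegree, overflowRatio);
        return budgetBytes / (vectorBytes + graphBytes);
    }

    /**
     * Placeholder estimate: a fixed 64-byte overhead plus one int and one float
     * per neighbor slot. The constants are invented for illustration.
     */
    static long placeholderEstimate(int maxDegree, float overflowRatio) {
        long slots = (long) Math.ceil(maxDegree * (double) overflowRatio);
        return 64 + slots * (Integer.BYTES + Float.BYTES);
    }
}
```

A caller would pass GraphBudgetSketch::placeholderEstimate here, or a method reference to the real estimator if its signature is compatible, and use the result to cap ingest before the graph is built.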
Change C: lock the no-boxing guarantee at the interface
Suggestion #3 in the issue is based on a misreading of the codebase (Neighbors already uses primitive int[], not boxed Maps). To apply the spirit: promote forEachKey(IntConsumer) to the IntMap interface and remove the (SparseIntMap<Neighbors>) downcast at OnHeapGraphIndex.nodeStream(). DenseIntMap and SparseIntMap both override for zero allocation; the default delegates to forEach for backward compatibility with the legacy benchmark impls.
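The interface promotion in Change C follows a common Java pattern: a default method that delegates to an existing abstract method, with performance-sensitive implementations overriding it. The sketch below uses hypothetical names (IntMapSketch, DenseSketch, LegacySketch) and a hypothetical IntBiConsumer callback shape; it is not the PR's IntMap, only a small instance of the pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

/** Illustrative sketch only; not the PR's IntMap interface. */
interface IntMapSketch<T> {
    @FunctionalInterface
    interface IntBiConsumer<T> {
        void consume(int key, T value);
    }

    void forEach(IntBiConsumer<T> consumer);

    /** Default keeps older implementations working; it wraps the consumer in a lambda. */
    default void forEachKey(IntConsumer consumer) {
        forEach((key, value) -> consumer.accept(key));
    }
}

/** Array-backed implementation that overrides forEachKey to avoid the wrapper lambda. */
final class DenseSketch<T> implements IntMapSketch<T> {
    private final Object[] slots;

    DenseSketch(int capacity) { slots = new Object[capacity]; }

    void put(int key, T value) { slots[key] = value; }

    @Override
    @SuppressWarnings("unchecked")
    public void forEach(IntMapSketch.IntBiConsumer<T> consumer) {
        for (int i = 0; i < slots.length; i++)
            if (slots[i] != null) consumer.consume(i, (T) slots[i]);
    }

    @Override
    public void forEachKey(IntConsumer consumer) {
        for (int i = 0; i < slots.length; i++)
            if (slots[i] != null) consumer.accept(i);   // primitive ints only, no boxing
    }
}

/** Legacy-style implementation that relies on the interface default. */
final class LegacySketch<T> implements IntMapSketch<T> {
    private final Map<Integer, T> map = new HashMap<>();

    void put(int key, T value) { map.put(key, value); }

    @Override
    public void forEach(IntMapSketch.IntBiConsumer<T> consumer) {
        map.forEach((k, v) -> consumer.consume(k, v));
    }
}
```

With the method on the interface, a caller holding an IntMapSketch reference can iterate keys without knowing or casting to the concrete type, which is the effect the PR describes for OnHeapGraphIndex.nodeStream().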
Benchmarks (apples-to-apples vs. legacy CHM impl)
LegacySparseIntMap is kept in the benchmarks-jmh module (same pattern as the existing LegacyDenseIntMap) so old vs. new can be measured in the same JVM.

SparseIntMapConcurrentBenchmark, dense keys, totalKeys=100k, 8 threads (smoke run, 2 iterations):
[results table not preserved; rows: getHot1, getHot8, casChurn1, casChurn8, forEachKey]

SparseIntMapMemoryBenchmark, totalKeys=100k:
[results table not preserved]

Test plan
- mvn -pl jvector-tests test: all 257 tests pass (2 pre-existing skips)
- TestIntMap extended with forEach/forEachKey/keysStream/CAS-contract cases (parameterised over both impls)
- TestIntMapConcurrency covering CAS linearizability on a hot key, forEachKey-during-mutation, reentrant forEachKey, size-under-contention, happens-before across successful CAS, stale-expected, 16-thread mixed stress
- TestSparseIntMapShards validates the avalanche-mix shard distribution
- TestOnHeapGraphIndexEstimator cross-checks the static estimator against the instance method
- mvn -pl benchmarks-jmh package: new benchmarks compile and register correctly

🤖 Generated with Claude Code