
STF-383: Fix off-heap memory growth in FileMode.MEMORY #368

Open
oschwald wants to merge 3 commits into main from greg/stf-383

Conversation

@oschwald
Member

@oschwald oschwald commented May 7, 2026

Summary

  • Avoid off-heap buffer cache growth in MEMORY mode — FileChannel.read() into a heap buffer populates sun.nio.ch.Util.BufferCache, a per-thread cache that retains the largest direct buffer ever requested for the lifetime of the JVM. Under chunked MEMORY mode, that meant chunkSize × loading-threads of off-heap memory pinned permanently. Switching to FileInputStream bypasses the cache. FileMode.MEMORY_MAPPED is unaffected (it uses FileChannel.map, not read).
  • Test MEMORY mode under the chunkSizes matrix — adds a package-private Reader(File, FileMode, int) constructor and a testMemoryMode parameterized over the existing chunkSizes() provider, so the multi-chunk + remainder branch in BufferHolder now has integration coverage.
  • Add mise configuration — mise.toml / mise.lock for local Java + Maven version management, matching minfraud-api-java. CI uses actions/setup-java directly and is unaffected.
  • Prepare for 4.1.0 — version bump + changelog entry.

Note: the linked Linear ticket suggested Files.readAllBytes() as the fix, but that wouldn't work — Files.readAllBytes is implemented over Channels.newInputStream(FileChannel) and still routes through FileChannel.read and its BufferCache. Only java.io.FileInputStream (whose native read(byte[]) copies via SetByteArrayRegion without going through NIO) actually bypasses the cache. This was previously noted in a comment on STF-383.
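The FileInputStream approach described above can be sketched as a chunked read loop. This is an illustrative sketch only — the class and method names (ChunkedRead, readChunks) are hypothetical and not the actual BufferHolder code:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative sketch: reads a file into chunkSize-bounded heap ByteBuffers
// via FileInputStream, whose native read(byte[]) never touches the NIO
// per-thread direct-buffer cache. Not the actual BufferHolder implementation.
public class ChunkedRead {
    static ByteBuffer[] readChunks(File file, int chunkSize) throws IOException {
        long size = file.length();
        int fullChunks = (int) (size / chunkSize);
        int remainder = (int) (size % chunkSize);
        ByteBuffer[] buffers = new ByteBuffer[fullChunks + (remainder > 0 ? 1 : 0)];
        try (FileInputStream in = new FileInputStream(file)) {
            for (int i = 0; i < buffers.length; i++) {
                // every chunk but the last is full-sized; the last holds the remainder
                int len = (i < fullChunks) ? chunkSize : remainder;
                byte[] chunk = new byte[len];
                int off = 0;
                while (off < len) {
                    int n = in.read(chunk, off, len - off); // native read, no BufferCache
                    if (n < 0) {
                        throw new IOException("unexpected EOF");
                    }
                    off += n;
                }
                buffers[i] = ByteBuffer.wrap(chunk);
            }
        }
        return buffers;
    }
}
```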

Test plan

  • mvn test (199/199 pass, including 3 new chunked MEMORY-mode test runs)
  • mvn checkstyle:check — clean
  • CI matrix passes (Java 17/21/24 × Linux/Windows/macOS)

🤖 Generated with Claude Code

oschwald and others added 3 commits May 7, 2026 22:37

`FileChannel.read(ByteBuffer)` with a heap-backed buffer causes the
JDK to substitute a temporary direct buffer obtained from a per-thread
cache (`sun.nio.ch.Util.BufferCache`). With chunk sizes near
`Integer.MAX_VALUE`, a single MEMORY-mode database load leaves up to
~2 GB of direct memory cached on the loading thread for that thread's
lifetime. Repeated loads on different threads compound the growth.

Open the database via `FileInputStream` and delegate to the existing
chunked `InputStream` read path. `FileInputStream.read(byte[])` is
implemented natively without going through the NIO buffer cache, so it
avoids the leak entirely. The MMAP path is unchanged, since
`FileChannel.map()` does not use the cache.

Note: `Files.readAllBytes()` and `Files.newInputStream()` would NOT
fix this, as both are backed by `Channels.newInputStream(FileChannel)`
internally and still trigger the cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
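The retained direct memory described in the commit above can be observed from outside the internals. This is a diagnostic sketch, not part of the PR; the exact retained amount is JVM-dependent (the behavior described holds on HotSpot with the default, unbounded jdk.nio.maxCachedBufferSize):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Diagnostic sketch: a FileChannel.read() into a HEAP buffer makes the JDK
// substitute a temporary direct buffer, which is then retained in the
// per-thread cache. The "direct" BufferPoolMXBean exposes the off-heap
// memory that stays reserved after the read returns.
public class ObserveBufferCache {
    static long directPoolUsed() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
            .filter(b -> b.getName().equals("direct"))
            .mapToLong(BufferPoolMXBean::getMemoryUsed)
            .sum();
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("db", ".bin");
        Files.write(p, new byte[1 << 20]); // 1 MiB test file
        long before = directPoolUsed();
        try (FileChannel ch = FileChannel.open(p)) {
            ch.read(ByteBuffer.allocate(1 << 20)); // heap buffer -> temp direct buffer
        }
        long after = directPoolUsed();
        // The channel is closed, yet on HotSpot defaults the ~1 MiB direct
        // buffer typically remains reserved: it lives on in the thread cache.
        System.out.println("direct pool delta: " + (after - before));
    }
}
```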

Manages local Java and Maven versions via mise, matching the setup
in minfraud-api-java. CI is unaffected since the GitHub Actions
workflows use setup-java directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The existing chunkSizes parameterized matrix in ReaderTest only routed
through Reader(File, int chunkSize), which hardcodes
FileMode.MEMORY_MAPPED. As a result the chunked file-MEMORY load path in
BufferHolder had no integration coverage — a mis-sized remainder chunk
or an EOF-handling regression could ship silently.

Add a package-private Reader(File, FileMode, int chunkSize) constructor
and a testMemoryMode(int chunkSize) test that mirrors test(int) but in
MEMORY mode. With chunk sizes 512/2048 against the test DBs (1285 and
2794 bytes), the multi-chunk + remainder branch is now exercised end
to end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
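The multi-chunk + remainder arithmetic those chunk sizes exercise can be checked by hand. A minimal sketch, using the test database sizes (1285 and 2794 bytes) from the commit message above; the helper name totalChunks is illustrative:

```java
// Checks the multi-chunk + remainder arithmetic for the test database
// sizes mentioned above (1285 and 2794 bytes) against chunk sizes 512/2048.
public class ChunkMath {
    static int totalChunks(long size, int chunkSize) {
        int full = (int) (size / chunkSize);
        int remainder = (int) (size % chunkSize);
        return full + (remainder > 0 ? 1 : 0); // extra chunk for a partial tail
    }

    public static void main(String[] args) {
        // 1285 bytes / 512 -> 2 full chunks + a 261-byte remainder = 3 chunks
        System.out.println(totalChunks(1285, 512));  // 3
        // 2794 bytes / 2048 -> 1 full chunk + a 746-byte remainder = 2 chunks
        System.out.println(totalChunks(2794, 2048)); // 2
    }
}
```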

@gemini-code-assist (Bot) left a comment


Code Review

This pull request addresses an issue with unbounded off-heap memory growth in FileMode.MEMORY by replacing FileChannel.read() with FileInputStream. This change prevents the JDK from caching large direct ByteBuffers in per-thread storage (sun.nio.ch.Util.BufferCache). The update also includes a version bump to 4.1.0, the addition of mise configuration for development environment management, and new tests for memory mode. Feedback was provided regarding potential integer overflow when calculating chunk counts for extremely large databases, which could lead to an incorrectly sized buffers array.

Comment on lines 39 to 42
var fullChunks = (int) (size / chunkSize);
var remainder = (int) (size % chunkSize);
var totalChunks = fullChunks + (remainder > 0 ? 1 : 0);
var buffers = new ByteBuffer[totalChunks];

Severity: medium

The calculation of fullChunks and totalChunks uses an explicit cast from long to int without checking for overflow. If the database size is extremely large relative to the chunkSize (e.g., a multi-terabyte database with a small custom chunkSize), fullChunks could overflow, leading to an incorrectly sized buffers array and potential data corruption or IndexOutOfBoundsException during the reading loop. While MaxMind databases are typically within int range for chunk counts, it is safer to validate this or use Math.toIntExact.

                    var fullChunksLong = size / chunkSize;
                    var remainder = (int) (size % chunkSize);
                    var totalChunksLong = fullChunksLong + (remainder > 0 ? 1 : 0);
                    if (totalChunksLong > Integer.MAX_VALUE) {
                        throw new IOException("Database size exceeds supported chunk count for the given chunkSize");
                    }
                    var totalChunks = (int) totalChunksLong;
                    var fullChunks = (int) fullChunksLong;
                    var buffers = new ByteBuffer[totalChunks];
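The Math.toIntExact alternative the reviewer mentions turns silent truncation into a loud failure. A minimal standalone illustration (the database and chunk sizes here are hypothetical, chosen only to force the overflow):

```java
public class ToIntExactDemo {
    public static void main(String[] args) {
        long size = 1L << 33;   // hypothetical 8 GiB database
        long chunkSize = 2;     // pathological chunk size to force overflow
        long fullChunksLong = size / chunkSize; // 2^32, does not fit in an int

        // A plain cast silently truncates 2^32 to 0: a zero-length buffers array.
        System.out.println((int) fullChunksLong); // 0

        // Math.toIntExact throws instead of corrupting the chunk count.
        try {
            Math.toIntExact(fullChunksLong);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```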
