MLE-27509 No longer using ByteArrayDataSource on a multipart response#1919
MLE-27509 No longer using ByteArrayDataSource on a multipart response#1919
Conversation
This should be a no-op for existing tests, given that we don't have any tests that read a document large enough to cause an out-of-memory error. ArchUnit is used to verify that ByteArrayDataSource is not used except in the one place where it's allowable.
|
Copyright Validation Results ⏭️ Skipped (Excluded) Files
✅ Valid Files
✅ All files have valid copyright headers! |
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent eager, in-memory buffering of multipart response parts by replacing ByteArrayDataSource usage with a streaming DataSource implementation, and adds an ArchUnit guardrail to prevent regressions.
Changes:
- Replaced
ByteArrayDataSourcewith a newInputStreamDataSourcefor multipart handling inOkHttpServices. - Added
InputStreamDataSource(a read-onlyDataSourcewrapper over anInputStream). - Added an ArchUnit test (and dependency) to enforce that
ByteArrayDataSourceis not used in production code (exceptPartIterator).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
marklogic-client-api/src/main/java/com/marklogic/client/impl/OkHttpServices.java |
Switches multipart DataSource creation to streaming to avoid buffering entire parts in memory. |
marklogic-client-api/src/main/java/com/marklogic/client/impl/okhttp/InputStreamDataSource.java |
Introduces a streaming DataSource used by multipart parsing. |
marklogic-client-api/src/test/java/com/marklogic/client/test/VerifyByteArrayDataSourceIsNotUsedTest.java |
Adds ArchUnit enforcement to prevent ByteArrayDataSource use outside an allowed exception. |
marklogic-client-api/build.gradle |
Adds ArchUnit JUnit5 dependency for the new architecture test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "surprising to any user using InputStreamHandle to access the content of a document, as that user " + | ||
| "is likely expecting to stream document from MarkLogic to some target. The InputStreamDataSource " + | ||
| "class was created to avoiding reading the contents of the document into an in-memory byte " + | ||
| "array, thus allowing for streaming reads to occur via a multipart response."); |
There was a problem hiding this comment.
The ArchUnit failure message has a couple grammatical issues that make it harder to read: "expecting to stream document" is missing an article, and "was created to avoiding" should be "was created to avoid".
| // Use custom DataSource to avoid reading document into memory. Allows a user to use an | ||
| // InputStreamHandle to fetch the content without being surprised that all the data is in memory already. | ||
| InputStreamDataSource dataSource = new InputStreamDataSource(responseBody.byteStream(), contentType.toString()); | ||
| setMultipart(new MimeMultipart(dataSource)); |
There was a problem hiding this comment.
Switching multipart handling to InputStreamDataSource changes this code path from eagerly buffering the entire response body (ByteArrayDataSource) to potentially leaving the underlying OkHttp ResponseBody open until the multipart is fully consumed. MultipleCallResponseImpl does not override close(), and the various asStreamOf*/asArrayOf* methods don’t close per-part InputStreams, so it’s easy for callers to accidentally leak the response body/connection. Consider (1) overriding close() on MultipleCallResponseImpl to close the stored Response/ResponseBody, and (2) ensuring part streams are closed when converting them to bytes/strings/readers.
| // Use custom DataSource to avoid reading document into memory. Allows a user to use an | ||
| // InputStreamHandle to fetch the content without being surprised that all the data is in memory already. | ||
| InputStreamDataSource dataSource = new InputStreamDataSource(body.byteStream(), contentType); | ||
| return (T) new MimeMultipart(dataSource); |
There was a problem hiding this comment.
In this MimeMultipart conversion, replacing ByteArrayDataSource with a streaming DataSource means the ResponseBody may no longer be fully consumed/closed during construction. If the returned MimeMultipart (or any BodyPart streams) are not fully consumed and closed by the caller, the underlying OkHttp response can remain open. Consider documenting/ensuring lifecycle management (e.g., requiring callers to close the originating CallResponse/ResponseBody, or eagerly parsing/closing when appropriate).
| public String getName() { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
InputStreamDataSource.getName() currently returns null. DataSource implementations typically return a non-null, stable name (useful for debugging/logging and some consumers may assume non-null). Consider returning a constant like the class name, or a name derived from contentType if appropriate.
|
Not sure this is the right path, a memory error still occurs due to the "getCount()" call, back to the drawing board. |
This should be a no-op for existing tests, given that we don't have any tests that read a document large enough to cause an out-of-memory error. ArchUnit is used to verify that ByteArrayDataSource is not used except in the one place where it's allowable.