Skip to content

MLE-27509 No longer using ByteArrayDataSource on a multipart response#1919

Closed
rjrudin wants to merge 1 commit intodevelopfrom
feature/streaming-data-source
Closed

MLE-27509 No longer using ByteArrayDataSource on a multipart response#1919
rjrudin wants to merge 1 commit intodevelopfrom
feature/streaming-data-source

Conversation

@rjrudin
Copy link
Contributor

@rjrudin rjrudin commented Mar 10, 2026

This should be a no-op for existing tests, given that we don't have any tests that read a document large enough to cause an out-of-memory error. ArchUnit is used to verify that ByteArrayDataSource is not used except in the one place where it's allowable.

This should be a no-op for existing tests, given that we don't have any tests that read a document large enough to cause an out-of-memory error. ArchUnit is used to verify that ByteArrayDataSource is not used except in the one place where it's allowable.
Copilot AI review requested due to automatic review settings March 10, 2026 13:06
@github-actions
Copy link

Copyright Validation Results
Total: 4 | Passed: 3 | Failed: 0 | Skipped: 1 | at: 2026-03-10 13:07:14 UTC | commit: e298f04

⏭️ Skipped (Excluded) Files

  • marklogic-client-api/build.gradle

✅ Valid Files

  • marklogic-client-api/src/main/java/com/marklogic/client/impl/OkHttpServices.java
  • marklogic-client-api/src/main/java/com/marklogic/client/impl/okhttp/InputStreamDataSource.java
  • marklogic-client-api/src/test/java/com/marklogic/client/test/VerifyByteArrayDataSourceIsNotUsedTest.java

✅ All files have valid copyright headers!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent eager, in-memory buffering of multipart response parts by replacing ByteArrayDataSource usage with a streaming DataSource implementation, and adds an ArchUnit guardrail to prevent regressions.

Changes:

  • Replaced ByteArrayDataSource with a new InputStreamDataSource for multipart handling in OkHttpServices.
  • Added InputStreamDataSource (a read-only DataSource wrapper over an InputStream).
  • Added an ArchUnit test (and dependency) to enforce that ByteArrayDataSource is not used in production code (except PartIterator).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
marklogic-client-api/src/main/java/com/marklogic/client/impl/OkHttpServices.java Switches multipart DataSource creation to streaming to avoid buffering entire parts in memory.
marklogic-client-api/src/main/java/com/marklogic/client/impl/okhttp/InputStreamDataSource.java Introduces a streaming DataSource used by multipart parsing.
marklogic-client-api/src/test/java/com/marklogic/client/test/VerifyByteArrayDataSourceIsNotUsedTest.java Adds ArchUnit enforcement to prevent ByteArrayDataSource use outside an allowed exception.
marklogic-client-api/build.gradle Adds ArchUnit JUnit5 dependency for the new architecture test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +42
"surprising to any user using InputStreamHandle to access the content of a document, as that user " +
"is likely expecting to stream document from MarkLogic to some target. The InputStreamDataSource " +
"class was created to avoiding reading the contents of the document into an in-memory byte " +
"array, thus allowing for streaming reads to occur via a multipart response.");
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ArchUnit failure message has a couple grammatical issues that make it harder to read: "expecting to stream document" is missing an article, and "was created to avoiding" should be "was created to avoid".

Copilot uses AI. Check for mistakes.
Comment on lines +6060 to 6063
// Use custom DataSource to avoid reading document into memory. Allows a user to use an
// InputStreamHandle to fetch the content without being surprised that all the data is in memory already.
InputStreamDataSource dataSource = new InputStreamDataSource(responseBody.byteStream(), contentType.toString());
setMultipart(new MimeMultipart(dataSource));
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching multipart handling to InputStreamDataSource changes this code path from eagerly buffering the entire response body (ByteArrayDataSource) to potentially leaving the underlying OkHttp ResponseBody open until the multipart is fully consumed. MultipleCallResponseImpl does not override close(), and the various asStreamOf*/asArrayOf* methods don’t close per-part InputStreams, so it’s easy for callers to accidentally leak the response body/connection. Consider (1) overriding close() on MultipleCallResponseImpl to close the stored Response/ResponseBody, and (2) ensuring part streams are closed when converting them to bytes/strings/readers.

Copilot uses AI. Check for mistakes.
Comment on lines +5308 to 5311
// Use custom DataSource to avoid reading document into memory. Allows a user to use an
// InputStreamHandle to fetch the content without being surprised that all the data is in memory already.
InputStreamDataSource dataSource = new InputStreamDataSource(body.byteStream(), contentType);
return (T) new MimeMultipart(dataSource);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this MimeMultipart conversion, replacing ByteArrayDataSource with a streaming DataSource means the ResponseBody may no longer be fully consumed/closed during construction. If the returned MimeMultipart (or any BodyPart streams) are not fully consumed and closed by the caller, the underlying OkHttp response can remain open. Consider documenting/ensuring lifecycle management (e.g., requiring callers to close the originating CallResponse/ResponseBody, or eagerly parsing/closing when appropriate).

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +56
public String getName() {
return null;
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InputStreamDataSource.getName() currently returns null. DataSource implementations typically return a non-null, stable name (useful for debugging/logging and some consumers may assume non-null). Consider returning a constant like the class name, or a name derived from contentType if appropriate.

Copilot uses AI. Check for mistakes.
@rjrudin
Copy link
Contributor Author

rjrudin commented Mar 10, 2026

Not sure this is the right path, a memory error still occurs due to the "getCount()" call, back to the drawing board.

@rjrudin rjrudin closed this Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants