Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions src/ipc/message.ts
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,20 @@ import { ArrowJSON, ArrowJSONLike, ITERATOR_DONE, FileHandle } from '../io/inter
/** @ignore */ const invalidMessageMetadata = (expected: number, actual: number) => `Expected to read ${expected} metadata bytes, but only read ${actual}.`;
/** @ignore */ const invalidMessageBodyLength = (expected: number, actual: number) => `Expected to read ${expected} bytes for message body, but only read ${actual}.`;

/**
* Maximum allowed metadata length (256 MB). This is a safeguard against corrupted
* files that could cause the reader to hang or consume excessive memory.
* @ignore
*/
const MAX_METADATA_LENGTH = 256 * 1024 * 1024;

/**
* Maximum allowed message body length (2 GB). This is a safeguard against corrupted
* files that could cause the reader to hang or consume excessive memory.
* @ignore
*/
const MAX_BODY_LENGTH = 2 * 1024 * 1024 * 1024;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these are wrong, Arrow tables can be larger than this. It'd be better to figure out why your IPC stream is corrupted when it's written.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these are wrong, Arrow tables can be larger than this. It'd be better to figure out why your IPC stream is corrupted when it's written.

Thank you for the feedback! You're right - I've reverted the size limits as they were too restrictive for valid Arrow files.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add a timeout here? A function that never returns and has no way to be interrupted isn't something we can use in production.

Copy link
Copy Markdown
Contributor

@trxcllnt trxcllnt Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add a timeout in Arrow since we're just processing a stream the library user gives us. We don't have enough context to know why a read call might take a certain amount of time. Is the file on a slow NFS drive? Is it a fetch() to a dynamic resource and the the next batch hasn't been computed yet? Is the user's connection just slow?

That said, you can add a timeout before/after Arrow's transform stream to either terminate/cancel based on your users' expected behavior.

/** @ignore */
export class MessageReader implements IterableIterator<Message> {
protected source: ByteStream;
Expand Down Expand Up @@ -59,6 +73,9 @@ export class MessageReader implements IterableIterator<Message> {
}
public readMessageBody(bodyLength: number): Uint8Array {
if (bodyLength <= 0) { return new Uint8Array(0); }
if (bodyLength > MAX_BODY_LENGTH) {
throw new Error(`Message body length ${bodyLength} exceeds maximum allowed size of ${MAX_BODY_LENGTH} bytes. The file may be corrupted.`);
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic for MAX_BODY_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when bodyLength exceeds MAX_BODY_LENGTH (2GB), and that valid body lengths just under the limit still work correctly. This is especially important given that this fix addresses a critical hanging issue reported in issue #315.

Copilot uses AI. Check for mistakes.
const buf = toUint8Array(this.source.read(bodyLength));
if (buf.byteLength < bodyLength) {
throw new Error(invalidMessageBodyLength(bodyLength, buf.byteLength));
Expand All @@ -81,6 +98,10 @@ export class MessageReader implements IterableIterator<Message> {
const buf = this.source.read(PADDING);
const bb = buf && new ByteBuffer(buf);
const len = bb?.readInt32(0) || 0;
// Guard against corrupted files with unreasonable metadata lengths
if (len < 0 || len > MAX_METADATA_LENGTH) {
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation condition len < 0 will break the continuation indicator logic used for Arrow IPC format v0.15+. According to the comment on lines 56-58 (and 132-134 for async), a value of -1 is a valid continuation indicator per ARROW-6313. This validation should exclude -1 from being treated as an error. Consider changing the condition to len < -1 or (len < 0 && len !== -1).

Suggested change
if (len < 0 || len > MAX_METADATA_LENGTH) {
if (len < -1 || len > MAX_METADATA_LENGTH) {

Copilot uses AI. Check for mistakes.
throw new Error(`Invalid metadata length ${len}. The file may be corrupted.`);
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic for MAX_METADATA_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when metadata length exceeds MAX_METADATA_LENGTH (256MB) or is negative (excluding the valid -1 continuation indicator), and that valid metadata lengths still work correctly.

Copilot uses AI. Check for mistakes.
return { done: len === 0, value: len };
}
protected readMetadata(metadataLength: number): IteratorResult<Message> {
Expand Down Expand Up @@ -128,6 +149,9 @@ export class AsyncMessageReader implements AsyncIterableIterator<Message> {
}
public async readMessageBody(bodyLength: number): Promise<Uint8Array> {
if (bodyLength <= 0) { return new Uint8Array(0); }
if (bodyLength > MAX_BODY_LENGTH) {
throw new Error(`Message body length ${bodyLength} exceeds maximum allowed size of ${MAX_BODY_LENGTH} bytes. The file may be corrupted.`);
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic for MAX_BODY_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when bodyLength exceeds MAX_BODY_LENGTH (2GB), and that valid body lengths just under the limit still work correctly.

Copilot uses AI. Check for mistakes.
const buf = toUint8Array(await this.source.read(bodyLength));
if (buf.byteLength < bodyLength) {
throw new Error(invalidMessageBodyLength(bodyLength, buf.byteLength));
Expand All @@ -150,6 +174,10 @@ export class AsyncMessageReader implements AsyncIterableIterator<Message> {
const buf = await this.source.read(PADDING);
const bb = buf && new ByteBuffer(buf);
const len = bb?.readInt32(0) || 0;
// Guard against corrupted files with unreasonable metadata lengths
if (len < 0 || len > MAX_METADATA_LENGTH) {
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation condition len < 0 will break the continuation indicator logic used for Arrow IPC format v0.15+. According to the comment on lines 132-134, a value of -1 is a valid continuation indicator per ARROW-6313. This validation should exclude -1 from being treated as an error. Consider changing the condition to len < -1 or (len < 0 && len !== -1).

Suggested change
if (len < 0 || len > MAX_METADATA_LENGTH) {
if ((len < 0 && len !== -1) || len > MAX_METADATA_LENGTH) {

Copilot uses AI. Check for mistakes.
throw new Error(`Invalid metadata length ${len}. The file may be corrupted.`);
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic for MAX_METADATA_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when metadata length exceeds MAX_METADATA_LENGTH (256MB) or is negative (excluding the valid -1 continuation indicator), and that valid metadata lengths still work correctly.

Copilot uses AI. Check for mistakes.
return { done: len === 0, value: len };
}
protected async readMetadata(metadataLength: number): Promise<IteratorResult<Message>> {
Expand Down
Loading