[TASK-431] Byte-sized cap instead of hard one for ArrowWriter #443
Open

fresh-borzoni wants to merge 2 commits into apache:main
Conversation
charlesdong1991 (Contributor) left a comment:
Very nice PR! leave some minor comments/questions
fresh-borzoni (Author) replied:

@charlesdong1991 Ty for the review, addressed comments
Summary
closes #431
Arrow log batches were previously capped at a hard 256-record limit regardless of actual data size, meaning a batch of 256 tiny ints (~1KB) was treated the same as 256 large rows (~10MB). This replaces that fixed cap with byte-size-based fullness matching Java's ArrowWriter, so batches now fill to the configured writer_batch_size (default 2MB).
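As a minimal sketch of how the new fullness condition could combine these pieces (all identifiers here, including `is_batch_full`, are illustrative assumptions, not the PR's actual names):

```python
DEFAULT_WRITER_BATCH_SIZE = 2 * 1024 * 1024  # 2 MB default from the summary

def is_batch_full(estimated_size_bytes: int,
                  compression_ratio: float,
                  writer_batch_size: int = DEFAULT_WRITER_BATCH_SIZE) -> bool:
    # Old behavior: full after a hard 256-record cap, regardless of bytes.
    # New behavior: full once the size estimate, scaled by the adaptive
    # compression ratio, reaches the configured byte budget.
    return estimated_size_bytes * compression_ratio >= writer_batch_size
```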
The fullness check uses a threshold-based optimization to avoid computing sizes on every append: it estimates how many records should fit, skips checks until that count is reached, then recalculates. Size estimation reads buffer lengths directly from the Arrow builders (O(num_columns), zero allocation), plus a pre-computed IPC overhead constant that captures metadata and body framing for the schema.
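A hedged sketch of that threshold-based check; the helper names (`buffer_size`, `IPC_OVERHEAD_BYTES`) and the exact recalculation policy are assumptions for illustration, not the PR's implementation:

```python
IPC_OVERHEAD_BYTES = 256  # assumed pre-computed schema/framing constant

class BatchSizeTracker:
    """Skips the size computation until roughly enough rows could have
    filled the batch, then re-estimates (sketch, not the PR's code)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.rows = 0
        self.next_check_at = 1  # row count at which to re-estimate size

    def _estimated_size(self, builders) -> int:
        # O(num_columns), zero allocation: read accumulated buffer lengths
        # straight off each builder (buffer_size() is a stand-in accessor).
        return IPC_OVERHEAD_BYTES + sum(b.buffer_size() for b in builders)

    def is_full_after_append(self, builders) -> bool:
        self.rows += 1
        if self.rows < self.next_check_at:
            return False  # below the threshold: skip the size check
        size = self._estimated_size(builders)
        if size >= self.max_bytes:
            return True
        # Estimate how many more rows fit at the current average row size
        # and defer the next size check until about that many are appended.
        avg_row = max(1, size // self.rows)
        headroom_rows = (self.max_bytes - size) // avg_row
        self.next_check_at = self.rows + max(1, headroom_rows)
        return False
```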
Compression is accounted for via an adaptive ArrowCompressionRatioEstimator shared across batches for the same table. It starts at 1.0 (assume no compression) and adjusts asymmetrically after each batch is serialized: quick to increase when compression worsens, slow to decrease when it improves. This matches Java's conservative approach to avoid underestimating batch sizes.
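For illustration, a minimal asymmetric estimator in the spirit described above; the step constants are assumptions, not the PR's values:

```python
import threading

DETERIORATE_STEP = 0.05  # large step when compression gets worse (assumed)
IMPROVE_STEP = 0.005     # small step when compression improves (assumed)

class ArrowCompressionRatioEstimator:
    """Tracks compressed_size / uncompressed_size for one table; the
    instance is shared across batches (sketch, not the PR's code)."""

    def __init__(self) -> None:
        self._ratio = 1.0              # start by assuming no compression
        self._lock = threading.Lock()  # estimator is shared, so guard it

    def update(self, observed_ratio: float) -> None:
        """Fold in the ratio observed after serializing one batch."""
        with self._lock:
            if observed_ratio > self._ratio:
                # Compression worsened: jump up quickly so future batch
                # size estimates are not too small (conservative).
                self._ratio = min(1.0, max(self._ratio + DETERIORATE_STEP,
                                           observed_ratio))
            elif observed_ratio < self._ratio:
                # Compression improved: creep down slowly, never below
                # the latest observation.
                self._ratio = max(self._ratio - IMPROVE_STEP, observed_ratio)

    def estimation(self) -> float:
        with self._lock:
            return self._ratio
```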
Also aligns VARIABLE_WIDTH_AVG_BYTES (64 -> 8) and INITIAL_ROW_CAPACITY (256 -> 1024) with Java Arrow defaults.
Writer pooling (ArrowWriterPool) and DynamicWriteBatchSizeEstimator are out of scope; TODOs have been added for both.