feat: Add RecordBatchLogReader for bounded log reading by charlesdong1991 · Pull Request #446 · apache/fluss-rust

charlesdong1991 · 2026-03-19T21:18:23Z

Purpose

Move the `query_latest_offsets` and poll-until-offsets logic from the Python binding into the Rust core as `RecordBatchLogReader`.

This enables both Python and C++ bindings to share the same bounded-read implementation.

Linked issue: close #406
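A minimal sketch of the bounded-read idea, assuming a simplified model. Each bucket gets a stopping offset, which is the latest offset captured when the reader is created and is exclusive. Polling continues until every bucket has delivered the record at `stop_at - 1`. `Record`, `PollSource` and `read_until` are hypothetical stand-ins and are not the fluss-rust API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one polled record: (bucket id, log offset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Record {
    pub bucket: i32,
    pub offset: i64,
}

/// Hypothetical source of records; each call returns whatever is currently available.
pub trait PollSource {
    fn poll(&mut self) -> Vec<Record>;
}

/// Reads until every bucket in `stopping_offsets` has reached its (exclusive) stop
/// offset. Records at or beyond a bucket's stop offset, or from finished buckets,
/// are dropped. NOTE: loops forever if some bucket never reaches `stop_at - 1`.
pub fn read_until<P: PollSource>(
    source: &mut P,
    mut stopping_offsets: HashMap<i32, i64>,
) -> Vec<Record> {
    let mut out = Vec::new();
    while !stopping_offsets.is_empty() {
        for rec in source.poll() {
            let Some(&stop_at) = stopping_offsets.get(&rec.bucket) else {
                continue; // bucket already finished (or not bounded)
            };
            if rec.offset < stop_at {
                out.push(rec);
            }
            // The bucket is complete once the record at stop_at - 1 has been seen.
            if rec.offset >= stop_at - 1 {
                stopping_offsets.remove(&rec.bucket);
            }
        }
    }
    out
}
```

Keeping the stop condition per bucket means the reader can finish buckets independently, and it terminates once the map is empty.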

Tests

Tests pass locally.

API and Format

Documentation

charlesdong1991 · 2026-03-19T21:24:31Z

crates/fluss/src/client/table/scanner.rs

+    arrow_schema: SchemaRef,
+    /// Serializes overlapping `poll` / `poll_batches` across clones sharing this `Arc`.
+    ///
+    /// TODO: Consider an API that consumes


The record batch log scanner is cheap to clone, but all clones share one `Arc`, so two overlapping polls are not supported under the current usage model. I added a client-side guard with `poll_session` so that overlapping calls fail fast.

Not sure what you think: I am happy to open a new issue and do a follow-up, or, if you prefer, I can introduce a stricter API in this PR.
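For context, a client-side fail-fast guard of the kind described above could look roughly like this. It is a sketch that assumes a shared `AtomicBool` flag. `PollSession`, `PollToken` and `try_begin` are hypothetical names and are not the actual `poll_session` code from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical guard: the flag is set while a poll is in flight and shared by all clones.
#[derive(Clone, Default)]
pub struct PollSession {
    in_flight: Arc<AtomicBool>,
}

/// RAII token; dropping it marks the poll as finished.
pub struct PollToken {
    in_flight: Arc<AtomicBool>,
}

impl PollSession {
    /// Returns `Err` immediately if another clone is already polling.
    pub fn try_begin(&self) -> Result<PollToken, &'static str> {
        self.in_flight
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .map_err(|_| "overlapping poll on a shared scanner")?;
        Ok(PollToken { in_flight: Arc::clone(&self.in_flight) })
    }
}

impl Drop for PollToken {
    fn drop(&mut self) {
        // Release the flag so the next poll (from any clone) can start.
        self.in_flight.store(false, Ordering::Release);
    }
}
```

The trade-off is that misuse is only detected at runtime, on the call that actually overlaps.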

fresh-borzoni · 2026-03-21T01:07:10Z

Let's do it properly in this PR. The reader should take ownership of the scanner (move, not clone). That way the compiler prevents concurrent polls, and no mutex is needed.
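A minimal sketch of the ownership-based design. It assumes a hypothetical `Scanner` whose `poll` takes `&mut self`. The hypothetical `BoundedReader` owns the scanner by value, and its read method also takes `&mut self`, so the borrow checker rejects overlapping calls at compile time without any lock or runtime guard. These names are illustrative and are not the fluss-rust types.

```rust
/// Hypothetical scanner: polling mutates internal fetch state.
pub struct Scanner {
    next_offset: i64,
}

impl Scanner {
    pub fn new(start: i64) -> Self {
        Scanner { next_offset: start }
    }

    /// Offset that the next `poll` would return.
    pub fn peek(&self) -> i64 {
        self.next_offset
    }

    /// `&mut self`: only one caller can poll at a time.
    pub fn poll(&mut self) -> i64 {
        let offset = self.next_offset;
        self.next_offset += 1;
        offset
    }
}

/// Hypothetical bounded reader that owns its scanner (moved in, not cloned).
pub struct BoundedReader {
    scanner: Scanner,
    stop_at: i64,
}

impl BoundedReader {
    pub fn new(scanner: Scanner, stop_at: i64) -> Self {
        BoundedReader { scanner, stop_at }
    }

    /// Requires an exclusive borrow of the reader, so two overlapping calls
    /// on the same reader are rejected by the borrow checker.
    pub fn next_offset(&mut self) -> Option<i64> {
        if self.scanner.peek() >= self.stop_at {
            return None;
        }
        Some(self.scanner.poll())
    }
}
```

After `BoundedReader::new(scanner, 5)`, any further use of `scanner` fails to compile with a use-of-moved-value error (E0382). No clone remains that could poll concurrently.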

fresh-borzoni

@charlesdong1991 Ty for the PR. Left comments, PTAL

fresh-borzoni · 2026-03-20T01:13:30Z

crates/fluss/src/client/table/reader.rs

+    /// Each call may internally poll multiple batches from the scanner,
+    /// buffer them, and return one at a time. Batches that cross a stopping
+    /// offset boundary are sliced to exclude records at or beyond the stop point.
+    pub async fn next_batch(&mut self) -> Result<Option<RecordBatch>> {


`next_batch()` returns a bare `RecordBatch`, discarding the bucket/offset metadata that was previously available through `ScanRecord`.
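One way to keep that metadata and still slice at the stop point is to return each batch together with its bucket and base offset. The sketch below assumes a hypothetical `BucketBatch` wrapper and uses a `Vec<i64>` of rows in place of an Arrow `RecordBatch`. With Arrow, the truncation step would correspond to `RecordBatch::slice(offset, length)`.

```rust
/// Hypothetical wrapper that keeps bucket/offset metadata next to the rows.
#[derive(Debug, Clone, PartialEq)]
pub struct BucketBatch {
    pub bucket: i32,
    /// Log offset of the first row in `rows`.
    pub base_offset: i64,
    /// Stand-in for an Arrow `RecordBatch`; one entry per record.
    pub rows: Vec<i64>,
}

impl BucketBatch {
    /// Offset of the last row, or `None` for an empty batch.
    pub fn last_offset(&self) -> Option<i64> {
        if self.rows.is_empty() {
            None
        } else {
            Some(self.base_offset + self.rows.len() as i64 - 1)
        }
    }

    /// Drops rows at or beyond `stop_at` (exclusive); returns `None` if nothing remains.
    /// Bucket and base offset are preserved on the returned batch.
    pub fn slice_to_stop(mut self, stop_at: i64) -> Option<BucketBatch> {
        let keep = (stop_at - self.base_offset).clamp(0, self.rows.len() as i64) as usize;
        if keep == 0 {
            return None;
        }
        self.rows.truncate(keep);
        Some(self)
    }
}
```

For example, a batch with base offset 10 and four rows, sliced with `stop_at = 12`, keeps the rows at offsets 10 and 11.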

fresh-borzoni · 2026-03-20T01:25:50Z

bindings/python/src/table.rs

    /// The projected row type to use for record-based scanning
    projected_row_type: fcore::metadata::RowType,
-    /// Cache for partition_id -> partition_name mapping (avoids repeated list_partition_infos calls)
-    partition_name_cache: std::sync::RwLock<Option<HashMap<i64, String>>>,


Why have we removed this?
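For reference, the removed field implements a lazily filled cache. The first lookup fetches all partition infos once, and later lookups only take a read lock. This is a std-only sketch in which a hypothetical `fetch` closure stands in for the `list_partition_infos` call. `PartitionNameCache` is also a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical cache mirroring `RwLock<Option<HashMap<i64, String>>>`.
pub struct PartitionNameCache {
    inner: RwLock<Option<HashMap<i64, String>>>,
}

impl PartitionNameCache {
    pub fn new() -> Self {
        PartitionNameCache { inner: RwLock::new(None) }
    }

    /// Returns the name for `partition_id`. `fetch` is only called while the
    /// cache is still empty, and its result populates the whole map.
    pub fn get<F>(&self, partition_id: i64, fetch: F) -> Option<String>
    where
        F: FnOnce() -> HashMap<i64, String>,
    {
        // Fast path: cache already populated, shared read lock only.
        if let Some(map) = self.inner.read().unwrap().as_ref() {
            return map.get(&partition_id).cloned();
        }
        // Slow path: the read guard is dropped above; fill under the write lock
        // unless another thread filled it in the meantime.
        let mut guard = self.inner.write().unwrap();
        let map = guard.get_or_insert_with(fetch);
        map.get(&partition_id).cloned()
    }
}
```

One limitation of this simple form is that partitions created after the first fetch are never seen unless the cache is refreshed.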

fresh-borzoni · 2026-03-20T01:30:10Z

bindings/python/src/lib.rs

    m.add_class::<Lookuper>()?;
    m.add_class::<Schema>()?;
    m.add_class::<LogScanner>()?;
+    m.add_class::<PyRecordBatchLogReader>()?;


Isn't it an internal iterator?

fresh-borzoni · 2026-03-20T01:49:29Z

bindings/python/src/table.rs

+
+    fn __next__(&mut self, py: Python) -> PyResult<Option<Py<PyAny>>> {
+        let batch = py
+            .detach(|| TOKIO_RUNTIME.block_on(self.reader.next_batch()))


`PyRecordBatchLogReader` holds the async `RecordBatchLogReader` and calls `TOKIO_RUNTIME.block_on()` directly, duplicating what `SyncRecordBatchLogReader` already does.
Per the design spec, Python should use the shared sync adapter wrapped in `py.detach()`.
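A rough sketch of the suggested layering. It assumes a hypothetical `SyncReader` that owns both the async reader and the blocking call, so each binding only delegates to it. To keep the sketch dependency-free, a tiny `block_on` with a no-op waker stands in for `TOKIO_RUNTIME.block_on`. That stand-in only works for futures that make progress without being woken, so it is illustrative and not a substitute for a real runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// No-op waker: stand-in for a real runtime's wake-up machinery.
fn noop_raw() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Busy-polls `fut` to completion (illustrative stand-in for `Runtime::block_on`).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

/// Hypothetical async reader; batches are represented by their row counts.
pub struct AsyncReader {
    batches: Vec<usize>,
}

impl AsyncReader {
    pub async fn next_batch(&mut self) -> Option<usize> {
        if self.batches.is_empty() { None } else { Some(self.batches.remove(0)) }
    }
}

/// Hypothetical shared sync adapter: the single place that blocks on the async reader.
pub struct SyncReader {
    inner: AsyncReader,
}

impl Iterator for SyncReader {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        block_on(self.inner.next_batch())
    }
}
```

Under this layering, the Python `__next__` would hold the sync adapter and call something like `py.detach(|| sync_reader.next())`, and the C++ binding would reuse the same adapter instead of keeping its own `block_on` call.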

fresh-borzoni · 2026-03-20T01:52:36Z

crates/fluss/src/client/table/reader.rs

+        buffer.push_back(batch);
+
+        if last_offset >= stop_at - 1 {
+            stopping_offsets.remove(&bucket);


Shall we unsubscribe from the bucket as well?
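A sketch of what unsubscribing on completion might look like. It assumes a hypothetical `Subscriptions` trait with an `unsubscribe(bucket)` method, which is not confirmed to match the real scanner API. The completion check mirrors the `last_offset >= stop_at - 1` condition in the diff. Because `stop_at` is exclusive, the bucket is done once the record at `stop_at - 1` has been seen.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the scanner API.
pub trait Subscriptions {
    fn unsubscribe(&mut self, bucket: i32);
}

/// If `last_offset` reached the bucket's stop point, removes the bucket from
/// `stopping_offsets` and unsubscribes it, so later polls stop fetching data
/// that would be discarded anyway. Returns `true` if the bucket completed here.
pub fn complete_if_done<S: Subscriptions>(
    scanner: &mut S,
    stopping_offsets: &mut HashMap<i32, i64>,
    bucket: i32,
    last_offset: i64,
) -> bool {
    let done = matches!(
        stopping_offsets.get(&bucket),
        Some(&stop_at) if last_offset >= stop_at - 1
    );
    if done {
        stopping_offsets.remove(&bucket);
        scanner.unsubscribe(bucket);
    }
    done
}
```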

fresh-borzoni · 2026-03-20T02:20:45Z

crates/fluss/src/client/table/reader.rs

+            });
+        }
+
+        let stopping_offsets = query_latest_offsets(admin, &scanner, &subscribed).await?;


Buckets whose subscribed offset is >= the latest offset stay in `stopping_offsets` forever, so `next_batch()` loops indefinitely on empty polls.
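One possible fix, sketched under the assumption that the subscribed start offsets and the latest offsets are both available as per-bucket maps when the reader is built. Any bucket whose start offset is already at or past its latest offset is dropped. It never enters `stopping_offsets`, so it cannot keep `next_batch()` waiting. `build_stopping_offsets` is a hypothetical helper and is not the PR's `query_latest_offsets`.

```rust
use std::collections::HashMap;

/// Keeps only buckets that still have records to read (start < latest).
/// Assumption: buckets with no reported latest offset are skipped as well.
pub fn build_stopping_offsets(
    subscribed: &HashMap<i32, i64>,
    latest: &HashMap<i32, i64>,
) -> HashMap<i32, i64> {
    subscribed
        .iter()
        .filter_map(|(&bucket, &start)| {
            let &stop_at = latest.get(&bucket)?;
            (start < stop_at).then_some((bucket, stop_at))
        })
        .collect()
}
```

With this filter, an empty map at construction time means there is nothing to read, and the reader can return `None` straight away.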


Commit 1502702: Add RecordBatchLogReader for bounded log reading

charlesdong1991 commented Mar 19, 2026


fresh-borzoni reviewed Mar 21, 2026


charlesdong1991 wants to merge 1 commit into apache:main from charlesdong1991:arrow-batch-reader
