
feat(cubestore): Support Arrow IPC response format#10819

Merged
ovr merged 5 commits into master from feat/cubestore-arrow-ipc on May 6, 2026

@ovr ovr commented May 5, 2026

Clients can opt in via HttpQuery.response_format = Arrow to receive results as a binary Arrow IPC stream wrapped in a new HttpQueryResult flatbuffer variant, instead of the legacy HttpResultSet where every cell is stringified.

Default stays Legacy, so existing clients that don't set the flag keep getting HttpResultSet — no behavior change for them.
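
The opt-in behavior described above can be sketched with plain Rust types. This is only an illustration: `ResponseFormat`, `Response`, and `respond` are hypothetical stand-ins for the generated FlatBuffers types, and the "Arrow" branch writes placeholder bytes rather than a real Arrow IPC stream.

```rust
/// Hypothetical stand-in for the request-side format flag.
/// `Legacy` is the default, so a request that never sets the flag is unaffected.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum ResponseFormat {
    #[default]
    Legacy,
    Arrow,
}

/// Hypothetical stand-in for the two response shapes.
#[derive(Debug, PartialEq)]
pub enum Response {
    /// Every cell stringified, as in the legacy `HttpResultSet`.
    ResultSet(Vec<Vec<String>>),
    /// Opaque binary payload, standing in for an Arrow IPC stream.
    QueryResultArrow(Vec<u8>),
}

/// Picks the response shape from the requested format.
pub fn respond(format: ResponseFormat, rows: &[Vec<i64>]) -> Response {
    match format {
        ResponseFormat::Legacy => Response::ResultSet(
            rows.iter()
                .map(|row| row.iter().map(|cell| cell.to_string()).collect::<Vec<String>>())
                .collect(),
        ),
        ResponseFormat::Arrow => {
            // Placeholder encoding (little-endian i64 bytes), NOT real Arrow IPC.
            let mut bytes = Vec::new();
            for row in rows {
                for cell in row {
                    bytes.extend_from_slice(&cell.to_le_bytes());
                }
            }
            Response::QueryResultArrow(bytes)
        }
    }
}
```

Calling `respond(ResponseFormat::default(), ..)` takes the legacy branch, which mirrors the "no behavior change" guarantee for clients that never set the flag.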

@github-actions github-actions Bot added the cube store and rust labels May 5, 2026
Comment thread rust/cube/cubeshared/src/codegen/http_message.fbs
Comment thread rust/cubestore/cubestore/src/http/mod.rs Outdated
Comment thread rust/cubestore/cubestore/src/store/mod.rs Outdated
Comment thread rust/cubestore/cubestore/src/http/mod.rs
@cube-js cube-js deleted a comment from claude Bot May 5, 2026

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.79%. Comparing base (2baf0bc) to head (5df91c3).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10819       +/-   ##
===========================================
+ Coverage   58.09%   78.79%   +20.69%     
===========================================
  Files         216      470      +254     
  Lines       16899    92179    +75280     
  Branches     3426     3436       +10     
===========================================
+ Hits         9818    72634    +62816     
- Misses       6578    19042    +12464     
  Partials      503      503               
Flag Coverage Δ
cube-backend 58.15% <ø> (+0.05%) ⬆️
cubesql 83.44% <ø> (?)

@ovr ovr force-pushed the feat/cubestore-arrow-ipc branch from a5d775b to b77cae2 May 5, 2026 15:12
@ovr ovr marked this pull request as ready for review May 5, 2026 15:39
@ovr ovr requested a review from a team as a code owner May 5, 2026 15:39

claude Bot commented May 5, 2026

Claude finished @ovr's task in 2m 16s


Code Review: Arrow IPC Response Format

  • Read and analyze FlatBuffer schema changes
  • Read and analyze generated FlatBuffer code
  • Read and analyze HTTP module changes
  • Read and analyze SQL module changes
  • Check for security, performance, and correctness issues
  • Post comprehensive review

Overall Assessment

This is a clean, well-structured feature addition. The design is solid — a new QueryResultFormat enum opt-in on the request side, a new HttpQueryResult FlatBuffer wrapper on the response side, and QueryResult::to_arrow_ipc_stream() bridging between the internal representation and the wire format. The test coverage is thorough.

Correction on Prior Review

The earlier review comment about a union ordering wire-format break is incorrect. Looking at the actual diff and generated code:

HttpQuery      = 1  (unchanged)
HttpResultSet  = 2  (unchanged)
HttpError      = 3  (unchanged)  ← NOT shifted
HttpQueryResult = 4 (new, appended)

HttpQueryResult is correctly appended after HttpError in the FlatBuffer union, preserving all existing ordinals. No wire compatibility issue.
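
A minimal sketch of why appending is safe, using the ordinals listed above. The enum and both decoders are hypothetical models written for illustration, not the generated FlatBuffers code.

```rust
/// Hypothetical model of the union tags, with the ordinals from the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum CommandTag {
    HttpQuery = 1,
    HttpResultSet = 2,
    HttpError = 3,
    HttpQueryResult = 4, // new, appended at the end
}

/// Decoder as an older client would have it: only tags 1..=3 are known.
pub fn decode_old(tag: u8) -> Option<CommandTag> {
    match tag {
        1 => Some(CommandTag::HttpQuery),
        2 => Some(CommandTag::HttpResultSet),
        3 => Some(CommandTag::HttpError),
        _ => None,
    }
}

/// Decoder after the change: it adds tag 4 and defers to the old mapping otherwise.
pub fn decode_new(tag: u8) -> Option<CommandTag> {
    match tag {
        4 => Some(CommandTag::HttpQueryResult),
        other => decode_old(other),
    }
}
```

Because tags 1 through 3 decode identically in both versions, an older client only ever meets an unknown tag if it receives the new variant, which by the PR description requires opting in.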

Issues Found

1. rows_to_columns runs on the async executor (minor performance concern)

File: rust/cubestore/cubestore/src/sql/mod.rs:125-129

In to_arrow_ipc_stream, the Frame variant calls rows_to_columns() and RecordBatch::try_new() before entering spawn_blocking. For large DataFrames this is CPU-intensive work running on the Tokio executor, which can starve other tasks.

The Stream path has a similar issue — batches.try_collect().await is inherently async, but the IPC write happens inside spawn_blocking. For the Frame path, the row-to-column conversion could also be moved inside spawn_blocking:

QueryResult::Frame(df) => {
    // Move CPU-intensive conversion into spawn_blocking too
    cube_ext::spawn_blocking(move || -> Result<Vec<u8>, CubeError> {
        let schema = df.get_schema();
        let arrays = crate::table::data::rows_to_columns(df.get_columns(), df.get_rows());
        let batch = RecordBatch::try_new(schema.clone(), arrays)?;
        let mut writer = StreamWriter::try_new(Cursor::new(Vec::new()), schema.as_ref())?;
        writer.write(&batch)?;
        writer.finish()?;
        Ok(writer.into_inner()?.into_inner())
    }).await?
}

Not a blocker — this matches the existing pattern for collect() in the Stream path — but worth considering.

2. No test for empty DataFrame on the Arrow path

File: rust/cubestore/cubestore/src/http/mod.rs:1186

The arrow_response_format_round_trip test covers all column types and null values (good!), but there's no test for an empty DataFrame (0 rows). Arrow IPC stream writers handle this by emitting the schema only, but it would be good to have a test verifying that the round-trip works cleanly with no rows — especially since rows_to_columns with an empty row slice is a distinct code path.
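
To illustrate the empty-input case, here is a simplified, std-only transposition. `rows_to_columns_sketch` is a hypothetical stand-in and makes no claim about how the real `rows_to_columns` is implemented; it only shows the property an empty-DataFrame test would want to pin down, namely that zero rows should still yield one (empty) column per schema field.

```rust
/// Hypothetical simplified row-to-column transposition over nullable i64 cells.
/// The column count comes from the schema, not from the rows, so an empty
/// row slice still produces `num_columns` empty columns.
pub fn rows_to_columns_sketch(
    num_columns: usize,
    rows: &[Vec<Option<i64>>],
) -> Vec<Vec<Option<i64>>> {
    let mut columns: Vec<Vec<Option<i64>>> = (0..num_columns)
        .map(|_| Vec::with_capacity(rows.len()))
        .collect();
    for row in rows {
        assert_eq!(row.len(), num_columns, "row width must match the schema");
        for (column, cell) in columns.iter_mut().zip(row) {
            column.push(*cell);
        }
    }
    columns
}
```

A test for the real code path would presumably make the analogous assertion on the decoded Arrow batch: the schema is present and the row count is zero.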

3. No serialization timing metric for Arrow path

File: rust/cubestore/cubestore/src/http/mod.rs:726-746

The legacy ResultSet path tracks serialization time via HTTP_MESSAGE_DATA_FRAME_SERIALIZATION_TIME_US (line 787), but the QueryResultArrow path doesn't report any timing metric. Since one of the motivations for Arrow IPC is performance, having a comparable metric would help measure the improvement. Not a blocker, but worth adding.
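
One way such a metric could be captured is to wrap the serialization step in a timer. The sketch below is std-only; `RecordedMetrics` and the metric name passed to it are hypothetical and do not correspond to Cube Store's metrics API.

```rust
use std::time::Instant;

/// Hypothetical in-memory metric sink: stores (name, elapsed microseconds) samples.
#[derive(Default)]
pub struct RecordedMetrics {
    pub samples: Vec<(&'static str, u128)>,
}

impl RecordedMetrics {
    /// Runs `work`, records how long it took under `name`, and returns its output.
    pub fn time_us<T>(&mut self, name: &'static str, work: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let output = work();
        self.samples.push((name, start.elapsed().as_micros()));
        output
    }
}
```

Wrapping both the legacy and Arrow serialization calls this way, for example `metrics.time_us("arrow_ipc_serialization_time_us", || serialize(..))` with a hypothetical `serialize`, would make the two paths directly comparable.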

Design Observations

  • HttpQueryResultData union with single variant: Using a union wrapping just HttpQueryResultArrow is a smart forward-looking choice: it leaves room for other data formats (e.g. Parquet) without breaking the wire format.

  • is_last field: Well-designed for future streaming support. is_last: true is the correct default for the current single-batch path.

  • spawn_blocking for IPC write: Good call — Arrow IPC serialization with large batches involves non-trivial CPU work that shouldn't block the Tokio runtime.

  • Backward compatibility: Clean. QueryResultFormat defaults to Legacy in the FlatBuffer schema, so existing clients that don't set response_format continue getting HttpResultSet. The deserialization path (HttpMessage::read) intentionally doesn't handle HttpQueryResult since it's outbound-only — this is correct.
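
As a rough sketch of how a client might consume the `is_last` flag once streaming exists: `ArrowFrame` and `collect_frames` below are hypothetical names, and the frame payload is treated as opaque bytes. With today's single-frame responses the loop finishes after the first frame.

```rust
/// Hypothetical frame type mirroring the `is_last` flag on `HttpQueryResultArrow`.
pub struct ArrowFrame {
    pub data: Vec<u8>,
    pub is_last: bool,
}

/// Collects frame payloads up to and including the first frame marked `is_last`.
/// Returns an error if the input ends before such a frame arrives.
pub fn collect_frames(
    frames: impl IntoIterator<Item = ArrowFrame>,
) -> Result<Vec<Vec<u8>>, String> {
    let mut payloads = Vec::new();
    for frame in frames {
        let done = frame.is_last;
        payloads.push(frame.data);
        if done {
            return Ok(payloads);
        }
    }
    Err("stream ended before a frame with is_last = true".to_string())
}
```

Treating a missing final frame as an error lets a client distinguish a truncated response from a complete one.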

Summary

The PR is well-implemented. The main items are minor performance and observability nits, not correctness bugs. The earlier review's wire-compatibility concern was a false alarm. The test is comprehensive and covers all column types including nulls. Good to merge with or without the suggested improvements.

… streaming

Reserve an is_last bool on HttpQueryResultArrow so clients can implement
frame-by-frame streaming once the server supports it. The server sets
is_last=true today since it returns a single-frame payload.
@ovr ovr force-pushed the feat/cubestore-arrow-ipc branch from 63ad3ec to 878a22c May 5, 2026 15:47
@ovr ovr changed the title feat(cubestore): Arrow IPC response format for HTTP queries feat(cubestore): Support Arrow IPC response format May 5, 2026
@ovr ovr force-pushed the feat/cubestore-arrow-ipc branch from e6cf9c8 to 5f978b4 May 6, 2026 10:15
@ovr ovr merged commit 3226a63 into master May 6, 2026
104 of 115 checks passed
@ovr ovr deleted the feat/cubestore-arrow-ipc branch May 6, 2026 11:44