
feat(xorq): XorqBuckarooWidget — buckaroo against an ibis/xorq expression #703

Merged
paddymul merged 13 commits into main from feat/xorq-buckaroo-widget on May 6, 2026

Conversation


paddymul (Collaborator) commented May 5, 2026

Summary

  • XorqBuckarooWidget(expr) — buckaroo over a xorq/ibis expression. Stats push down through XorqDfStatsV2 (one batched aggregate + per-column histogram queries); only a display-sized sample lands in pandas via expr.limit(N).execute().
  • add_processing(fn) takes an expression-to-expression function so postprocessing also pushes down — filtered stats reflect the filtered expression without re-materialising.
  • BuckarooWidgetBase gains a dataflow_klass class attribute so the inner dataflow can be swapped without duplicating __init__. Pandas widget unchanged.
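The `dataflow_klass` swap point can be sketched in isolation. This is a hypothetical stand-in — class bodies here are toys, not the real buckaroo implementations — but it shows why the subclass needs no duplicated `__init__`:

```python
# Hypothetical sketch of the dataflow_klass pattern: subclasses swap
# the inner dataflow via a class attribute instead of re-implementing
# __init__. Bodies are stand-ins, not real buckaroo code.

class PandasDataflow:
    def __init__(self, df):
        self.df = df

    def row_count(self):
        return len(self.df)          # eager: len() works on a DataFrame

class XorqDataflow(PandasDataflow):
    def row_count(self):
        # lazy: a real ibis expr has no len(); a count() query would be
        # pushed down instead (sketched here on a plain sequence)
        return sum(1 for _ in self.df)

class BuckarooWidgetBase:
    dataflow_klass = PandasDataflow  # the swap point

    def __init__(self, df):
        self.dataflow = self.dataflow_klass(df)

class XorqBuckarooWidget(BuckarooWidgetBase):
    dataflow_klass = XorqDataflow    # only this line changes
```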

What's new in buckaroo/xorq_buckaroo.py

  • XorqSampling — serialize_sample does expr.limit(N).execute(); pre-stats sampling is a no-op (the point is push-down).
  • XorqAutocleaning — pass-through. The lisp interpreter targets pandas DataFrames; for ibis exprs we skip cleaning entirely.
  • XorqDataflow — overrides populate_df_meta to use expr.count().execute() instead of len(), and rewrites the summary-dict keys from original column names (what XorqStatPipeline produces) to the rewritten a, b, c space the frontend works in.
  • XorqBuckarooWidget — wires the above together with XORQ_STATS_V2 + the default styling klasses, and exposes add_processing(expr_func).
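The summary-key rewrite in XorqDataflow can be illustrated with a toy helper (hypothetical — the real mapping lives inside the dataflow; this assumes the rewritten space is simply `a`, `b`, `c`, … in column order):

```python
import string

# Sketch of the summary-dict key rewrite: XorqStatPipeline keys stats
# by original column names, while the frontend works in a rewritten
# positional space ('a', 'b', 'c', ...). rewrite_summary_keys is a
# hypothetical stand-in.

def rewrite_summary_keys(summary_dict, orig_columns):
    """Map {'qty': stats} -> {'a': stats}, following column order."""
    return {
        string.ascii_lowercase[i]: summary_dict[col]
        for i, col in enumerate(orig_columns)
    }
```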

Test plan

  • 9 new unit tests in tests/unit/test_xorq_buckaroo_widget.py cover instantiation, df_meta from count(), column rewrite, push-down filter via add_processing, switching back to no-op processing, and the error-frame fallback when a postprocessor raises
  • full unit suite (931 tests) still passes
  • manually verified: XorqBuckarooWidget(memtable).add_processing(lambda e: e.filter(e.qty > 2)) → 4-row filtered display, stats min/max reflect the filtered expr (9.9 / 50.0 vs 7.4 / 50.0 unfiltered)
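The add_processing semantics checked above can be illustrated with a toy lazy expression (everything below is a stand-in; the real widget composes immutable ibis expressions, not this `LazyExpr` class):

```python
# Toy illustration of add_processing: the hook takes an
# expression-to-expression function, so the filter composes lazily and
# nothing is re-materialised until execute().

class LazyExpr:
    def __init__(self, rows, preds=()):
        self.rows, self.preds = rows, tuple(preds)

    def filter(self, pred):
        # returns a new expression; self is untouched (immutability)
        return LazyExpr(self.rows, self.preds + (pred,))

    def execute(self):
        out = self.rows
        for pred in self.preds:
            out = [r for r in out if pred(r)]
        return out

expr = LazyExpr([{"qty": 1}, {"qty": 3}, {"qty": 5}])
# expression-to-expression function, as add_processing expects
processed = (lambda e: e.filter(lambda r: r["qty"] > 2))(expr)
```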

🤖 Generated with Claude Code


github-actions Bot commented May 5, 2026

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25438596086

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25438596086

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.13.5.dev25438596086" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 202c5e44e0


paddymul added a commit that referenced this pull request May 6, 2026
Subprocess + meta-path blocker — importing ``buckaroo.xorq_buckaroo``
fails fast if polars (or any polars-only buckaroo module) gets pulled
in transitively, and vice versa for the polars side. Catches the
regression Codex flagged on PR #703 (NoCleaningConfPl import) before it
reaches a ``buckaroo[xorq]`` install with no polars.

Both extras are optional; cross-imports turn one extra into a hard
requirement for the other and break the advertised install paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul added a commit that referenced this pull request May 6, 2026
Three checks were red, all from the last test push:

- JS / Build + Test, Build JS + Python Wheel, ReadTheDocs (all run
  ``tsc``): xorqWindow.dom.test.tsx used ``{ displayer: "float" }``
  which is missing the required ``min_fraction_digits`` /
  ``max_fraction_digits`` from FloatDisplayerA. Add them.

- Python / Lint (ruff F841): the multi-stage step test bound
  ``df = pd.read_parquet(...)`` but never used it. Renamed to
  ``_bufs`` and dropped the call.

Verified locally: ruff clean, tsc -b passes, vite build succeeds,
36 python tests + 2 dom tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul and others added 13 commits May 6, 2026 09:34
…sion

Stats push down via XorqDfStatsV2 (one batched aggregate + per-column
histogram queries); only a display-sized sample lands in pandas. add_processing
takes an expression-to-expression function so postprocessing also pushes down
— filtered stats reflect the filtered expr without re-materialising.

- buckaroo/xorq_buckaroo.py: XorqBuckarooWidget plus XorqSampling (limit+execute),
  XorqAutocleaning (no-op; the lisp interpreter targets pandas), and XorqDataflow
  (count() for df_meta, rewrites SD keys orig→a/b/c since XorqStatPipeline keeps
  orig names but the frontend works in rewritten space).
- buckaroo/buckaroo_widget.py: dataflow_klass class attribute on
  BuckarooWidgetBase so the inner dataflow can be swapped without
  duplicating __init__. Pandas widget unchanged.
- tests/unit/test_xorq_buckaroo_widget.py: instantiation, df_meta from
  count(), column rewrite, push-down filter through add_processing,
  switch-back, error-frame fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
XorqBuckarooInfiniteWidget pushes pagination down to the backend.
Each infinite_request slices via expr.limit(end - start, offset=start).execute();
sort fires expr.order_by(asc/desc).limit(...) on the backend.
df_data_dict.main stays empty — pagination drives loading. The
total length in each response is one expr.count().execute(); the
small-window pandas frame is what crosses the parquet boundary.

Search wired through quick_command_args. XorqSearch carries the
'search' lisp symbol and the frontend quick-args pattern, but
the actual filter dispatch happens directly in XorqAutocleaning —
the lisp interpreter built by configure_buckaroo bakes in
df.copy()/df.clone(), neither of which works on immutable ibis
exprs. _xorq_search builds an OR of contains() across all string
columns; an empty value clears the filter.

NoCleaningConfXorq replaces NoCleaningConfPl in the autocleaning
config so command symbols are registered for the frontend.

Tests: 9 new (3 search behaviours, 6 infinite widget — slice,
sort pushdown, second_request piggyback, search-then-paginate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
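The request path above can be sketched with a toy handler (hypothetical stand-in; the real path composes `expr.order_by(...).limit(end - start, offset=start)` on the backend rather than slicing a list):

```python
# Toy sketch of push-down pagination: sort and slice compose, and only
# rows [start, end) are "materialised" per request. handle_payload is a
# hypothetical stand-in for the infinite widget's request handler.

def handle_payload(rows, start, end, sort=None, ascending=True):
    if sort is not None:
        # mirrors expr.order_by(asc/desc) firing on the backend
        rows = sorted(rows, key=lambda r: r[sort], reverse=not ascending)
    # mirrors expr.limit(end - start, offset=start).execute()
    return rows[start:start + max(end - start, 0)]
```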
Subprocess + meta-path blocker — importing ``buckaroo.xorq_buckaroo``
fails fast if polars (or any polars-only buckaroo module) gets pulled
in transitively, and vice versa for the polars side. Catches the
regression Codex flagged on PR #703 (NoCleaningConfPl import) before it
reaches a ``buckaroo[xorq]`` install with no polars.

Both extras are optional; cross-imports turn one extra into a hard
requirement for the other and break the advertised install paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run paddy-format on the two test files added this branch — collapses
multiline literals with trailing commas to a single line. CI lint job
was failing for these two files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to address paddymul's note that it wasn't obvious that the
expression stays lazy until a *bounded* execute on a small slice:

- Inline the slicing inside _handle_payload_args. Sort and slice flow
  through one branch, the explicit ``expr.limit(end - start, offset=start)
  .execute()`` lives in _execute_window (renamed from _sliced_pandas)
  with a docstring spelling out that this is the only place the expr
  is materialised, and only over rows [start, end).

- Refuse a bare expr.execute() in XorqSampling.serialize_sample.
  Previously the ``serialize_limit == 0`` else branch would have done
  ``df_or_expr.execute()`` — unreachable in practice (basic widget has
  serialize_limit=5_000; infinite widget skips this path via
  skip_main_serial=True), but exactly the kind of thing the reviewer
  was guarding against. Now an unbounded serialize_limit raises
  RuntimeError instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
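The unbounded-execute guard can be sketched as follows (a stand-in: the slicing here plays the role of `expr.limit(n).execute()`, and the function name merely echoes the method described above):

```python
# Sketch of the guard: a zero / falsy serialize_limit raises instead of
# silently materialising the whole expression.

def serialize_sample(rows, serialize_limit):
    if not serialize_limit:
        raise RuntimeError(
            "refusing unbounded execute(); set a positive serialize_limit"
        )
    return rows[:serialize_limit]   # stand-in for expr.limit(n).execute()
```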
Mirror the polars infinite widget's wire path: arrow → parquet, no
pandas detour. The ibis branch of _window_to_parquet (renamed from
_execute_window) goes:

    expr.select(rename).limit(end - start, offset=start)
        .to_pyarrow()  ->  pa.Table  ->  pq.write_table

instead of `.execute()` (pandas DataFrame) followed by fastparquet.
The pandas branch (postprocessor materialised, error frame) still
flows through fastparquet — there's no expression to push down.

Side benefit of the arrow path: string columns travel as native
parquet UTF8 instead of fastparquet's JSON-encoded strings.

Tests around the infinite behaviour — went from 18 to 35:

- TestInfiniteWindowEdges: partial-at-end window, past-end empty,
  index-column carries absolute offsets across pages.
- TestInfiniteSort: descending, by string column, with non-zero
  offset, second_request skipped when sorted.
- TestInfinitePostprocessing: filter-returning processor stays
  push-down; pandas-returning processor takes the in-process branch.
- TestInfiniteErrorPath: bad sort column emits error_info + length=0.
- TestLazyPostprocessor: postprocessor body runs without triggering
  any materialisation (pandas or arrow); paginates with bounded plan.
- TestInfiniteBoundedExecution: the central PR-review invariant —
  the only ops materialised are CountStar (aggregate) and Limit
  (windowed slice), never InMemoryTable. Includes a no-pandas-
  detour spy that confirms the row window goes through to_pyarrow,
  never .execute().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two separate _handle_payload_args calls — [0, 50) then a jump to
[300, 350) — confirm the infinite widget handles scroll-jumps without
materialising rows outside either window. Each request emits exactly
one CountStar (total) and one Limit (slice); InMemoryTable never
appears in the spy log. This exercises the contract a frontend would
hit when the user scrolls past the loaded buffer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The xorq infinite widget switched to arrow-direct serialization
(expr.to_pyarrow() -> pyarrow.parquet.write_table) instead of
pandas / fastparquet. The wire format changes — most notably string
columns travel as native parquet UTF8 instead of fastparquet's
JSON-encoded form — so we need a contract test that the JS frontend's
hyparquet reader handles the new bytes.

Pattern (mirrors the existing summary_stats_parquet_b64 fixture):

- scripts/generate_xorq_window_fixture.py — emits the fixture by
  running XorqBuckarooInfiniteWidget._handle_payload_args on a small
  deterministic dataset and base64-encoding the parquet response.
- tests/unit/test_xorq_window_fixture.py — re-runs the generator in
  memory and diffs every field (including the parquet bytes) against
  the committed fixture, so drift between Python output and the JS
  fixture surfaces in CI.
- packages/.../xorqWindow.test.ts — uses hyparquet to decode the
  fixture and asserts: row count, schema in the rewritten 'a, b,
  index' space, strings come through native UTF8 (not JSON-quoted),
  and the index column carries absolute window offsets.

Run scripts/generate_xorq_window_fixture.py to refresh the fixture
when the wire format intentionally changes; the drift test will tell
you if you forgot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
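The fixture-drift pattern above reduces to a simple invariant, sketched here with a hypothetical `generate_fixture` (the real generator runs `_handle_payload_args` and base64-encodes the parquet response; the bytes below are placeholders):

```python
import base64

# Drift-test pattern: re-run the fixture generator in memory and diff
# every field against the committed copy, so any wire-format change
# surfaces in CI. generate_fixture is a hypothetical stand-in.

def generate_fixture():
    payload = b"PAR1...parquet bytes...PAR1"   # placeholder bytes
    return {"length": 2, "parquet_b64": base64.b64encode(payload).decode()}

committed_fixture = generate_fixture()          # what lands in the repo
```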
xorqWindow.test.ts proved the parquet bytes are decodable; this test
adds the next layer: the decoded rows actually reach AgGridReact's
rowData with the right values. End-to-end:

    parquet bytes  →  hyparquet decode  →  DFViewerInfinite (Raw mode)
    →  AgGridReact rowData

The test mocks AgGridReact (the same pattern DFViewerInfinite.test.tsx
uses; real AG-Grid uses virtual scrolling + canvas, which jsdom can't
render cleanly) and inspects ``latestAgGridProps.gridOptions.rowData``
— the surface where 'what will be rendered into the grid DOM' is
captured. Asserts:

- row count matches the request window;
- index column carries absolute window offsets;
- name strings come through native UTF8 (no JSON quotes — proves the
  arrow-direct serialization survives the full pipeline);
- price column is numeric (not strings);
- column_defs' ``field`` keys match what's in rowData (otherwise
  AG-Grid would render blank cells).

Sanitises BigInt → Number on decode, mirroring resolveDFData's
parseParquetRow. Hyparquet decodes parquet INT64 as JS BigInt, and
DFViewerInfinite's Raw path JSON-stringifies rowData for a memo
signature; without sanitising, the test crashes on a BigInt that
production never encounters (production uses DataSource mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end check that the arrow-direct serialization (Python) lines up
with hyparquet decoding (JS) and lands real values in real AG-Grid
cells. Loads a 5000-row XorqBuckarooInfiniteWidget into JupyterLab,
exercises three scenarios:

1. Initial render — row 0's str_col cell ("foo_10") is visible and
   matches the predictable invariant
   ``str_col == "foo_{row_num + 10}"``.

2. Deep scroll — scrolls ~30000px, waits for the follow-up
   infinite_request to round-trip, then reads every visible body row
   (filtered to ``[row-index]`` so the pinned dtype row doesn't leak
   in) and asserts the same invariant. The min visible row_num must be
   > 500 — proves we genuinely scrolled rather than re-rendering the
   top.

3. Jump back to top — after a deep scroll, scrollTop=0 and verify
   row 0 is rendered again (cache hit or re-fetch, both fine).

Test setup follows the existing infinite-scroll-transcript.spec.ts
pattern: notebook lives in tests/integration_notebooks/, the runner
script copies it into the JupyterLab cwd, and the spec file is
matched by playwright.config.integration.ts. The script's
notebook→spec routing now sends test_xorq_infinite_scroll.ipynb to
xorq-infinite-scroll.spec.ts with a 60s timeout (xorq + DataFusion
startup is heavier than polars).

Verified locally: 3/3 sub-tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three checks were red, all from the last test push:

- JS / Build + Test, Build JS + Python Wheel, ReadTheDocs (all run
  ``tsc``): xorqWindow.dom.test.tsx used ``{ displayer: "float" }``
  which is missing the required ``min_fraction_digits`` /
  ``max_fraction_digits`` from FloatDisplayerA. Add them.

- Python / Lint (ruff F841): the multi-stage step test bound
  ``df = pd.read_parquet(...)`` but never used it. Renamed to
  ``_bufs`` and dropped the call.

Verified locally: ruff clean, tsc -b passes, vite build succeeds,
36 python tests + 2 dom tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks through the regular and infinite-scroll variants against citibike
parquet (100k rows) and a 1.2M-row NumPy-generated frame, plus a lazy
postprocessing demo via expr.filter().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul force-pushed the feat/xorq-buckaroo-widget branch from 16711c9 to 28db376 on May 6, 2026 13:34
paddymul deployed to testpypi May 6, 2026 13:36 via GitHub Actions
paddymul added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 4ef5916 on May 6, 2026
27 checks passed