Skip to content

Support DuckDB main (PEG parser): port expression FFI + parser-swap regression test#38

Merged
nicosuave merged 3 commits into
mainfrom
claude/nice-mahavira-37211d
Jun 14, 2026
Merged

Support DuckDB main (PEG parser): port expression FFI + parser-swap regression test#38
nicosuave merged 3 commits into
mainfrom
claude/nice-mahavira-37211d

Conversation

@nicosuave

@nicosuave nicosuave commented Jun 14, 2026

Copy link
Copy Markdown
Member

Why

DuckDB is replacing its legacy PostgreSQL parser with a PEG parser (opt-in in 1.5, the sole parser on main, default in 2.0). Two problems surfaced:

  1. duckdb-next-build has been red for ~a week — unrelated to PEG. DuckDB main refactored the parsed-expression API: subclass fields are now private (accessors instead) and names use a dedicated Identifier type instead of std::string. src/yardstick_parser_ffi.cpp accessed those fields directly, so it no longer compiled against main.
  2. Yardstick integrates via allow_parser_override_extension="fallback"; when the override declines a query, DuckDB's parser handles it — including surfacing syntax errors, whose shape differs between the legacy and PEG parsers.

What this does

1. Ports the parser FFI to DuckDB main's expression API, keeping 1.5 working (src/yardstick_parser_ffi.cpp).
A small compat shim selected at compile time via __has_include("duckdb/common/identifier.hpp") (matching the project's existing __has_include style) routes all expression field access through helpers. Both the old field-based API (1.5 + the pinned submodule) and the new accessor-based API (main) build. Behavior is unchanged — it's a pure access adaptation over the same tree walks.

Validated locally with -fsyntax-only against both DuckDB main and v1.5-variegata headers (0 errors each). This is what un-reds duckdb-next-build; because main is PEG-only, that lane then exercises the whole suite under PEG automatically.

2. Adds test/sql/peg_parser.test — parser-agnostic regression guard for the surface most exposed to the swap:

  • parser_override rewrite path still works (prefixed + unprefixed AGGREGATE, AT (ALL ...)).
  • When Yardstick declines a malformed query, the active parser still raises an error — matched on the parser-agnostic Parser Error prefix (emitted by both parsers).

It runs under whichever parser the build uses, so it becomes real PEG coverage on duckdb-next-build.

Validation

  • -fsyntax-only against DuckDB main and 1.5 headers: 0 errors both.
  • Full builds run in CI on this PR (duckdb-stable-build = 1.5, duckdb-next-build = main/PEG). Not built locally — make test needs the submodule initialized and a full from-source compile.

DuckDB is replacing its legacy PostgreSQL parser with a PEG parser (opt-in
in 1.5, the sole parser on main, default in 2.0). Yardstick integrates via
allow_parser_override_extension="fallback", so when the override declines a
query, DuckDB's parser handles it -- including surfacing syntax errors, whose
shape differs between the legacy and PEG parsers.

Add test/sql/peg_parser.test guarding the two behaviors most exposed to the
swap: the parser_override rewrite path still works, and the fallback path
still raises a parser-agnostic "Parser Error". The test runs under whichever
parser the build uses, so it becomes PEG coverage automatically once the suite
runs against a PEG-default DuckDB.
@nicosuave nicosuave force-pushed the claude/nice-mahavira-37211d branch from d3e920b to 1b7dcce Compare June 14, 2026 13:57
@nicosuave nicosuave changed the title Add PEG parser CI canary and regression test Add parser-agnostic regression test for DuckDB parser swap Jun 14, 2026
DuckDB main refactored the parsed-expression API: subclass fields are now
private (accessed via FunctionName()/GetArgumentsMutable()/LeftMutable()/...)
and names use a dedicated Identifier type instead of std::string. This broke
the duckdb-next-build CI lane (compile failure in yardstick_parser_ffi.cpp),
which has been red for a week.

Add a small compatibility shim, selected at compile time via
__has_include("duckdb/common/identifier.hpp") (matching the project's existing
__has_include pattern), and route all expression field access through it. The
old field-based API (DuckDB 1.5 and the pinned submodule) and the new
accessor-based API (main) are both supported, so the stable and next CI lanes
both build.

Behavior is unchanged: the shim is a pure access adaptation over the same tree
walks. Verified with -fsyntax-only against both DuckDB main and v1.5-variegata
headers (0 errors each).
@nicosuave nicosuave changed the title Add parser-agnostic regression test for DuckDB parser swap Support DuckDB main (PEG parser): port expression FFI + parser-swap regression test Jun 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef1898017b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/yardstick_parser_ffi.cpp
Fixing the expression FFI let the next-build compile advance to the next
DuckDB-main API change: parse_function_t now receives the post-PEG-failure
token tail (const vector<SimpleToken>&) instead of the raw query string
(yardstick_extension.hpp/.cpp).

Yardstick does all of its rewriting in yardstick_parser_override, which still
receives the full query string on both APIs, so on the new signature
parse_function is a no-op fallback. The old string-based fallback is retained
for DuckDB 1.5 via the same __has_include detection. Matching yardstick_parse
to parse_function_t also fixes the function-pointer comparison in yardstick_bind.

Verified with -fsyntax-only against both DuckDB main and v1.5-variegata (0
errors); all three extension translation units now compile against main.
@nicosuave nicosuave merged commit 3eb6e0f into main Jun 14, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant