Add BigQuery support by gajop · Pull Request #458 · posit-dev/ggsql

gajop · 2026-05-21T13:20:38Z

Hello!
Since this is my first PR here, I wanted to give a short introduction.
My name is Gajo, and at my work we use BigQuery a lot. This looks like an interesting project, and I would like to give it a go, so I decided to try adding BigQuery support.

All the code in this PR was written by Claude, but I reviewed it and got it to a reasonable state.
I purposefully didn't add it to the default feature set, as I'm not sure which direction you would like to take, but I think it would be nice if we could add it later. Maybe even in this PR?

As I was writing this, I hit two problems with the current implementation of Histogram & Percentile, and tbh I'm not familiar enough with the library to tell whether this is the right way to handle them. Learning the finer details felt a bit daunting and somewhat out of scope for what I wanted to do in this PR. I did manage to satisfy the tests at least... but anyway, this is something I'd like to take a deeper look at.

On testing, I added a couple of unit tests in your usual style, but I also added some integration tests that create real tables and demonstrate that this works against real BigQuery. Since these require real infrastructure, I've disabled them by default. I did confirm that they pass in one of my projects.

As a future PR, I would like to add some examples for DuckDB and BigQuery, something users can easily run. I would also like to dig a bit deeper into the security design, in case ggsql is used behind a library where user input might not be trusted (I'm not yet sure how you would support parameterized queries here). Also, I'm personally not a fan of large files, and if it's OK with you, I'd at least split the bigquery reader into a few files, but I'm leaving this up to you.

Regarding this PR, if you would like to first discuss the approach or maybe start with something smaller, please let me know. I can understand the potential maintenance burden of adding new features from new contributors - the drive-by PR is certainly a thing these days.

Below is the AI-generated PR message.

feature flag (enable with --features bigquery, or all-readers).

- BigQueryReader (src/reader/bigquery.rs): authenticates via Application Default Credentials using the gcloud-bigquery crate; paginates results and converts them to Arrow.
- Connection string bigquery://[PROJECT[/DATASET]][?location=REGION]: the project is optional and resolves from ADC / GOOGLE_CLOUD_PROJECT when omitted; an optional dataset sets the default dataset; location defaults to US.
- BigQueryDialect: backtick quoting, BigQuery type names (INT64/FLOAT64/STRING/DATETIME), GREATEST/LEAST, GENERATE_ARRAY series, APPROX_QUANTILES quantiles, and CREATE OR REPLACE TEMP TABLE.
- VS Code / Positron connection picker and the Jupyter kernel both accept bigquery://; cli.qmd documents the scheme and ADC auth.
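
To make the connection-string grammar concrete, here is a minimal, std-only sketch of how `bigquery://[PROJECT[/DATASET]][?location=REGION]` could be split into its parts. The `BigQueryUri` struct and `parse_bigquery_uri` function are hypothetical names for illustration and are not taken from the PR; resolving a missing project from ADC / GOOGLE_CLOUD_PROJECT is left out.

```rust
/// Hypothetical parsed form of a `bigquery://` URI (illustrative, not ggsql's type).
#[derive(Debug, PartialEq)]
pub struct BigQueryUri {
    /// `None` means "resolve from ADC / GOOGLE_CLOUD_PROJECT" (not done here).
    pub project: Option<String>,
    /// Optional default dataset.
    pub dataset: Option<String>,
    /// Defaults to "US" when no `location` parameter is given.
    pub location: String,
}

/// Parse `bigquery://[PROJECT[/DATASET]][?location=REGION]`.
pub fn parse_bigquery_uri(uri: &str) -> Result<BigQueryUri, String> {
    let rest = uri
        .strip_prefix("bigquery://")
        .ok_or_else(|| format!("not a bigquery:// URI: {uri}"))?;

    // Split off the optional query string.
    let (path, query) = match rest.split_once('?') {
        Some((p, q)) => (p, Some(q)),
        None => (rest, None),
    };

    // PROJECT and DATASET are both optional; empty segments mean "not given".
    let mut segments = path.trim_end_matches('/').splitn(2, '/');
    let project = segments.next().filter(|s| !s.is_empty()).map(str::to_owned);
    let dataset = segments.next().filter(|s| !s.is_empty()).map(str::to_owned);

    if project.is_none() && dataset.is_some() {
        return Err("a dataset requires a project".to_owned());
    }
    if dataset.as_deref().is_some_and(|d| d.contains('/')) {
        return Err("unexpected extra path segment".to_owned());
    }

    // Only `location` is recognised; other parameters are ignored in this sketch.
    let mut location = String::from("US");
    if let Some(q) = query {
        for pair in q.split('&') {
            if let Some(("location", value)) = pair.split_once('=') {
                if !value.is_empty() {
                    location = value.to_owned();
                }
            }
        }
    }

    Ok(BigQueryUri { project, dataset, location })
}
```

With this sketch, `bigquery://` yields no project, no dataset, and location `US`, while `bigquery://my-project/analytics?location=EU` yields all three fields.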

SQL portability fixes

Two SQL-generation issues surfaced under BigQuery's stricter semantics. Both
fixes are output-identical on DuckDB/SQLite:

- Histogram: bin_end and density are now derived in the outer SELECT from the already-grouped bin/count columns, instead of inside the GROUP BY query. BigQuery rejects a GROUP BY query whose SELECT list references an input column outside the grouping key.
- Percentile: sql_percentile / sql_quantile_inline emit the APPROX_QUANTILES aggregate instead of a correlated scalar subquery, which BigQuery rejects for grouped boxplot / density.
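
The shape of the two fixes can be sketched as small SQL-building helpers. This is an illustration under assumptions, not the PR's code: `histogram_sql` and `approx_quantile_sql` are hypothetical names, the density formula (count divided by total count times bin width) is assumed, and identifiers are expected to arrive already quoted.

```rust
/// Sketch: histogram SQL with derived columns moved to the outer SELECT.
/// The inner query selects only the grouping key and an aggregate.
pub fn histogram_sql(table: &str, column: &str, bin_width: f64) -> String {
    // Inner query: nothing outside the grouping key except COUNT(*).
    let inner = format!(
        "SELECT FLOOR({column} / {bin_width}) * {bin_width} AS bin, COUNT(*) AS count \
         FROM {table} GROUP BY bin"
    );
    // Outer query: bin_end and density computed from the grouped bin/count columns.
    // The density formula is an assumption made for this example.
    format!(
        "SELECT bin, bin + {bin_width} AS bin_end, count, \
         count / (SUM(count) OVER () * {bin_width}) AS density \
         FROM ({inner}) AS binned"
    )
}

/// Sketch: a quantile as a plain aggregate expression, usable under GROUP BY.
/// APPROX_QUANTILES(x, 100) returns 101 boundaries, so OFFSET(n) is the
/// approximate n-th percentile. `fraction` is clamped to [0, 1].
pub fn approx_quantile_sql(column: &str, fraction: f64) -> String {
    let offset = (fraction.clamp(0.0, 1.0) * 100.0).round() as u32;
    format!("APPROX_QUANTILES({column}, 100)[OFFSET({offset})]")
}
```

Because the quantile is an ordinary aggregate here, it can sit next to the grouping columns of a grouped boxplot query without a correlated subquery.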

Testing

- Live integration tests (bq_integration_*): #[ignore] by default, opt in via GGSQL_BIGQUERY_TEST_URI. They create a uuid-named dataset (auto-dropped) and cover catalog/schema/table/column introspection, execute_sql, point/boxplot/grouped-boxplot/histogram rendering, type conversion, result pagination, and query-error propagation.
- Dialect unit tests for quoting, series generation, and quantile SQL.
- Full workspace test suite green; clippy clean with --features bigquery.
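
The opt-in pattern for the live tests might look roughly like the following. The helper names (`opt_in_uri`, `live_test_uri`) and the test body are hypothetical; only the `#[ignore]` attribute, the `bq_integration_` prefix, and the GGSQL_BIGQUERY_TEST_URI variable come from the description above.

```rust
/// Normalise a raw environment value: unset, empty, or blank means "not opted in".
pub fn opt_in_uri(raw: Option<String>) -> Option<String> {
    raw.map(|v| v.trim().to_owned()).filter(|v| !v.is_empty())
}

/// Read the opt-in variable named in the PR description.
pub fn live_test_uri() -> Option<String> {
    opt_in_uri(std::env::var("GGSQL_BIGQUERY_TEST_URI").ok())
}

/// Ignored by default, so a plain `cargo test` never touches real infrastructure.
#[test]
#[ignore = "needs live BigQuery; set GGSQL_BIGQUERY_TEST_URI and pass --ignored"]
fn bq_integration_example() {
    // Even when run with --ignored, bail out quietly if no URI was provided.
    let Some(uri) = live_test_uri() else {
        eprintln!("GGSQL_BIGQUERY_TEST_URI not set; skipping");
        return;
    };
    assert!(uri.starts_with("bigquery://"));
    // A real test would open a reader for `uri`, create a scratch dataset,
    // run its checks, and drop the dataset afterwards.
}
```

Assuming the feature flag from this PR, such tests would be run with something like `GGSQL_BIGQUERY_TEST_URI=bigquery://my-project cargo test --features bigquery -- --ignored`, where `my-project` is a placeholder.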

Follow-ups (not in this PR)

- CI does not yet compile-check the bigquery feature; it should gain steps mirroring the existing ADBC ones, or bigquery.rs will bitrot.
- Release binaries (ggsql, ggsql-jupyter) build with default features and so do not include bigquery; this needs a decision on whether to ship it.
- src/CLAUDE.md / ggsql-jupyter/CLAUDE.md reader and feature-flag tables need updating to list bigquery (and adbc).

Add a native BigQuery reader behind a new off-by-default 'bigquery' feature flag.

BigQueryReader authenticates via Application Default Credentials and accepts bigquery://[PROJECT[/DATASET]][?location=REGION] connection strings; the project resolves from ADC / GOOGLE_CLOUD_PROJECT when omitted. VS Code / Positron and the Jupyter kernel recognise the same scheme.

Two SQL-generation fixes were needed for BigQuery's strict semantics; both are output-identical on DuckDB/SQLite:

- Histogram: bin_end and density are now derived in the outer SELECT from the already-grouped bin/count columns, instead of inside the GROUP BY query. BigQuery rejects a GROUP BY query whose SELECT list references an input column outside the grouping key.
- BigQuery percentile: sql_percentile / sql_quantile_inline now emit the APPROX_QUANTILES aggregate instead of a correlated scalar subquery, which BigQuery rejects for grouped boxplot / density.

Integration tests (bq_integration_*) create a uuid-named dataset, run the public API against it, and drop it on completion; they are #[ignore] by default and opt in via GGSQL_BIGQUERY_TEST_URI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The fixture now also creates a `types` table (DATE/TIMESTAMP/DATETIME/TIME/NUMERIC, plus an all-NULL row) and a 25_000-row `wide` table.

New #[ignore] integration tests:

- type conversion: asserts each BigQuery type maps to the expected Arrow dtype.
- pagination: a 25_000-row scan must stitch three result pages (PAGE_SIZE is 10_000).
- query error: a failing query surfaces as Err, not a panic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
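
The pagination arithmetic is easy to check: 25_000 rows at 10_000 rows per page need three pages (10_000 + 10_000 + 5_000). A minimal sketch of a page-stitching loop follows; `page_count` and `fetch_all` are hypothetical helpers, and the page token is modelled as a plain row offset instead of BigQuery's opaque token.

```rust
/// Page size quoted in the commit message.
pub const PAGE_SIZE: usize = 10_000;

/// Number of pages needed to cover `total_rows` (ceiling division).
pub fn page_count(total_rows: usize, page_size: usize) -> usize {
    total_rows.div_ceil(page_size)
}

/// Stitch pages together by calling `fetch_page` until it reports no next token.
/// `fetch_page` takes the current token (None for the first page) and returns
/// the page's rows plus the token for the following page, if any.
pub fn fetch_all<F>(mut fetch_page: F) -> Vec<u64>
where
    F: FnMut(Option<usize>) -> (Vec<u64>, Option<usize>),
{
    let mut rows = Vec::new();
    let mut token = None;
    loop {
        let (page, next) = fetch_page(token);
        rows.extend(page);
        match next {
            Some(t) => token = Some(t),
            None => break,
        }
    }
    rows
}
```

A fake source that serves 25_000 sequential rows in PAGE_SIZE chunks is called exactly three times by `fetch_all`, matching `page_count(25_000, PAGE_SIZE)`.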

gajop and others added 2 commits May 21, 2026 07:58

gajop wants to merge 2 commits into posit-dev:main from gajop:feat/add-bigquery