Add sqlite reader and adjust SQL queries to work there #182

Open

georgestagg wants to merge 10 commits into main from
Conversation

georgestagg commented Mar 10, 2026

Comment on lines -80 to +88
```diff
- default = ["duckdb", "sqlite", "vegalite", "ipc", "builtin-data"]
+ default = ["duckdb", "sqlite", "vegalite", "ipc", "parquet", "builtin-data"]
  ipc = ["polars/ipc"]
  duckdb = ["dep:duckdb", "dep:arrow"]
  polars-sql = ["polars/sql"]
- builtin-data = ["polars/parquet"]
+ parquet = ["polars/parquet"]
  postgres = ["dep:postgres"]
  sqlite = ["dep:rusqlite"]
  vegalite = []
  ggplot2 = []
+ builtin-data = []
```
georgestagg (Collaborator, Author):
This tweak to features is just prep for Wasm, not related to sqlite.
georgestagg (Collaborator, Author):
Some more context: I almost added a new . A question arises as to whether there are performance considerations when reaching for lowest-common-denominator implementations. I had Claude whip up a benchmark, and these are the results:

Details:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use duckdb::Connection;
use ggsql::utils::{sql_generate_series, sql_percentile};

fn setup_connection() -> Connection {
    Connection::open_in_memory().expect("Failed to open DuckDB in-memory connection")
}

fn create_data_table(conn: &Connection, name: &str, n_rows: usize) {
    conn.execute_batch(&format!(
        "CREATE OR REPLACE TABLE {name} AS \
         SELECT random() * 1000.0 AS val \
         FROM GENERATE_SERIES(0, {}) AS seq(n)",
        n_rows - 1
    ))
    .expect("Failed to create data table");
}

fn bench_generate_series(c: &mut Criterion) {
    let conn = setup_connection();
    let mut group = c.benchmark_group("generate_series");
    for n in [64, 512, 1000, 4096] {
        group.bench_with_input(BenchmarkId::new("native", n), &n, |b, &n| {
            let sql = format!("SELECT n FROM GENERATE_SERIES(0, {}) AS seq(n)", n - 1);
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let rows = stmt.query_map([], |row| row.get::<_, f64>(0)).unwrap();
                for r in rows {
                    std::hint::black_box(r.unwrap());
                }
            });
        });
        group.bench_with_input(BenchmarkId::new("portable", n), &n, |b, &n| {
            let cte = sql_generate_series(n);
            let sql = format!("WITH RECURSIVE {cte} SELECT n FROM __ggsql_seq__");
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let rows = stmt.query_map([], |row| row.get::<_, f64>(0)).unwrap();
                for r in rows {
                    std::hint::black_box(r.unwrap());
                }
            });
        });
    }
    group.finish();
}

fn bench_percentile(c: &mut Criterion) {
    let conn = setup_connection();
    let mut group = c.benchmark_group("percentile");
    for n_rows in [100, 1000, 10000] {
        let table = format!("data_{n_rows}");
        create_data_table(&conn, &table, n_rows);
        group.bench_with_input(BenchmarkId::new("native", n_rows), &n_rows, |b, _| {
            let sql = format!(
                "SELECT QUANTILE_CONT(val, 0.25) AS q1, QUANTILE_CONT(val, 0.75) AS q3 \
                 FROM {table}"
            );
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let row = stmt
                    .query_row([], |row| Ok((row.get::<_, f64>(0)?, row.get::<_, f64>(1)?)))
                    .unwrap();
                std::hint::black_box(row);
            });
        });
        group.bench_with_input(BenchmarkId::new("portable", n_rows), &n_rows, |b, _| {
            let from = format!("SELECT * FROM {table}");
            let q1 = sql_percentile("val", 0.25, &from, &[]);
            let q3 = sql_percentile("val", 0.75, &from, &[]);
            let sql = format!("SELECT {q1} AS q1, {q3} AS q3");
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let row = stmt
                    .query_row([], |row| Ok((row.get::<_, f64>(0)?, row.get::<_, f64>(1)?)))
                    .unwrap();
                std::hint::black_box(row);
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_generate_series, bench_percentile);
criterion_main!(benches);
```
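The portable series pattern exercised by the benchmark above can also be tried against SQLite itself. A minimal sketch using Python's `sqlite3` module; the exact CTE text that `sql_generate_series` emits is an assumption here, written by hand for illustration:

```python
import sqlite3

def generate_series_cte(n: int) -> str:
    # Hypothetical stand-in for the recursive CTE that sql_generate_series
    # is described as producing; the real generated SQL may differ.
    return (
        f"__ggsql_seq__(n) AS ("
        f"SELECT 0 UNION ALL SELECT n + 1 FROM __ggsql_seq__ WHERE n + 1 < {n}"
        f")"
    )

conn = sqlite3.connect(":memory:")
sql = f"WITH RECURSIVE {generate_series_cte(5)} SELECT n FROM __ggsql_seq__"
rows = [r[0] for r in conn.execute(sql)]
print(rows)  # [0, 1, 2, 3, 4]
```

The same `WITH RECURSIVE ... SELECT n FROM __ggsql_seq__` shape runs unmodified on DuckDB, which is what makes it usable as the lowest-common-denominator replacement for `GENERATE_SERIES`.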
georgestagg (Collaborator, Author):

Just a quick comment to note b3d6b49 reintroduced
I went back and forth many times on this PR over whether to introduce SQL-engine-specific syntax for percentiles and the other incompatibilities. In the end I decided not to, instead opting to use code as SQL-agnostic as I could throughout.

Various things don't work in SQLite. Here is the situation as it currently stands, as far as I recall:
- No `EXCLUDE`. Instead we keep duplicated columns during the affected queries and drop them from the result.
- `LIMIT 0` does not return the correct column type information. Switch to `LIMIT 1`.
- No `GREATEST` or `LEAST`. Switch to a utility function to build the equivalent with `CASE WHEN`.
- No `QUANTILE_CONT` or `PERCENTILE_CONT`. Return to using an `NTILE`-based method for boxplot and density.
- No `ANY_VALUE`. We can use `MIN`, which gives us an arbitrary value.
- No `GENERATE_SERIES`. Use a `RECURSIVE` CTE to generate the series values.
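A couple of these rewrites can be checked against SQLite directly. A small sketch using Python's `sqlite3` module; the `CASE WHEN` and `MIN` expressions below follow the descriptions above but are written by hand, not taken from ggsql's generated SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a REAL, b REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 4.0), (3.0, 2.0)])

# No GREATEST in SQLite: build the row-wise maximum with CASE WHEN.
greatest = "CASE WHEN a >= b THEN a ELSE b END"
greatest_rows = [r[0] for r in conn.execute(f"SELECT {greatest} FROM t")]
print(greatest_rows)  # [4.0, 3.0]

# No ANY_VALUE: MIN over the group deterministically returns one of the values.
any_value = conn.execute("SELECT MIN(a) FROM t").fetchone()[0]
print(any_value)  # 1.0
```

Both expressions are plain SQL-92-style constructs, so they also run unchanged on DuckDB and Postgres, which is the point of the lowest-common-denominator approach.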