Spatial phase I: Layer#370
Conversation
Registers GeomType::Spatial across the parser, AST, and builder. The spatial geom requires a `geometry` aesthetic and supports fill, stroke, opacity, linewidth, and linetype. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… SpatialRenderer - Feature-gated `spatial` (default-on) with geozero + hex dependencies - Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers - SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features - Vega-Lite geoshape mark with proper encoding channel handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip geometry aesthetic in encoding builder (structural, not visual) - Remove SpatialRenderer::modify_encoding stub (default suffices) - Remove geometry column auto-detection (revisit after Arrow migration) - Add end-to-end tests: GeoJSON features, WKB hex, mixed layers - Add execute test for explicit geometry mapping Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relax the grammar's other_sql_statement rule to accept any non-delimiter tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error. Execute these setup statements before the main query in the pipeline. Flip DDL detection in DuckDB and SQLite readers to a returns_rows whitelist, so unknown statement types are handled gracefully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…query_arrow DuckDB's query_arrow API exports results directly as Arrow RecordBatches, eliminating the manual row-by-row type mapping. Extension types like GEOMETRY now flow through as native Binary columns instead of hitting a lossy string fallback. Decimal128 columns are normalized to Float64 at the boundary for downstream compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… materialisation Replaces the execute_sql() → register() two-step with direct CREATE TEMP TABLE AS DDL. This keeps data inside the database engine and preserves native types (e.g. DuckDB GEOMETRY) that were lost during the Arrow materialisation round-trip. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y tests Only apply ST_AsWKB when the geometry column is Binary (native DuckDB GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through to the writer unchanged. Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText to verify the full native GEOMETRY pipeline end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mp table materialisation" This reverts commit 2694724.
Remove GeoJSON/WKB hex string handling from the writer and stat transform. Spatial data should come from native GEOMETRY columns (via ST_Read, ST_GeomFromText, etc.), not string columns. Drops hex crate dependency from the spatial feature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial stat transform uses the dialect instead of hardcoding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Natural Earth 110m country boundaries as geoparquet. Columns: name, iso_a3, continent, subregion, income_group, population, gdp, geom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spatial stat transform now calls dialect.sql_spatial_setup() before emitting ST_AsWKB, so users no longer need manual LOAD spatial statements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a geom declares a geometry aesthetic and the user hasn't mapped it explicitly, scan the schema for a column with a conventional geometry name (geom, geometry, wkb_geometry, the_geom, shape) and binary type. Requires exactly one match to avoid ambiguity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB reads geoparquet geometry columns as BLOB when the spatial extension is not loaded, making ST_AsWKB fail later. Pre-load spatial when the world dataset is referenced so the column is read as GEOMETRY. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thomasp85
left a comment
There was a problem hiding this comment.
This in general looks good. Most comments are cosmetic.
One thing we kinda ignore here, and maybe that is for the future projection PR, but do we need a geometry scale? Or do you scale latitude and longitude (defined by the projection), and the geometry mapping just sort of follows?
| > Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it. | ||
|
|
||
| The spatial layer is used to render geographic geometries consisting of polygons, lines and points used to make maps like choropleths. | ||
| It differs from other layers in that uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes. |
There was a problem hiding this comment.
| It differs from other layers in that uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes. | |
| It differs from other layers in that it uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes. |
|
|
||
| ```{ggsql} | ||
| VISUALISE FROM ggsql:world | ||
| DRAW spatial |
There was a problem hiding this comment.
| DRAW spatial | |
| DRAW spatial |
| ```{ggsql} | ||
| SELECT geom AS foo FROM ggsql:world | ||
| VISUALISE | ||
| DRAW spatial MAPPING foo AS geometry |
There was a problem hiding this comment.
| DRAW spatial MAPPING foo AS geometry | |
| DRAW spatial | |
| MAPPING foo AS geometry |
|
|
||
| ```{ggsql} | ||
| VISUALISE FROM ggsql:world | ||
| DRAW spatial FILTER continent == 'Asia' |
There was a problem hiding this comment.
| DRAW spatial FILTER continent == 'Asia' | |
| DRAW spatial | |
| FILTER continent == 'Asia' |
|
|
||
| ```{ggsql} | ||
| VISUALISE FROM ggsql:world | ||
| DRAW spatial |
There was a problem hiding this comment.
| DRAW spatial | |
| DRAW spatial |
| DRAW spatial | ||
| MAPPING population AS fill | ||
| SETTING opacity => 1 |
There was a problem hiding this comment.
| DRAW spatial | |
| MAPPING population AS fill | |
| SETTING opacity => 1 | |
| DRAW spatial | |
| MAPPING population AS fill | |
| SETTING opacity => 1 |
| fn looks_like_geometry(name: &str) -> bool { | ||
| matches!( | ||
| name.to_lowercase().as_str(), | ||
| "geom" | "geometry" | "wkb_geometry" | "the_geom" | "shape" |
There was a problem hiding this comment.
Are these based on use in the wild? "the_geom" sounds made up 🫠
There was a problem hiding this comment.
These are 100% based on what claude decided might be reasonable. I didn't ask it for receipts, but it does seem to be out there in the wild:
https://gis.stackexchange.com/questions/49455/geometry-column-naming-convention-geom-or-the-geom
| return Some(candidates[0].name.clone()); | ||
| } | ||
|
|
||
| // Fall back to name-only match (e.g. extension types we don't recognise) |
There was a problem hiding this comment.
what if there is a geometry type with another name? Shouldn't that win over the "standard" name?
There was a problem hiding this comment.
I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.
The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first
| pub fn register_builtin_datasets_duckdb( | ||
| sql: &str, | ||
| conn: &duckdb::Connection, | ||
| ) -> Result<(), GgsqlError> { | ||
| use std::{env, fs}; | ||
|
|
||
| let dataset_names = extract_builtin_dataset_names(sql)?; | ||
|
|
||
| // Load spatial extension before registering datasets that contain | ||
| // geometry columns, so DuckDB reads them as GEOMETRY rather than BLOB. | ||
| if dataset_names.iter().any(|n| n == "world") { | ||
| let _ = conn.execute("LOAD spatial", duckdb::params![]); | ||
| } | ||
|
|
||
| for name in dataset_names { | ||
| let Some(parquet_bytes) = builtin_parquet_bytes(&name) else { | ||
| continue; |
There was a problem hiding this comment.
Not really part of this PR but since you touch it - this whole function should be moved to the duckdb.rs
thomasp85
left a comment
There was a problem hiding this comment.
Also, a comment from Claude to consider:
GeoJSON reserved-key collisions in SpatialRenderer::prepare_data (src/writer/vegalite/layer.rs:2239):
properties.insert(col_name.clone(), value.clone());
feature.insert(col_name.clone(), value);
Every non-geometry column is also splatted into the feature top level. If a user has a column named type, this overwrites "Feature" and Vega-Lite stops treating it as a feature. properties, bbox, id collide too. At minimum, skip those reserved keys when copying to the top level — or just drop the top-level copy (see next point).
Worth fixing while you're here
Properties stored twice in the inline data (same lines as above). Each non-geometry column ends up both at feature.X and feature.properties.X. Vega-Lite resolves bare field: "X" references against datum.X, which is the top-level form — so the properties map is unused output. Drop the properties map (simpler, matches how the rest of the codebase emits row data), or drop the top-level copy and use from: { data: ..., fields: ["properties.X"] } style. Either way, half the geometry-row JSON goes away.
|
|
||
| # Spatial | ||
| geozero = { version = "0.14", default-features = false } | ||
| hex = "0.4" |
There was a problem hiding this comment.
Shouldn't this be removed?
| } | ||
|
|
||
| fn sql_spatial_setup(&self) -> Vec<String> { | ||
| vec!["LOAD spatial".into()] |
There was a problem hiding this comment.
What happens if spatial is not installed? IS it installed on first load?
| return Some(candidates[0].name.clone()); | ||
| } | ||
|
|
||
| // Fall back to name-only match (e.g. extension types we don't recognise) |
There was a problem hiding this comment.
I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.
The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first
| ```{ggsql} | ||
| VISUALISE FROM ggsql:world | ||
| DRAW spatial | ||
| FILTER ST_Intersects(geom, ST_MAkeEnvelope(-20.0, -35.0, 55.0, 38.0)) |
There was a problem hiding this comment.
| FILTER ST_Intersects(geom, ST_MAkeEnvelope(-20.0, -35.0, 55.0, 38.0)) | |
| FILTER ST_Intersects(geom, ST_MakeEnvelope(-20.0, -35.0, 55.0, 38.0)) |
This PR advances #336.
It introduces the
spatiallayer with draws the simple feature geometry.Internally, the geometry column is converted to WKB, which is handled via a dialect.
The vegalite writer converts the WKB to GeoJSON.
Like ggplot2, detection of the geometry column is automated for common cases.
This PR also adds the
ggsql:worldbuilt-in dataset that contains the Natural Earth 110m dataset with selected columns.Noteably absent in this PR is the whole projection ordeal.