Skip to content

Spatial phase I: Layer#370

Open
teunbrand wants to merge 27 commits intoposit-dev:mainfrom
teunbrand:spatial_start
Open

Spatial phase I: Layer#370
teunbrand wants to merge 27 commits intoposit-dev:mainfrom
teunbrand:spatial_start

Conversation

@teunbrand
Copy link
Copy Markdown
Collaborator

This PR advances #336.

It introduces the spatial layer with draws the simple feature geometry.
Internally, the geometry column is converted to WKB, which is handled via a dialect.
The vegalite writer converts the WKB to GeoJSON.
Like ggplot2, detection of the geometry column is automated for common cases.
This PR also adds the ggsql:world built-in dataset that contains the Natural Earth 110m dataset with selected columns.

Noteably absent in this PR is the whole projection ordeal.

teunbrand and others added 26 commits April 22, 2026 14:05
Registers GeomType::Spatial across the parser, AST, and builder.
The spatial geom requires a `geometry` aesthetic and supports
fill, stroke, opacity, linewidth, and linetype.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… SpatialRenderer

- Feature-gated `spatial` (default-on) with geozero + hex dependencies
- Auto-detect GEOMETRY columns via DESCRIBE for `DRAW spatial` layers
- SpatialRenderer converts WKB hex / GeoJSON strings to GeoJSON Features
- Vega-Lite geoshape mark with proper encoding channel handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip geometry aesthetic in encoding builder (structural, not visual)
- Remove SpatialRenderer::modify_encoding stub (default suffices)
- Remove geometry column auto-detection (revisit after Arrow migration)
- Add end-to-end tests: GeoJSON features, WKB hex, mixed layers
- Add execute test for explicit geometry mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relax the grammar's other_sql_statement rule to accept any non-delimiter
tokens, so statements like INSTALL/LOAD/SET/ATTACH parse without error.
Execute these setup statements before the main query in the pipeline.
Flip DDL detection in DuckDB and SQLite readers to a returns_rows
whitelist, so unknown statement types are handled gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…query_arrow

DuckDB's query_arrow API exports results directly as Arrow RecordBatches,
eliminating the manual row-by-row type mapping. Extension types like
GEOMETRY now flow through as native Binary columns instead of hitting a
lossy string fallback. Decimal128 columns are normalized to Float64 at
the boundary for downstream compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… materialisation

Replaces the execute_sql() → register() two-step with direct
CREATE TEMP TABLE AS DDL. This keeps data inside the database engine
and preserves native types (e.g. DuckDB GEOMETRY) that were lost
during the Arrow materialisation round-trip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y tests

Only apply ST_AsWKB when the geometry column is Binary (native DuckDB
GEOMETRY via Arrow). String columns (GeoJSON, WKB hex) pass through
to the writer unchanged.

Adds tests using INSTALL spatial; LOAD spatial with ST_GeomFromText
to verify the full native GEOMETRY pipeline end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove GeoJSON/WKB hex string handling from the writer and stat
transform. Spatial data should come from native GEOMETRY columns
(via ST_Read, ST_GeomFromText, etc.), not string columns.

Drops hex crate dependency from the spatial feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sql_geometry_to_wkb() to SqlDialect trait with ST_AsBinary
default (OGC standard). DuckDB overrides with ST_AsWKB. The spatial
stat transform uses the dialect instead of hardcoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Natural Earth 110m country boundaries as geoparquet. Columns: name,
iso_a3, continent, subregion, income_group, population, gdp, geom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spatial stat transform now calls dialect.sql_spatial_setup() before
emitting ST_AsWKB, so users no longer need manual LOAD spatial statements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a geom declares a geometry aesthetic and the user hasn't mapped it
explicitly, scan the schema for a column with a conventional geometry
name (geom, geometry, wkb_geometry, the_geom, shape) and binary type.
Requires exactly one match to avoid ambiguity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB reads geoparquet geometry columns as BLOB when the spatial
extension is not loaded, making ST_AsWKB fail later. Pre-load spatial
when the world dataset is referenced so the column is read as GEOMETRY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@teunbrand teunbrand marked this pull request as ready for review May 5, 2026 12:48
@teunbrand teunbrand requested a review from thomasp85 May 5, 2026 12:49
Copy link
Copy Markdown
Collaborator

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in general looks good. Most comments are cosmetic.

One thing we kinda ignore here, and maybe that is for the future projection PR, but do we need a geometry scale? Or do you scale latitude and longitude (defined by the projection), and the geometry mapping just sort of follows?

> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

The spatial layer is used to render geographic geometries consisting of polygons, lines and points used to make maps like choropleths.
It differs from other layers in that uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It differs from other layers in that uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes.
It differs from other layers in that it uses a special [simple features](https://en.wikipedia.org/wiki/Simple_Features) geometry column that defines the shapes.


```{ggsql}
VISUALISE FROM ggsql:world
DRAW spatial
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DRAW spatial
DRAW spatial

```{ggsql}
SELECT geom AS foo FROM ggsql:world
VISUALISE
DRAW spatial MAPPING foo AS geometry
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DRAW spatial MAPPING foo AS geometry
DRAW spatial
MAPPING foo AS geometry


```{ggsql}
VISUALISE FROM ggsql:world
DRAW spatial FILTER continent == 'Asia'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DRAW spatial FILTER continent == 'Asia'
DRAW spatial
FILTER continent == 'Asia'


```{ggsql}
VISUALISE FROM ggsql:world
DRAW spatial
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DRAW spatial
DRAW spatial

Comment on lines +81 to +83
DRAW spatial
MAPPING population AS fill
SETTING opacity => 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DRAW spatial
MAPPING population AS fill
SETTING opacity => 1
DRAW spatial
MAPPING population AS fill
SETTING opacity => 1

Comment thread src/execute/mod.rs
fn looks_like_geometry(name: &str) -> bool {
matches!(
name.to_lowercase().as_str(),
"geom" | "geometry" | "wkb_geometry" | "the_geom" | "shape"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these based on use in the wild? "the_geom" sounds made up 🫠

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 100% based on what claude decided might be reasonable. I didn't ask it for receipts, but it does seem to be out there in the wild:
https://gis.stackexchange.com/questions/49455/geometry-column-naming-convention-geom-or-the-geom

Comment thread src/execute/mod.rs
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there is a geometry type with another name? Shouldn't that win over the "standard" name?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

Comment thread src/reader/data.rs
Comment on lines 72 to 88
pub fn register_builtin_datasets_duckdb(
sql: &str,
conn: &duckdb::Connection,
) -> Result<(), GgsqlError> {
use std::{env, fs};

let dataset_names = extract_builtin_dataset_names(sql)?;

// Load spatial extension before registering datasets that contain
// geometry columns, so DuckDB reads them as GEOMETRY rather than BLOB.
if dataset_names.iter().any(|n| n == "world") {
let _ = conn.execute("LOAD spatial", duckdb::params![]);
}

for name in dataset_names {
let Some(parquet_bytes) = builtin_parquet_bytes(&name) else {
continue;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of this PR but since you touch it - this whole function should be moved to the duckdb.rs

Copy link
Copy Markdown
Collaborator

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a comment from Claude to consider:

GeoJSON reserved-key collisions in SpatialRenderer::prepare_data (src/writer/vegalite/layer.rs:2239):

properties.insert(col_name.clone(), value.clone()); 
feature.insert(col_name.clone(), value); 

Every non-geometry column is also splatted into the feature top level. If a user has a column named type, this overwrites "Feature" and Vega-Lite stops treating it as a feature. properties, bbox, id collide too. At minimum, skip those reserved keys when copying to the top level — or just drop the top-level copy (see next point).

Worth fixing while you're here

Properties stored twice in the inline data (same lines as above). Each non-geometry column ends up both at feature.X and feature.properties.X. Vega-Lite resolves bare field: "X" references against datum.X, which is the top-level form — so the properties map is unused output. Drop the properties map (simpler, matches how the rest of the codebase emits row data), or drop the top-level copy and use from: { data: ..., fields: ["properties.X"] } style. Either way, half the geometry-row JSON goes away.

Comment thread Cargo.toml

# Spatial
geozero = { version = "0.14", default-features = false }
hex = "0.4"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be removed?

Comment thread src/reader/duckdb.rs
}

fn sql_spatial_setup(&self) -> Vec<String> {
vec!["LOAD spatial".into()]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if spatial is not installed? IS it installed on first load?

Comment thread src/execute/mod.rs
return Some(candidates[0].name.clone());
}

// Fall back to name-only match (e.g. extension types we don't recognise)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that name only should never "win" since this could map to a column that doesn't support ST_asWKB() and end up with a super confusing error.

The logic should be: Any column holding valid geometry. If multiple of these chose the one with standard name, if no standard naming then the first

```{ggsql}
VISUALISE FROM ggsql:world
DRAW spatial
FILTER ST_Intersects(geom, ST_MAkeEnvelope(-20.0, -35.0, 55.0, 38.0))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FILTER ST_Intersects(geom, ST_MAkeEnvelope(-20.0, -35.0, 55.0, 38.0))
FILTER ST_Intersects(geom, ST_MakeEnvelope(-20.0, -35.0, 55.0, 38.0))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants