Merged
Changes from 12 commits
Commits
22 commits
0d6d157
feat(python): add ggsql-python package with PyO3 bindings
cpsievert Jan 20, 2026
994062e
Add dev dependency group
cpsievert Jan 21, 2026
9b3370d
fix: exclude ggsql-python from default cargo build
cpsievert Jan 22, 2026
a7b331c
style: format ggsql-python
cpsievert Jan 22, 2026
0f725c2
refactor: use pyproject.toml extras for CI dependencies
cpsievert Jan 22, 2026
757da46
fix: Python CI workflow improvements
cpsievert Jan 22, 2026
a374a66
fix: handle narwhals DataFrames and use correct global data key
cpsievert Jan 22, 2026
ba4633e
feat: use narwhals for DataFrame conversion
cpsievert Jan 22, 2026
f0cf267
fix: correct wheel path in Python CI workflow
cpsievert Jan 22, 2026
f19ecc4
chore: drop pyarrow dependency
cpsievert Jan 22, 2026
9a46b9a
fix: restore pyarrow dependency required by pyo3-polars
cpsievert Jan 22, 2026
cbc08e2
fix: commit tree-sitter generated files for Windows CI
cpsievert Jan 22, 2026
f951bfc
Add a basic .gitignore
cpsievert Jan 22, 2026
4b4e7aa
feat(python): return altair.Chart from render()
cpsievert Jan 22, 2026
23053a2
fix(python): add runtime validation for writer parameter
cpsievert Jan 22, 2026
abc2ff6
refactor(python): consolidate tests and focus on Python logic
cpsievert Jan 22, 2026
67cc299
refactor(python): rename render() to render_altair()
cpsievert Jan 22, 2026
0d0d5f3
refactor(python): remove pyarrow dependency, use IPC for data transfer
cpsievert Jan 22, 2026
e351fab
style(python): fix Rust formatting
cpsievert Jan 22, 2026
9c974d5
docs: add Python bindings section to CLAUDE.md
cpsievert Jan 22, 2026
8ffdc1a
fix: add tree-sitter-cli to CI workflows for Windows compatibility
cpsievert Jan 27, 2026
d0bb9fe
fix(ci): skip doc tests to avoid linker memory issues
cpsievert Jan 27, 2026
90 changes: 90 additions & 0 deletions .github/workflows/python.yml
@@ -0,0 +1,90 @@
name: Python

on:
  push:
    paths: ['ggsql-python/**', '.github/workflows/python.yml']
  pull_request:
    paths: ['ggsql-python/**', '.github/workflows/python.yml']

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python: ['3.10', '3.11', '3.12', '3.13']
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: ggsql-python
          shared-key: ${{ matrix.os }}-python

      - name: Build wheel
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ggsql-python
          command: build
          args: --release
          sccache: true

      - name: Install wheel and test
        shell: bash
        run: |
          pip install --find-links target/wheels/ ggsql[test]
          pytest ggsql-python/tests/test_ggsql.py -v

  e2e-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: ggsql-python
          shared-key: ubuntu-latest-python

      - name: Build wheel
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ggsql-python
          command: build
          args: --release
          sccache: true

      - name: Install wheel and E2E dependencies
        shell: bash
        run: pip install --find-links target/wheels/ ggsql[test,e2e]

      - name: Run E2E tests
        shell: bash
        run: pytest ggsql-python/tests/test_altair_e2e.py -v

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check Rust formatting
        run: cargo fmt --package ggsql-python -- --check

      - name: Clippy
        run: cargo clippy --package ggsql-python -- -D warnings
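
For a sense of how many jobs the `test` matrix in this workflow produces, the sketch below enumerates the cross product of the two axes. The values are copied from the workflow; the helper name `expand_matrix` is purely illustrative and not part of the PR.

```python
from itertools import product

# Values copied from the `matrix` block of the `test` job.
OPERATING_SYSTEMS = ["ubuntu-latest", "macos-latest", "windows-latest"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13"]


def expand_matrix(oses, pythons):
    """Return one (os, python) pair per job: the cross product of both axes."""
    return list(product(oses, pythons))


jobs = expand_matrix(OPERATING_SYSTEMS, PYTHON_VERSIONS)
for os_name, python_version in jobs:
    print(f"test ({os_name}, {python_version})")
print(f"{len(jobs)} jobs in total")
```

Because `fail-fast` is `false`, a failure in one of these combinations does not cancel the others.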
6 changes: 0 additions & 6 deletions .gitignore
@@ -24,12 +24,6 @@ Cargo.lock
ehthumbs.db
Thumbs.db

# Tree-sitter generated files
/tree-sitter-ggsql/src/parser.c
/tree-sitter-ggsql/src/tree_sitter/
/tree-sitter-ggsql/src/node-types.json
/tree-sitter-ggsql/src/grammar.json

Comment on lines -27 to -32
Collaborator Author


This change was needed because `tree-sitter generate` was failing on Windows. Claude offered a few different ways to fix this, but recommended this approach:

| Approach | Repo Size | Build Speed | Reliability | Complexity |
| --- | --- | --- | --- | --- |
| Commit generated files | +1.3MB | Fast | High | Low |
| CI installs tree-sitter-cli | Small | Slow | Medium | Low |
| tree-sitter-cli crate | Small | Slow | High | Medium |
| Conditional build.rs | Small* | Fast* | Medium | Medium |

Collaborator


If the issue is simply that tree-sitter is not installed in CI, I'd rather not check in the generated files and instead install it on Windows using something like https://github.com/tree-sitter/setup-action/tree/master.

Those generated files can get very large as grammars get complex, and I'd like to avoid the noise if we can.

Collaborator


I do recall having to do a little 'nvm' song and dance to get installation working on Windows, but I've forgotten the details already.

Collaborator Author


Done in 8ffdc1a

# Node.js (for tree-sitter CLI)
node_modules/
npm-debug.log*
8 changes: 8 additions & 0 deletions Cargo.toml
@@ -1,5 +1,13 @@
[workspace]
members = [
"tree-sitter-ggsql",
"src",
"ggsql-jupyter",
"ggsql-python"
]
# ggsql-python is excluded from default builds because it's a PyO3 extension
# that requires Python dev headers. Build it separately with maturin.
default-members = [
"tree-sitter-ggsql",
"src",
"ggsql-jupyter"
19 changes: 19 additions & 0 deletions ggsql-python/Cargo.toml
@@ -0,0 +1,19 @@
[package]
name = "ggsql-python"
version = "0.1.0"
edition = "2021"
license = "MIT"
description = "Python bindings for ggsql"

[lib]
name = "_ggsql"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.26", features = ["extension-module"] }
pyo3-polars = { version = "0.25", features = ["dtype-decimal", "dtype-struct"] }
polars.workspace = true
ggsql = { path = "../src", default-features = false, features = ["vegalite"] }

[features]
default = []
156 changes: 156 additions & 0 deletions ggsql-python/README.md
@@ -0,0 +1,156 @@
# ggsql

Python bindings for [ggsql](https://github.com/georgestagg/ggsql), a SQL extension for declarative data visualization.

This package provides a thin wrapper around the Rust `ggsql` crate, enabling Python users to render Vega-Lite visualizations from polars DataFrames using ggsql's VISUALISE syntax.

## Installation

### From PyPI (when published)

```bash
pip install ggsql
```

### From source

Building from source requires:
- Rust toolchain (install via [rustup](https://rustup.rs/))
- Python 3.10+
- [maturin](https://github.com/PyO3/maturin)

```bash
# Clone the monorepo
git clone https://github.com/georgestagg/ggsql.git
cd ggsql/ggsql-python

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate # or `.venv\Scripts\activate` on Windows

# Install build dependencies
pip install maturin

# Build and install in development mode
maturin develop

# Or build a wheel
maturin build --release
# (the wheel is written to the workspace-level target/ directory, one level up)
pip install ../target/wheels/ggsql-*.whl
```

## Usage

```python
import ggsql
import polars as pl

# Split a ggSQL query into SQL and VISUALISE portions
sql, viz = ggsql.split_query("""
SELECT date, revenue, region FROM sales
WHERE year = 2024
VISUALISE date AS x, revenue AS y, region AS color
DRAW line
LABEL title => 'Sales Trends'
""")

# Execute SQL with your preferred tool
import duckdb
df = duckdb.sql(sql).pl()

# Render DataFrame + VISUALISE spec to Vega-Lite JSON
vegalite_json = ggsql.render(df, viz)
```

### Mapping styles

The `render()` function supports various mapping styles:

```python
df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "category": ["A", "B", "A"]})

# Explicit mapping
ggsql.render(df, "VISUALISE x AS x, y AS y DRAW point")

# Implicit mapping (column name = aesthetic name)
ggsql.render(df, "VISUALISE x, y DRAW point")

# Wildcard mapping (map all matching columns)
ggsql.render(df, "VISUALISE * DRAW point")

# With color encoding
ggsql.render(df, "VISUALISE x, y, category AS color DRAW point")
```

## API

### `split_query(query: str) -> tuple[str, str]`

Split a ggSQL query into SQL and VISUALISE portions.

**Parameters:**
- `query`: The full ggSQL query string

**Returns:**
- Tuple of `(sql_portion, visualise_portion)`

**Raises:**
- `ValueError`: If the query cannot be parsed
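
To make the return shape concrete, here is a rough pure-Python approximation of the split. It is only a sketch: the real `split_query` parses the query in Rust, whereas the hypothetical `naive_split_query` below simply cuts at the first `VISUALISE` keyword. Its case-insensitive matching and its error on a missing keyword are assumptions of the sketch, not documented behaviour.

```python
import re

# Whole-word match for the keyword that starts the visual part.
# Case-insensitivity is an assumption of this sketch.
_VISUALISE = re.compile(r"\bVISUALISE\b", re.IGNORECASE)


def naive_split_query(query: str) -> tuple[str, str]:
    """Split at the first VISUALISE keyword (illustrative only).

    Unlike a real parser, this does not understand string literals or
    comments, so a VISUALISE inside a quoted string would be misdetected.
    """
    match = _VISUALISE.search(query)
    if match is None:
        raise ValueError("query has no VISUALISE clause")
    return query[: match.start()].strip(), query[match.start():].strip()


sql, viz = naive_split_query(
    "SELECT date, revenue FROM sales VISUALISE date AS x, revenue AS y DRAW line"
)
print(sql)  # SELECT date, revenue FROM sales
print(viz)  # VISUALISE date AS x, revenue AS y DRAW line
```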

### `render(df, viz, *, writer="vegalite") -> str`

Render a DataFrame with a VISUALISE specification.

**Parameters:**
- `df`: Any narwhals-compatible DataFrame (polars, pandas, etc.). LazyFrames are collected automatically.
- `viz`: The VISUALISE specification string
- `writer`: Output format. Currently only `"vegalite"` is supported.

**Returns:**
- JSON string containing the Vega-Lite specification

**Raises:**
- `ValueError`: If the spec cannot be parsed or rendered
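
Since `render()` returns a plain JSON string, it can be handled with the standard library alone. The sketch below uses a hard-coded stand-in for the returned string (a real spec will contain more fields), and the helper `save_spec` is hypothetical, not part of this package.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the string returned by render(); not real ggsql output.
vegalite_json = '{"mark": "point", "encoding": {"x": {"field": "x"}, "y": {"field": "y"}}}'


def save_spec(spec_json: str, path: Path) -> dict:
    """Check that spec_json is valid JSON, write it pretty-printed, return the dict."""
    spec = json.loads(spec_json)  # raises json.JSONDecodeError on malformed input
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return spec


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "chart.vl.json"
    spec = save_spec(vegalite_json, out)
    print(spec["mark"])  # point
    print(out.exists())  # True
```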

## Development

### Keeping in sync with the monorepo

The `ggsql-python` package is part of the [ggsql monorepo](https://github.com/georgestagg/ggsql) and depends on the Rust `ggsql` crate via a path dependency. When the Rust crate is updated, you may need to rebuild:

```bash
cd ggsql-python

# Rebuild after Rust changes
maturin develop

# If tree-sitter grammar changed, clean and rebuild
cd .. && cargo clean -p tree-sitter-ggsql && cd ggsql-python
maturin develop
```

### Running tests

```bash
# Install test dependencies
pip install pytest altair

# Run unit tests
pytest tests/test_ggsql.py -v

# Run E2E tests with altair
pytest tests/test_altair_e2e.py -v

# Run all tests
pytest tests/ -v
```

## Requirements

- Python >= 3.10
- polars >= 1.0
- narwhals >= 2.15
- pyarrow >= 14.0

## License

MIT
37 changes: 37 additions & 0 deletions ggsql-python/pyproject.toml
@@ -0,0 +1,37 @@
[build-system]
requires = ["maturin>=1.4"]
build-backend = "maturin"

[project]
name = "ggsql"
version = "0.1.0"
description = "SQL extension for declarative data visualization"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "MIT" }
keywords = ["sql", "visualization", "vega-lite", "grammar-of-graphics"]
classifiers = [
"Programming Language :: Rust",
"Programming Language :: Python :: Implementation :: CPython",
]
dependencies = [
"narwhals>=2.15.0",
"polars>=1.0",
"pyarrow>=14.0",
]

[project.optional-dependencies]
test = ["pytest>=7.0"]
e2e = ["altair>=5.0"]
dev = ["maturin>=1.4"]

[tool.maturin]
features = ["pyo3/extension-module"]
python-source = "python"
module-name = "ggsql._ggsql"

[dependency-groups]
dev = [
"maturin>=1.11.5",
"pytest>=9.0.2",
]
48 changes: 48 additions & 0 deletions ggsql-python/python/ggsql/__init__.py
@@ -0,0 +1,48 @@
from __future__ import annotations
from typing import Literal

import narwhals as nw
from narwhals.typing import IntoFrame

from ggsql._ggsql import split_query, render as _render

__all__ = ["split_query", "render"]
__version__ = "0.1.0"


def render(
    df: IntoFrame,
    viz: str,
    *,
    writer: Literal["vegalite"] = "vegalite",
) -> str:
    """Render a DataFrame with a VISUALISE spec.

    Parameters
    ----------
    df
        Data to visualize. Accepts polars, pandas, or any narwhals-compatible
        DataFrame. LazyFrames are collected automatically.
    viz
        VISUALISE spec string (e.g., "VISUALISE x, y DRAW point")
    writer
        Output format. Currently only "vegalite" supported.

    Returns
    -------
    str
        Vega-Lite JSON specification.
    """

    df = nw.from_native(df, pass_through=True)

    if isinstance(df, nw.LazyFrame):
        df = df.collect()

    if not isinstance(df, nw.DataFrame):
        raise TypeError("df must be a narwhals DataFrame or compatible type")

    # Should be safe as long as we take polars dependency
    pl_df = df.to_polars()

    return _render(pl_df, viz, writer=writer)
1 change: 1 addition & 0 deletions ggsql-python/python/ggsql/py.typed
@@ -0,0 +1 @@
# PEP 561 marker file