skills/hotdata-analytics/SKILL.md: 7 changes (4 additions, 3 deletions)
@@ -18,7 +18,7 @@ version: 0.5.0

```bash
hotdata query "<sql>" [--workspace-id <workspace_id>] [--database <database>] [--output table|json|csv]
hotdata query status <query_run_id> [--output table|json|csv]
hotdata query status <query_run_id>
```

- **PostgreSQL dialect.** Quote mixed-case identifiers: `"CustomerName"`.
@@ -65,6 +65,7 @@ hotdata results <result_id> [--workspace-id <workspace_id>] [--output table|json

- Prefer **`results <id>`** over re-running identical heavy queries.
- Query footers may include `[result-id: rslt...]`; also available from `queries <query_run_id>`.
- `results list --limit` defaults to **100** (max **1000**) — unlike `queries list`, which defaults to **20**.

---

@@ -103,9 +104,9 @@ Full procedure: [references/WORKFLOWS.md](references/WORKFLOWS.md).
For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hotdata-search`**):

```bash
hotdata indexes create --catalog <connection-name-or-id> --schema <schema> --table <table> \
hotdata indexes create --catalog <catalog-alias> --schema <schema> --table <table> \
--name idx_orders_created --column created_at --type sorted [--async]
```

List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here.
List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here. With `--async`, track the build via **`hotdata jobs list`** (see **`hotdata`** skill → Jobs).

skills/hotdata-analytics/references/WORKFLOWS.md: 2 changes (1 addition, 1 deletion)
@@ -84,7 +84,7 @@ hotdata query "SELECT * FROM chain_db.public.revenue_slice WHERE ..."

### Naming and documentation

- Prefer predictable `--name` values: `chain_<topic>_<YYYYMMDD>`.
- Prefer predictable `--catalog` / `--table` values, e.g. catalog `chain_db` + table `chain_<topic>_<YYYYMMDD>` (the chain table is named by `databases load --table`, not a `--name` flag).
- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`database.schema.table`).
- Promote join/grain findings to **context:DATAMODEL** when they should be shared or persisted (**`hotdata`** skill).
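The naming convention above can be expressed as a small client-side helper. This is a sketch under assumptions: `chain_table_name` and `qualified_name` are hypothetical names (not `hotdata` CLI commands), and the default `public` schema is taken from the `chain_db.public...` example query in this hunk.

```python
from datetime import date


def chain_table_name(topic: str, day: date) -> str:
    """Build a predictable chain table name: chain_<topic>_<YYYYMMDD>.

    Hypothetical helper: the topic is lower-cased and spaces/hyphens become
    underscores, so the name stays lowercase and needs no double-quoting.
    """
    slug = topic.strip().lower().replace(" ", "_").replace("-", "_")
    return f"chain_{slug}_{day.strftime('%Y%m%d')}"


def qualified_name(database: str, table: str, schema: str = "public") -> str:
    """Full SQL name (database.schema.table) to record in context:DATAMODEL."""
    return f"{database}.{schema}.{table}"


if __name__ == "__main__":
    table = chain_table_name("Revenue Slice", date(2024, 1, 5))
    print(table)                              # chain_revenue_slice_20240105
    print(qualified_name("chain_db", table))  # chain_db.public.chain_revenue_slice_20240105
```

The returned table name is what would go to `databases load --table`; the qualified form is what gets recorded under Derived tables (Chain).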

skills/hotdata-geospatial/SKILL.md: 230 changes (56 additions, 174 deletions)
@@ -6,22 +6,34 @@ version: 0.5.0

# Hotdata Geospatial Skill

Use this skill when working with geospatial data in Hotdata. Hotdata supports a subset of PostGIS-style functions using **PostgreSQL dialect SQL**. This reference is dataset-agnostic — apply it to any table with geometry columns.
Hotdata supports a subset of PostGIS-style functions in **PostgreSQL-dialect SQL**. This skill is dataset-agnostic — apply it to any table with geometry columns.

**Related skills:** **`hotdata`** (core CLI), **`hotdata-search`** (BM25/vector), **`hotdata-analytics`** (OLAP SQL).
**Requires the core `hotdata` skill** for auth, workspace, and table discovery. **Related:** **`hotdata-analytics`** (OLAP SQL), **`hotdata-search`** (BM25/vector).

## Running these queries

All SQL below runs through the core CLI:

```bash
hotdata query "<sql>" [--workspace-id <id>] [--database <db>] [--output table|json|csv]
```

- **Fully qualify tables** as `<connection>.<schema>.<table>` (or `<catalog>.<schema>.<table>` for a managed database) — every `<table>` placeholder below means a qualified name.
- **PostgreSQL dialect:** double-quote any non-lowercase identifier (e.g. `"GeoID"`).
- Discover candidate tables/columns with **`hotdata tables list --connection-id <id>`** (see core skill).

---

## Geometry Columns
## Geometry columns

Most geospatial datasets in Hotdata carry one or both of:
Most geospatial tables in Hotdata carry one or both of:

| Column | Type | Description |
|---|---|---|
| `wkb_geometry` | `Binary` | WKB-encoded geometry (polygon, point, multipolygon, etc.) |
| `wkb_geometry_bbox` | `Struct` | Precomputed bounding box with fields `xmin`, `ymin`, `xmax`, `ymax` (Float32) |
| `wkb_geometry_bbox` | `Struct` | Precomputed bounding box: `xmin`, `ymin`, `xmax`, `ymax` (Float32) |

**Always parse `wkb_geometry` with `ST_GeomFromWKB()` before using it in any spatial function:**
**Parse `wkb_geometry` with `ST_GeomFromWKB()`** before any spatial function:

```sql
ST_GeomFromWKB(wkb_geometry)
@@ -34,98 +46,27 @@ wkb_geometry_bbox['xmin'] -- ✓ works
(wkb_geometry_bbox).xmin -- ✗ not supported
```

Discover geometry columns with:

```bash
hotdata tables list --connection-id <id>
```
Find these columns by their `Binary` / `Struct` types in `hotdata tables list --connection-id <id>`.
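To make concrete what `ST_GeomFromWKB()` parses, here is a minimal client-side sketch that decodes a WKB **Point** with Python's `struct` module. It assumes plain 2D OGC WKB: one byte-order flag (`1` = little-endian, `0` = big-endian), a `uint32` geometry type (`1` = Point), then two `float64` coordinates. It does not handle polygons, EWKB SRID flags, or Z/M dimensions, and `decode_wkb_point` is a hypothetical name.

```python
import struct


def decode_wkb_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB Point into (x, y), i.e. (lon, lat)."""
    if len(wkb) < 21:  # 1 byte order + 4 type + 2 * 8 coordinates
        raise ValueError("too short for a 2D WKB Point")
    byte_order = wkb[0]
    if byte_order == 1:
        prefix = "<"  # little-endian (NDR)
    elif byte_order == 0:
        prefix = ">"  # big-endian (XDR)
    else:
        raise ValueError(f"bad byte-order flag: {byte_order}")
    (geom_type,) = struct.unpack_from(prefix + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a Point (geometry type {geom_type})")
    x, y = struct.unpack_from(prefix + "dd", wkb, 5)
    return x, y


if __name__ == "__main__":
    # Build a little-endian Point(-122.27, 37.80) the way a WKB writer would.
    sample = struct.pack("<BIdd", 1, 1, -122.27, 37.80)
    print(decode_wkb_point(sample))  # (-122.27, 37.8)
```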

---

## Supported Functions

### Input / Construction

| Function | Example |
|---|---|
| `ST_GeomFromWKB(col)` | `ST_GeomFromWKB(wkb_geometry)` |
| `ST_GeomFromText(wkt)` | `ST_GeomFromText('POLYGON((...))')` |
| `ST_MakePoint(lon, lat)` | `ST_MakePoint(-122.27, 37.80)` |

### Output

| Function | Example |
|---|---|
| `ST_AsText(geom)` | `ST_AsText(ST_GeomFromWKB(wkb_geometry))` → WKT string |
| `ST_AsBinary(geom)` | `ST_AsBinary(ST_GeomFromWKB(wkb_geometry))` → WKB binary |

### Accessors / Inspection

| Function | Returns |
|---|---|
| `ST_GeometryType(geom)` | e.g. `ST_Polygon`, `ST_MultiPolygon`, `ST_Point` |
| `ST_IsValid(geom)` | boolean |
| `ST_NumPoints(geom)` | integer |
| `ST_NPoints(geom)` | integer (alias for ST_NumPoints) |
| `ST_X(point)` | longitude (float) |
| `ST_Y(point)` | latitude (float) |
| `ST_Centroid(geom)` | point geometry |
## Functions

### Measurement
Common building blocks (full catalog, the unsupported set + workarounds, and unit conversions: **[references/functions.md](references/functions.md)**):

| Function | Unit | Notes |
|---|---|---|
| `ST_Area(geom)` | degrees² | Multiply by `111000 * 111000` for m², then `* 10.7639` for ft² |
| `ST_Length(geom)` | degrees | Multiply by `111000` for approximate meters |
| `ST_Distance(geom_a, geom_b)` | degrees | Multiply by `111000` for approximate meters |

> **No meter-native measurements:** `::geography` cast is not supported. All measurements are in decimal degrees. The conversion factor ~111,000 m/degree is accurate at mid-latitudes (~30–50°N/S) and degrades toward the poles.

### Spatial Relationships

All return `boolean`:

| Function | Meaning |
|---|---|
| `ST_Within(a, b)` | `a` is completely inside `b` |
| `ST_Contains(a, b)` | `a` contains `b` |
| `ST_Covers(a, b)` | `a` covers `b` (includes boundary) |
| `ST_CoveredBy(a, b)` | `a` is covered by `b` |
| `ST_Intersects(a, b)` | geometries share any space |
| `ST_Overlaps(a, b)` | geometries overlap (same dimension) |
| `ST_Touches(a, b)` | share boundary only, no interior overlap |
| `ST_Crosses(a, b)` | geometries cross (different dimensions) |
| `ST_Disjoint(a, b)` | geometries share no space |
| `ST_Equals(a, b)` | geometries are spatially identical |

### Processing / Geometry Operations

| Function | Notes |
|---|---|
| `ST_ConvexHull(geom)` | Returns convex hull polygon |
| `ST_Simplify(geom, tolerance)` | Douglas-Peucker simplification; tolerance in degrees |
| `ST_OrientedEnvelope(geom)` | Minimum oriented bounding box |
- **Construct:** `ST_GeomFromWKB`, `ST_GeomFromText('POLYGON((...))')`, `ST_MakePoint(lon, lat)`
- **Inspect:** `ST_GeometryType`, `ST_IsValid`, `ST_X` / `ST_Y`, `ST_Centroid`, `ST_AsText`
- **Relate (boolean):** `ST_Within`, `ST_Contains`, `ST_Covers`, `ST_Intersects`, `ST_Overlaps`, `ST_Touches`, `ST_Disjoint`, `ST_Equals`
- **Measure (in degrees; `ST_Area` in degrees²):** `ST_Distance`, `ST_Length`, `ST_Area`
- **Process:** `ST_ConvexHull`, `ST_Simplify(geom, tol)`, `ST_OrientedEnvelope`

---

## Not Supported

| Category | Not Supported | Workaround |
|---|---|---|
| Output | `ST_AsGeoJSON`, `ST_AsEWKT` | Use `ST_AsText`; parse WKT client-side |
| Cast | `::geography` | Multiply degrees by ~111,000 for meters |
| Input | `ST_MakeEnvelope`, `ST_GeomFromGeoJSON`, `ST_MakeLine` | Use `ST_GeomFromText('POLYGON(...)')` for envelopes |
| Accessors | `ST_SRID`, `ST_IsEmpty`, `ST_NumGeometries`, `ST_GeometryN`, `ST_ExteriorRing`, `ST_PointN`, `ST_StartPoint`, `ST_EndPoint` | — |
| Measurement | `ST_Perimeter`, `ST_MaxDistance` | — |
| Relationships | `ST_DWithin` | Use `ST_Within` + `ST_GeomFromText('POLYGON(...)')` |
| Processing | `ST_Buffer`, `ST_Envelope`, `ST_Boundary`, `ST_Union`, `ST_Intersection`, `ST_Difference`, `ST_SymDifference`, `ST_Collect`, `ST_ClosestPoint`, `ST_Snap`, `ST_BoundingDiagonal`, `ST_Expand` | Use `ST_OrientedEnvelope` instead of `ST_Envelope` |
| Projection | `ST_Transform`, `ST_SetSRID`, `ST_FlipCoordinates` | — |
**Key limits** (see reference for the full list + workarounds): no `::geography` cast — **measurements are in decimal degrees**, convert with the factors in the reference (distance `× 111000` ≈ meters at mid-latitudes; **area is only order-of-magnitude**). No `ST_Buffer`, `ST_DWithin`, `ST_MakeEnvelope`, `ST_Union`, `ST_Transform`, or GeoJSON I/O — use the documented substitutes.
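Because there is no GeoJSON output, WKT from `ST_AsText` has to be converted client-side. A minimal pure-Python sketch, assuming only `POINT` and `POLYGON` input (no `MULTI*`, Z/M, or `EMPTY`); `wkt_to_geojson` is a hypothetical helper, and a full parser such as Shapely's `shapely.wkt.loads` is the safer choice for real data.

```python
import json
import re


def _parse_coords(text: str) -> list[list[float]]:
    """'x1 y1, x2 y2' -> [[x1, y1], [x2, y2]]."""
    return [[float(v) for v in pair.split()] for pair in text.split(",")]


def wkt_to_geojson(wkt: str) -> dict:
    """Convert a WKT POINT or POLYGON string into a GeoJSON-style dict."""
    m = re.fullmatch(r"\s*(POINT|POLYGON)\s*\((.*)\)\s*", wkt,
                     flags=re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError(f"unsupported WKT: {wkt[:40]}")
    kind, body = m.group(1).upper(), m.group(2)
    if kind == "POINT":
        return {"type": "Point", "coordinates": _parse_coords(body)[0]}
    # POLYGON body looks like "(outer ring), (hole), ...": one list per ring.
    rings = re.findall(r"\(([^()]*)\)", body)
    return {"type": "Polygon", "coordinates": [_parse_coords(r) for r in rings]}


if __name__ == "__main__":
    print(json.dumps(wkt_to_geojson("POINT(-122.27 37.8)")))
    print(json.dumps(wkt_to_geojson("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))")))
```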

---

## Common Patterns
## Common patterns

### Check geometry types in a table
### Geometry types in a table

```sql
SELECT ST_GeometryType(ST_GeomFromWKB(wkb_geometry)) AS geom_type, COUNT(*)
@@ -134,9 +75,9 @@ WHERE wkb_geometry IS NOT NULL
GROUP BY 1
```

### Bounding box filter (replaces ST_MakeEnvelope / ST_DWithin)
### Bounding-box filter (replaces ST_MakeEnvelope / ST_DWithin)

Use `ST_GeomFromText` with a closed WKT polygon ring:
Use `ST_GeomFromText` with a **closed** WKT polygon ring (repeat the first vertex):

```sql
WHERE ST_Within(
@@ -145,9 +86,7 @@ WHERE ST_Within(
)
```

**Vertex order:** `(minLon minLat, maxLon minLat, maxLon maxLat, minLon maxLat, minLon minLat)` — close the ring by repeating the first point.
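The vertex order above can be generated client-side so the ring is always closed. A minimal sketch (`envelope_wkt` is a hypothetical name) that builds the WKT string to pass to `ST_GeomFromText(...)`; it assumes `min_lon < max_lon` and `min_lat < max_lat` and does not handle boxes that cross the antimeridian.

```python
def envelope_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Closed WKT polygon ring for a lon/lat box (first vertex repeated last)."""
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("expected min_lon < max_lon and min_lat < max_lat")
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first vertex to close the ring
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


if __name__ == "__main__":
    wkt = envelope_wkt(-122.30, 37.78, -122.25, 37.82)
    print(f"ST_GeomFromText('{wkt}')")
```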

**Faster alternative** using the precomputed bbox struct (no WKB parsing):
**Faster** on large tables — filter the precomputed bbox struct (no WKB parsing); use the `ST_Within` form when you need centroid-in-polygon precision:

```sql
WHERE wkb_geometry_bbox['xmin'] >= <minLon>
@@ -156,129 +95,72 @@ WHERE wkb_geometry_bbox['xmin'] >= <minLon>
AND wkb_geometry_bbox['ymax'] <= <maxLat>
```

Use the bbox approach for large tables where WKB parsing is expensive; use `ST_Within` when you need centroid-in-polygon precision.
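The bbox-struct filter is a containment test on four floats. A Python mirror of that `WHERE` clause makes the semantics explicit, assuming the elided middle conditions follow the same pattern (`ymin >= minLat`, `xmax <= maxLon`). A feature passes only if its whole box lies inside the query box, so features straddling the edge are dropped. The overlap test is shown as a common alternative when straddling features should count. Both helper names are hypothetical.

```python
from typing import Mapping


def bbox_within(b: Mapping[str, float], min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> bool:
    """Mirror of the bbox-struct WHERE clause: whole box inside the query box."""
    return (b["xmin"] >= min_lon and b["ymin"] >= min_lat
            and b["xmax"] <= max_lon and b["ymax"] <= max_lat)


def bbox_intersects(b: Mapping[str, float], min_lon: float, min_lat: float,
                    max_lon: float, max_lat: float) -> bool:
    """Alternative: boxes share any area (or touch at an edge)."""
    return (b["xmin"] <= max_lon and b["xmax"] >= min_lon
            and b["ymin"] <= max_lat and b["ymax"] >= min_lat)


if __name__ == "__main__":
    query = (-122.30, 37.78, -122.25, 37.82)
    inside = {"xmin": -122.28, "ymin": 37.79, "xmax": -122.27, "ymax": 37.80}
    straddling = {"xmin": -122.31, "ymin": 37.79, "xmax": -122.29, "ymax": 37.80}
    print(bbox_within(inside, *query), bbox_within(straddling, *query))          # True False
    print(bbox_intersects(inside, *query), bbox_intersects(straddling, *query))  # True True
```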

### Point-in-polygon test
### Point-in-polygon

```sql
SELECT *
FROM <table>
WHERE ST_Contains(
ST_GeomFromWKB(wkb_geometry),
ST_MakePoint(<lon>, <lat>)
)
WHERE ST_Contains(ST_GeomFromWKB(wkb_geometry), ST_MakePoint(<lon>, <lat>))
```

### Nearest neighbors (closest N features to a point)
### Nearest neighbors (closest N to a point)

```sql
SELECT
<id_col>,
ST_Distance(
ST_Centroid(ST_GeomFromWKB(wkb_geometry)),
ST_MakePoint(<lon>, <lat>)
) * 111000 AS dist_meters
SELECT <id_col>,
ST_Distance(ST_Centroid(ST_GeomFromWKB(wkb_geometry)), ST_MakePoint(<lon>, <lat>)) * 111000 AS dist_meters
FROM <table>
WHERE wkb_geometry IS NOT NULL
ORDER BY dist_meters
LIMIT 10
```

### Distance between two known points
### Distance between two points

```sql
SELECT
ST_Distance(ST_MakePoint(<lon1>, <lat1>), ST_MakePoint(<lon2>, <lat2>)) * 111000 AS dist_meters,
ST_Distance(ST_MakePoint(<lon1>, <lat1>), ST_MakePoint(<lon2>, <lat2>)) * 69.0 AS dist_miles
```
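To gauge how far the `* 111000` shortcut drifts, this sketch reproduces it in Python and compares it with the haversine great-circle distance on a sphere of 6,371 km mean radius. It assumes `ST_Distance` on two lon/lat points returns the planar Euclidean distance in degrees, as the degree units above imply. Roughly 111,000 m is the length of one degree of latitude; a degree of longitude shrinks by cos(latitude), so east-west separations are overstated away from the equator. Function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def planar_degrees_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """The shortcut used above: Euclidean distance in degrees times 111,000."""
    return math.hypot(lon2 - lon1, lat2 - lat1) * 111_000


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance on a sphere, for comparison."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # 0.1 degree north-south vs 0.1 degree east-west, both at ~37.8 N
    for p, q in [((-122.27, 37.80), (-122.27, 37.90)),
                 ((-122.27, 37.80), (-122.17, 37.80))]:
        print(f"{planar_degrees_m(*p, *q):9.0f} m  vs  {haversine_m(*p, *q):9.0f} m")
```

At this latitude the north-south pair agrees to within about 0.2%, while the east-west haversine distance is only about cos(37.8°) ≈ 0.79 of the planar estimate.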

### Area of polygon features
### Area of polygons (order-of-magnitude — see reference caveat)

```sql
SELECT
<id_col>,
ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 AS area_sqm,
ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 * 10.7639 AS area_sqft,
ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 / 4047 AS area_acres
SELECT <id_col>,
ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 AS area_sqm,
ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 / 4047 AS area_acres
FROM <table>
WHERE wkb_geometry IS NOT NULL
```

### Centroid coordinates

```sql
SELECT
<id_col>,
SELECT <id_col>,
ST_X(ST_Centroid(ST_GeomFromWKB(wkb_geometry))) AS lon,
ST_Y(ST_Centroid(ST_GeomFromWKB(wkb_geometry))) AS lat
FROM <table>
WHERE wkb_geometry IS NOT NULL
```

### Convert to WKT for export or inspection

```sql
SELECT <id_col>, ST_AsText(ST_GeomFromWKB(wkb_geometry)) AS wkt
FROM <table>
WHERE wkb_geometry IS NOT NULL
LIMIT 10
```

### Simplify geometry for faster rendering
### Export / simplify as WKT

```sql
SELECT <id_col>, ST_AsText(ST_Simplify(ST_GeomFromWKB(wkb_geometry), 0.0001)) AS simplified_wkt
FROM <table>
WHERE wkb_geometry IS NOT NULL
-- raw WKT
SELECT <id_col>, ST_AsText(ST_GeomFromWKB(wkb_geometry)) AS wkt FROM <table> WHERE wkb_geometry IS NOT NULL LIMIT 10
-- simplified (tolerance in degrees, ~11 m at mid-latitudes)
SELECT <id_col>, ST_AsText(ST_Simplify(ST_GeomFromWKB(wkb_geometry), 0.0001)) AS wkt FROM <table> WHERE wkb_geometry IS NOT NULL
```

Tolerance is in degrees (~11 m at mid-latitudes). Increase for coarser simplification, decrease for finer.
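To choose a tolerance from a target size in meters, invert the same ~111,000 m/degree factor. A hedged sketch with hypothetical helpers; it inherits the approximation's caveat that a degree of longitude is shorter than 111 km away from the equator, so the effective east-west tolerance is finer in meters than the target.

```python
def meters_to_degrees(meters: float, m_per_degree: float = 111_000) -> float:
    """Approximate degree tolerance for ST_Simplify from a distance in meters."""
    return meters / m_per_degree


def simplify_sql(id_col: str, table: str, meters: float) -> str:
    """Build the simplified-WKT query with a tolerance derived from meters."""
    tol = meters_to_degrees(meters)
    return (
        f"SELECT {id_col}, ST_AsText(ST_Simplify(ST_GeomFromWKB(wkb_geometry), {tol:.6g})) AS wkt "
        f"FROM {table} WHERE wkb_geometry IS NOT NULL"
    )


if __name__ == "__main__":
    # ~25 m tolerance; <id_col> and <table> are placeholders as in the patterns above
    print(simplify_sql("<id_col>", "<table>", 25))
```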

---

## Unit Conversion Reference

| To get | Multiply degrees by |
|---|---|
| Meters (distance) | × 111,000 |
| Kilometers (distance) | × 111 |
| Miles (distance) | × 69.0 |
| Feet (distance) | × 364,173 |
| m² (area) | × 111,000² = × 12,321,000,000 |
| ft² (area) | × 111,000² × 10.7639 |
| Acres (area) | × 111,000² ÷ 4,047 |

> These conversions assume ~37°N latitude. They are approximations — accuracy decreases significantly above 60°N or below 60°S.
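The table above reduces to one length factor (111,000 m per degree) and its square for areas; the 69.0 mi and 364,173 ft factors are that same 111,000 m expressed in miles and feet. A sketch applying them, with hypothetical helper names. One caveat worth keeping in mind: because longitude degrees shrink by cos(latitude), the `111,000²` area factor overstates areas away from the equator by about 1/cos(lat) (≈1.25× at 37°N), which is one reason area results are only order-of-magnitude.

```python
M_PER_DEG = 111_000        # approximate meters per degree (the factor used above)
FT_PER_M = 3.28084
MI_PER_M = 1 / 1609.344
SQFT_PER_SQM = 10.7639
SQM_PER_ACRE = 4047


def convert_length(degrees: float) -> dict[str, float]:
    """Convert an ST_Distance / ST_Length result (degrees) to common units."""
    meters = degrees * M_PER_DEG
    return {"m": meters, "km": meters / 1000, "mi": meters * MI_PER_M, "ft": meters * FT_PER_M}


def convert_area(sq_degrees: float) -> dict[str, float]:
    """Convert an ST_Area result (degrees squared) to common units."""
    sqm = sq_degrees * M_PER_DEG ** 2
    return {"m2": sqm, "ft2": sqm * SQFT_PER_SQM, "acres": sqm / SQM_PER_ACRE}


if __name__ == "__main__":
    print(convert_length(0.01))  # e.g. an ST_Distance of 0.01 degrees
    print(convert_area(1e-6))    # e.g. an ST_Area of 1e-6 square degrees
```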

---

## Workflow: Exploring a New Geospatial Dataset

1. **Check for geometry columns:**
```
hotdata tables list --connection-id <id>
```
Look for `Binary` (WKB) or `Struct` (bbox) typed columns.

2. **Verify geometry types:**
```sql
SELECT ST_GeometryType(ST_GeomFromWKB(wkb_geometry)) AS type, COUNT(*)
FROM <table> WHERE wkb_geometry IS NOT NULL GROUP BY 1
```

3. **Check coverage (bounding box of entire dataset):**
```sql
SELECT
MIN(wkb_geometry_bbox['xmin']) AS min_lon,
MIN(wkb_geometry_bbox['ymin']) AS min_lat,
MAX(wkb_geometry_bbox['xmax']) AS max_lon,
MAX(wkb_geometry_bbox['ymax']) AS max_lat
FROM <table>
WHERE wkb_geometry_bbox IS NOT NULL
```
## Workflow: explore a new geospatial table

4. **Sample WKT to understand geometry structure:**
1. **Find geometry columns** — `hotdata tables list --connection-id <id>`; look for `Binary` (WKB) / `Struct` (bbox) types.
2. **Geometry types** — run the "Geometry types in a table" pattern above.
3. **Coverage / extent** — aggregate the bbox struct:
```sql
SELECT ST_AsText(ST_GeomFromWKB(wkb_geometry)) FROM <table>
WHERE wkb_geometry IS NOT NULL LIMIT 3
SELECT MIN(wkb_geometry_bbox['xmin']) AS min_lon, MIN(wkb_geometry_bbox['ymin']) AS min_lat,
MAX(wkb_geometry_bbox['xmax']) AS max_lon, MAX(wkb_geometry_bbox['ymax']) AS max_lat
FROM <table> WHERE wkb_geometry_bbox IS NOT NULL
```
4. **Sample WKT** — run the "Export / simplify as WKT" pattern with `LIMIT 3` to see geometry structure.
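Step 3's MIN/MAX aggregation can also be checked client-side, for example on a sample exported with `--output json`. This sketch assumes each row's bbox arrives as a dict with `xmin`/`ymin`/`xmax`/`ymax` keys (the exact JSON shape the CLI emits for a `Struct` column is an assumption), with `None` for NULL rows; `dataset_extent` is a hypothetical name.

```python
from typing import Iterable, Mapping, Optional


def dataset_extent(bboxes: Iterable[Optional[Mapping[str, float]]]) -> dict[str, float]:
    """Union of per-row bounding boxes, skipping NULLs (mirrors the SQL aggregate)."""
    boxes = [b for b in bboxes if b is not None]
    if not boxes:
        raise ValueError("no non-NULL bounding boxes")
    return {
        "min_lon": min(b["xmin"] for b in boxes),
        "min_lat": min(b["ymin"] for b in boxes),
        "max_lon": max(b["xmax"] for b in boxes),
        "max_lat": max(b["ymax"] for b in boxes),
    }


if __name__ == "__main__":
    rows = [
        {"xmin": -122.3, "ymin": 37.7, "xmax": -122.2, "ymax": 37.8},
        None,  # a row whose wkb_geometry_bbox is NULL
        {"xmin": -122.5, "ymin": 37.75, "xmax": -122.4, "ymax": 37.9},
    ]
    print(dataset_extent(rows))
```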