From 1097880814bacc73ab4271eb155ebf93076edf22 Mon Sep 17 00:00:00 2001
From: Eddie A Tejeda <669988+eddietejeda@users.noreply.github.com>
Date: Fri, 19 Jun 2026 17:12:42 -0700
Subject: [PATCH] docs(skills): improve accuracy, structure, and consistency across CLI skills

A quality pass over all four agent skills, plus the recent-change updates
(#170 unscoped indexes list incl. managed DBs; #167 KEDA cold-start note).

Accuracy:
- Drop the nonexistent `query status --output` flag (core + analytics).
- Add `update` to the core top-level subcommand list.
- Document `results list --limit` (default 100, max 1000) vs `queries list`
  (20).
- Split `indexes list` (optional filters) from `indexes delete` (requires
  --connection-id + --schema + --table + --name).
- Standardize `--catalog` as a catalog alias; fix Chain naming (uses
  --table/--catalog, not a --name flag); note embedding-providers create
  --output.

Structure / progressive disclosure:
- Reshape hotdata-geospatial (285 -> 166 lines): add the missing
  `hotdata query` execution layer + table-qualification/quoting note,
  sharpen the area-conversion caveat, and move the function catalog + unit
  tables to references/functions.md.
- Slim core hotdata (373 -> 299 lines): collapse the ~5x-repeated
  context/DATAMODEL material into one section, de-bold prose, drop trailing
  workflow blocks already covered by references/WORKFLOWS.md.

Consistency:
- Unify context scoping language as database-scoped (was "workspace") across
  the core reference files.

Verified all referenced commands/flags against the built CLI; all reference
links resolve. Net ~184 fewer lines with higher accuracy and signal.
--- skills/hotdata-analytics/SKILL.md | 7 +- .../hotdata-analytics/references/WORKFLOWS.md | 2 +- skills/hotdata-geospatial/SKILL.md | 230 +++++------------- .../references/functions.md | 94 +++++++ skills/hotdata-search/SKILL.md | 15 +- skills/hotdata-search/references/INDEXES.md | 2 +- skills/hotdata/SKILL.md | 102 ++------ .../hotdata/references/DATA_MODEL.template.md | 4 +- skills/hotdata/references/MODEL_BUILD.md | 8 +- skills/hotdata/references/WORKFLOWS.md | 2 +- 10 files changed, 188 insertions(+), 278 deletions(-) create mode 100644 skills/hotdata-geospatial/references/functions.md diff --git a/skills/hotdata-analytics/SKILL.md b/skills/hotdata-analytics/SKILL.md index da479ed..83ba06a 100644 --- a/skills/hotdata-analytics/SKILL.md +++ b/skills/hotdata-analytics/SKILL.md @@ -18,7 +18,7 @@ version: 0.5.0 ```bash hotdata query "" [--workspace-id ] [--database ] [--output table|json|csv] -hotdata query status [--output table|json|csv] +hotdata query status ``` - **PostgreSQL dialect.** Quote mixed-case identifiers: `"CustomerName"`. @@ -65,6 +65,7 @@ hotdata results [--workspace-id ] [--output table|json - Prefer **`results `** over re-running identical heavy queries. - Query footers may include `[result-id: rslt...]`; also available from `queries `. +- `results list --limit` defaults to **100** (max **1000**) — unlike `queries list`, which defaults to **20**. --- @@ -103,9 +104,9 @@ Full procedure: [references/WORKFLOWS.md](references/WORKFLOWS.md). For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hotdata-search`**): ```bash -hotdata indexes create --catalog --schema --table \ +hotdata indexes create --catalog --schema --table
\ --name idx_orders_created --column created_at --type sorted [--async] ``` -List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here. +List and delete use the same `hotdata indexes` commands as in the search skill; only **`--type sorted`** is the analytics focus here. With `--async`, track the build via **`hotdata jobs list`** (see **`hotdata`** skill → Jobs). diff --git a/skills/hotdata-analytics/references/WORKFLOWS.md b/skills/hotdata-analytics/references/WORKFLOWS.md index 2485283..17643f3 100644 --- a/skills/hotdata-analytics/references/WORKFLOWS.md +++ b/skills/hotdata-analytics/references/WORKFLOWS.md @@ -84,7 +84,7 @@ hotdata query "SELECT * FROM chain_db.public.revenue_slice WHERE ..." ### Naming and documentation -- Prefer predictable `--name` values: `chain__`. +- Prefer predictable `--catalog` / `--table` values, e.g. catalog `chain_db` + table `chain__` (the chain table is named by `databases load --table`, not a `--name` flag). - Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`database.schema.table`). - Promote join/grain findings to **context:DATAMODEL** when they should be shared or persisted (**`hotdata`** skill). diff --git a/skills/hotdata-geospatial/SKILL.md b/skills/hotdata-geospatial/SKILL.md index dafb07f..3f43b58 100644 --- a/skills/hotdata-geospatial/SKILL.md +++ b/skills/hotdata-geospatial/SKILL.md @@ -6,22 +6,34 @@ version: 0.5.0 # Hotdata Geospatial Skill -Use this skill when working with geospatial data in Hotdata. Hotdata supports a subset of PostGIS-style functions using **PostgreSQL dialect SQL**. This reference is dataset-agnostic — apply it to any table with geometry columns. +Hotdata supports a subset of PostGIS-style functions in **PostgreSQL-dialect SQL**. This skill is dataset-agnostic — apply it to any table with geometry columns. 
-**Related skills:** **`hotdata`** (core CLI), **`hotdata-search`** (BM25/vector), **`hotdata-analytics`** (OLAP SQL). +**Requires the core `hotdata` skill** for auth, workspace, and table discovery. **Related:** **`hotdata-analytics`** (OLAP SQL), **`hotdata-search`** (BM25/vector). + +## Running these queries + +All SQL below runs through the core CLI: + +```bash +hotdata query "" [--workspace-id ] [--database ] [--output table|json|csv] +``` + +- **Fully qualify tables** as `..
` (or `..
` for a managed database) — every `
` placeholder below means a qualified name. +- **PostgreSQL dialect:** double-quote any non-lowercase identifier (e.g. `"GeoID"`). +- Discover candidate tables/columns with **`hotdata tables list --connection-id `** (see core skill). --- -## Geometry Columns +## Geometry columns -Most geospatial datasets in Hotdata carry one or both of: +Most geospatial tables in Hotdata carry one or both of: | Column | Type | Description | |---|---|---| | `wkb_geometry` | `Binary` | WKB-encoded geometry (polygon, point, multipolygon, etc.) | -| `wkb_geometry_bbox` | `Struct` | Precomputed bounding box with fields `xmin`, `ymin`, `xmax`, `ymax` (Float32) | +| `wkb_geometry_bbox` | `Struct` | Precomputed bounding box: `xmin`, `ymin`, `xmax`, `ymax` (Float32) | -**Always parse `wkb_geometry` with `ST_GeomFromWKB()` before using it in any spatial function:** +**Parse `wkb_geometry` with `ST_GeomFromWKB()`** before any spatial function: ```sql ST_GeomFromWKB(wkb_geometry) @@ -34,98 +46,27 @@ wkb_geometry_bbox['xmin'] -- ✓ works (wkb_geometry_bbox).xmin -- ✗ not supported ``` -Discover geometry columns with: - -```sql -hotdata tables list --connection-id -``` +Find these columns by their `Binary` / `Struct` types in `hotdata tables list --connection-id `. --- -## Supported Functions - -### Input / Construction - -| Function | Example | -|---|---| -| `ST_GeomFromWKB(col)` | `ST_GeomFromWKB(wkb_geometry)` | -| `ST_GeomFromText(wkt)` | `ST_GeomFromText('POLYGON((...))')` | -| `ST_MakePoint(lon, lat)` | `ST_MakePoint(-122.27, 37.80)` | - -### Output - -| Function | Example | -|---|---| -| `ST_AsText(geom)` | `ST_AsText(ST_GeomFromWKB(wkb_geometry))` → WKT string | -| `ST_AsBinary(geom)` | `ST_AsBinary(ST_GeomFromWKB(wkb_geometry))` → WKB binary | - -### Accessors / Inspection - -| Function | Returns | -|---|---| -| `ST_GeometryType(geom)` | e.g. 
`ST_Polygon`, `ST_MultiPolygon`, `ST_Point` | -| `ST_IsValid(geom)` | boolean | -| `ST_NumPoints(geom)` | integer | -| `ST_NPoints(geom)` | integer (alias for ST_NumPoints) | -| `ST_X(point)` | longitude (float) | -| `ST_Y(point)` | latitude (float) | -| `ST_Centroid(geom)` | point geometry | +## Functions -### Measurement +Common building blocks (full catalog, the unsupported set + workarounds, and unit conversions: **[references/functions.md](references/functions.md)**): -| Function | Unit | Notes | -|---|---|---| -| `ST_Area(geom)` | degrees² | Multiply by `111000 * 111000` for m², then `* 10.7639` for ft² | -| `ST_Length(geom)` | degrees | Multiply by `111000` for approximate meters | -| `ST_Distance(geom_a, geom_b)` | degrees | Multiply by `111000` for approximate meters | - -> **No meter-native measurements:** `::geography` cast is not supported. All measurements are in decimal degrees. The conversion factor ~111,000 m/degree is accurate at mid-latitudes (~30–50°N/S) and degrades toward the poles. 
- -### Spatial Relationships - -All return `boolean`: - -| Function | Meaning | -|---|---| -| `ST_Within(a, b)` | `a` is completely inside `b` | -| `ST_Contains(a, b)` | `a` contains `b` | -| `ST_Covers(a, b)` | `a` covers `b` (includes boundary) | -| `ST_CoveredBy(a, b)` | `a` is covered by `b` | -| `ST_Intersects(a, b)` | geometries share any space | -| `ST_Overlaps(a, b)` | geometries overlap (same dimension) | -| `ST_Touches(a, b)` | share boundary only, no interior overlap | -| `ST_Crosses(a, b)` | geometries cross (different dimensions) | -| `ST_Disjoint(a, b)` | geometries share no space | -| `ST_Equals(a, b)` | geometries are spatially identical | - -### Processing / Geometry Operations - -| Function | Notes | -|---|---| -| `ST_ConvexHull(geom)` | Returns convex hull polygon | -| `ST_Simplify(geom, tolerance)` | Douglas-Peucker simplification; tolerance in degrees | -| `ST_OrientedEnvelope(geom)` | Minimum oriented bounding box | +- **Construct:** `ST_GeomFromWKB`, `ST_GeomFromText('POLYGON((...))')`, `ST_MakePoint(lon, lat)` +- **Inspect:** `ST_GeometryType`, `ST_IsValid`, `ST_X` / `ST_Y`, `ST_Centroid`, `ST_AsText` +- **Relate (boolean):** `ST_Within`, `ST_Contains`, `ST_Covers`, `ST_Intersects`, `ST_Overlaps`, `ST_Touches`, `ST_Disjoint`, `ST_Equals` +- **Measure (in degrees):** `ST_Distance`, `ST_Length`, `ST_Area` +- **Process:** `ST_ConvexHull`, `ST_Simplify(geom, tol)`, `ST_OrientedEnvelope` ---- - -## Not Supported - -| Category | Not Supported | Workaround | -|---|---|---| -| Output | `ST_AsGeoJSON`, `ST_AsEWKT` | Use `ST_AsText`; parse WKT client-side | -| Cast | `::geography` | Multiply degrees by ~111,000 for meters | -| Input | `ST_MakeEnvelope`, `ST_GeomFromGeoJSON`, `ST_MakeLine` | Use `ST_GeomFromText('POLYGON(...)')` for envelopes | -| Accessors | `ST_SRID`, `ST_IsEmpty`, `ST_NumGeometries`, `ST_GeometryN`, `ST_ExteriorRing`, `ST_PointN`, `ST_StartPoint`, `ST_EndPoint` | — | -| Measurement | `ST_Perimeter`, `ST_MaxDistance` | — | -| 
Relationships | `ST_DWithin` | Use `ST_Within` + `ST_GeomFromText('POLYGON(...)')` | -| Processing | `ST_Buffer`, `ST_Envelope`, `ST_Boundary`, `ST_Union`, `ST_Intersection`, `ST_Difference`, `ST_SymDifference`, `ST_Collect`, `ST_ClosestPoint`, `ST_Snap`, `ST_BoundingDiagonal`, `ST_Expand` | Use `ST_OrientedEnvelope` instead of `ST_Envelope` | -| Projection | `ST_Transform`, `ST_SetSRID`, `ST_FlipCoordinates` | — | +**Key limits** (see reference for the full list + workarounds): no `::geography` cast — **measurements are in decimal degrees**, convert with the factors in the reference (distance `× 111000` ≈ meters at mid-latitudes; **area is only order-of-magnitude**). No `ST_Buffer`, `ST_DWithin`, `ST_MakeEnvelope`, `ST_Union`, `ST_Transform`, or GeoJSON I/O — use the documented substitutes. --- -## Common Patterns +## Common patterns -### Check geometry types in a table +### Geometry types in a table ```sql SELECT ST_GeometryType(ST_GeomFromWKB(wkb_geometry)) AS geom_type, COUNT(*) @@ -134,9 +75,9 @@ WHERE wkb_geometry IS NOT NULL GROUP BY 1 ``` -### Bounding box filter (replaces ST_MakeEnvelope / ST_DWithin) +### Bounding-box filter (replaces ST_MakeEnvelope / ST_DWithin) -Use `ST_GeomFromText` with a closed WKT polygon ring: +Use `ST_GeomFromText` with a **closed** WKT polygon ring (repeat the first vertex): ```sql WHERE ST_Within( @@ -145,9 +86,7 @@ WHERE ST_Within( ) ``` -**Vertex order:** `(minLon minLat, maxLon minLat, maxLon maxLat, minLon maxLat, minLon minLat)` — close the ring by repeating the first point. 
- -**Faster alternative** using the precomputed bbox struct (no WKB parsing): +**Faster** on large tables — filter the precomputed bbox struct (no WKB parsing); use the `ST_Within` form when you need centroid-in-polygon precision: ```sql WHERE wkb_geometry_bbox['xmin'] >= @@ -156,35 +95,26 @@ WHERE wkb_geometry_bbox['xmin'] >= AND wkb_geometry_bbox['ymax'] <= ``` -Use the bbox approach for large tables where WKB parsing is expensive; use `ST_Within` when you need centroid-in-polygon precision. - -### Point-in-polygon test +### Point-in-polygon ```sql SELECT * FROM
-WHERE ST_Contains( - ST_GeomFromWKB(wkb_geometry), - ST_MakePoint(, ) -) +WHERE ST_Contains(ST_GeomFromWKB(wkb_geometry), ST_MakePoint(, )) ``` -### Nearest neighbors (closest N features to a point) +### Nearest neighbors (closest N to a point) ```sql -SELECT - , - ST_Distance( - ST_Centroid(ST_GeomFromWKB(wkb_geometry)), - ST_MakePoint(, ) - ) * 111000 AS dist_meters +SELECT , + ST_Distance(ST_Centroid(ST_GeomFromWKB(wkb_geometry)), ST_MakePoint(, )) * 111000 AS dist_meters FROM
WHERE wkb_geometry IS NOT NULL ORDER BY dist_meters LIMIT 10 ``` -### Distance between two known points +### Distance between two points ```sql SELECT @@ -192,14 +122,12 @@ SELECT ST_Distance(ST_MakePoint(, ), ST_MakePoint(, )) * 69.0 AS dist_miles ``` -### Area of polygon features +### Area of polygons (order-of-magnitude — see reference caveat) ```sql -SELECT - , - ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 AS area_sqm, - ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 * 10.7639 AS area_sqft, - ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 / 4047 AS area_acres +SELECT , + ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 AS area_sqm, + ST_Area(ST_GeomFromWKB(wkb_geometry)) * 111000 * 111000 / 4047 AS area_acres FROM
WHERE wkb_geometry IS NOT NULL ``` @@ -207,78 +135,32 @@ WHERE wkb_geometry IS NOT NULL ### Centroid coordinates ```sql -SELECT - , +SELECT , ST_X(ST_Centroid(ST_GeomFromWKB(wkb_geometry))) AS lon, ST_Y(ST_Centroid(ST_GeomFromWKB(wkb_geometry))) AS lat FROM
WHERE wkb_geometry IS NOT NULL ``` -### Convert to WKT for export or inspection - -```sql -SELECT , ST_AsText(ST_GeomFromWKB(wkb_geometry)) AS wkt -FROM
-WHERE wkb_geometry IS NOT NULL -LIMIT 10 -``` - -### Simplify geometry for faster rendering +### Export / simplify as WKT ```sql -SELECT , ST_AsText(ST_Simplify(ST_GeomFromWKB(wkb_geometry), 0.0001)) AS simplified_wkt -FROM
-WHERE wkb_geometry IS NOT NULL +-- raw WKT +SELECT , ST_AsText(ST_GeomFromWKB(wkb_geometry)) AS wkt FROM
WHERE wkb_geometry IS NOT NULL LIMIT 10 +-- simplified (tolerance in degrees, ~11 m at mid-latitudes) +SELECT , ST_AsText(ST_Simplify(ST_GeomFromWKB(wkb_geometry), 0.0001)) AS wkt FROM
WHERE wkb_geometry IS NOT NULL ``` -Tolerance is in degrees (~11 m at mid-latitudes). Increase for coarser simplification, decrease for finer. - --- -## Unit Conversion Reference - -| To get | Multiply degrees by | -|---|---| -| Meters (distance) | × 111,000 | -| Kilometers (distance) | × 111 | -| Miles (distance) | × 69.0 | -| Feet (distance) | × 364,173 | -| m² (area) | × 111,000² = × 12,321,000,000 | -| ft² (area) | × 111,000² × 10.7639 | -| Acres (area) | × 111,000² ÷ 4,047 | - -> These conversions assume ~37°N latitude. They are approximations — accuracy decreases significantly above 60°N or below 60°S. - ---- - -## Workflow: Exploring a New Geospatial Dataset - -1. **Check for geometry columns:** - ``` - hotdata tables list --connection-id - ``` - Look for `Binary` (WKB) or `Struct` (bbox) typed columns. - -2. **Verify geometry types:** - ```sql - SELECT ST_GeometryType(ST_GeomFromWKB(wkb_geometry)) AS type, COUNT(*) - FROM
WHERE wkb_geometry IS NOT NULL GROUP BY 1 - ``` - -3. **Check coverage (bounding box of entire dataset):** - ```sql - SELECT - MIN(wkb_geometry_bbox['xmin']) AS min_lon, - MIN(wkb_geometry_bbox['ymin']) AS min_lat, - MAX(wkb_geometry_bbox['xmax']) AS max_lon, - MAX(wkb_geometry_bbox['ymax']) AS max_lat - FROM
- WHERE wkb_geometry_bbox IS NOT NULL - ``` +## Workflow: explore a new geospatial table -4. **Sample WKT to understand geometry structure:** +1. **Find geometry columns** — `hotdata tables list --connection-id `; look for `Binary` (WKB) / `Struct` (bbox) types. +2. **Geometry types** — run the "Geometry types in a table" pattern above. +3. **Coverage / extent** — aggregate the bbox struct: ```sql - SELECT ST_AsText(ST_GeomFromWKB(wkb_geometry)) FROM
- WHERE wkb_geometry IS NOT NULL LIMIT 3 + SELECT MIN(wkb_geometry_bbox['xmin']) AS min_lon, MIN(wkb_geometry_bbox['ymin']) AS min_lat, + MAX(wkb_geometry_bbox['xmax']) AS max_lon, MAX(wkb_geometry_bbox['ymax']) AS max_lat + FROM
WHERE wkb_geometry_bbox IS NOT NULL ``` +4. **Sample WKT** — run the "Export as WKT" pattern with `LIMIT 3` to see geometry structure. diff --git a/skills/hotdata-geospatial/references/functions.md b/skills/hotdata-geospatial/references/functions.md new file mode 100644 index 0000000..26b3197 --- /dev/null +++ b/skills/hotdata-geospatial/references/functions.md @@ -0,0 +1,94 @@ +# Geospatial function reference + +Supported PostGIS-style functions, the unsupported set with workarounds, and degree→unit conversion factors. Hotdata uses **PostgreSQL-dialect SQL**; run everything through `hotdata query` (see the skill's "Running these queries"). Parse `wkb_geometry` with `ST_GeomFromWKB()` before passing it to any function. + +## Supported functions + +### Input / construction + +| Function | Example | +|---|---| +| `ST_GeomFromWKB(col)` | `ST_GeomFromWKB(wkb_geometry)` | +| `ST_GeomFromText(wkt)` | `ST_GeomFromText('POLYGON((...))')` | +| `ST_MakePoint(lon, lat)` | `ST_MakePoint(-122.27, 37.80)` | + +### Output + +| Function | Example | +|---|---| +| `ST_AsText(geom)` | `ST_AsText(ST_GeomFromWKB(wkb_geometry))` → WKT string | +| `ST_AsBinary(geom)` | `ST_AsBinary(ST_GeomFromWKB(wkb_geometry))` → WKB binary | + +### Accessors / inspection + +| Function | Returns | +|---|---| +| `ST_GeometryType(geom)` | e.g. `ST_Polygon`, `ST_MultiPolygon`, `ST_Point` | +| `ST_IsValid(geom)` | boolean | +| `ST_NumPoints(geom)` | integer | +| `ST_NPoints(geom)` | integer (alias for `ST_NumPoints`) | +| `ST_X(point)` | longitude (float) | +| `ST_Y(point)` | latitude (float) | +| `ST_Centroid(geom)` | point geometry | + +### Measurement + +All results are in **decimal degrees** (no `::geography` / meter-native support — see conversions below). 
+ +| Function | Unit | Notes | +|---|---|---| +| `ST_Area(geom)` | degrees² | rough — see area caveat in conversions | +| `ST_Length(geom)` | degrees | `× 111000` ≈ meters (mid-latitude) | +| `ST_Distance(geom_a, geom_b)` | degrees | `× 111000` ≈ meters (mid-latitude) | + +### Spatial relationships (all return `boolean`) + +| Function | Meaning | +|---|---| +| `ST_Within(a, b)` | `a` is completely inside `b` | +| `ST_Contains(a, b)` | `a` contains `b` | +| `ST_Covers(a, b)` | `a` covers `b` (includes boundary) | +| `ST_CoveredBy(a, b)` | `a` is covered by `b` | +| `ST_Intersects(a, b)` | geometries share any space | +| `ST_Overlaps(a, b)` | geometries overlap (same dimension) | +| `ST_Touches(a, b)` | share boundary only, no interior overlap | +| `ST_Crosses(a, b)` | geometries cross (different dimensions) | +| `ST_Disjoint(a, b)` | geometries share no space | +| `ST_Equals(a, b)` | geometries are spatially identical | + +### Processing / geometry operations + +| Function | Notes | +|---|---| +| `ST_ConvexHull(geom)` | returns convex hull polygon | +| `ST_Simplify(geom, tolerance)` | Douglas–Peucker; tolerance in degrees (~11 m at mid-latitudes) | +| `ST_OrientedEnvelope(geom)` | minimum oriented bounding box | + +## Not supported (with workarounds) + +| Category | Not supported | Workaround | +|---|---|---| +| Output | `ST_AsGeoJSON`, `ST_AsEWKT` | use `ST_AsText`; parse WKT client-side | +| Cast | `::geography` | multiply degrees by ~111,000 for meters | +| Input | `ST_MakeEnvelope`, `ST_GeomFromGeoJSON`, `ST_MakeLine` | use `ST_GeomFromText('POLYGON(...)')` for envelopes | +| Accessors | `ST_SRID`, `ST_IsEmpty`, `ST_NumGeometries`, `ST_GeometryN`, `ST_ExteriorRing`, `ST_PointN`, `ST_StartPoint`, `ST_EndPoint` | — | +| Measurement | `ST_Perimeter`, `ST_MaxDistance` | — | +| Relationships | `ST_DWithin` | use `ST_Within` + `ST_GeomFromText('POLYGON(...)')` | +| Processing | `ST_Buffer`, `ST_Envelope`, `ST_Boundary`, `ST_Union`, `ST_Intersection`, `ST_Difference`, 
`ST_SymDifference`, `ST_Collect`, `ST_ClosestPoint`, `ST_Snap`, `ST_BoundingDiagonal`, `ST_Expand` | use `ST_OrientedEnvelope` instead of `ST_Envelope` | +| Projection | `ST_Transform`, `ST_SetSRID`, `ST_FlipCoordinates` | — | + +## Degree → unit conversions + +All measurements come back in decimal degrees. Convert with these factors — **approximations** that assume ~37°N and ignore longitude shrink (`cos(latitude)`); accuracy drops toward the poles. + +| To get | Multiply degrees by | +|---|---| +| Meters (distance) | × 111,000 | +| Kilometers (distance) | × 111 | +| Miles (distance) | × 69.0 | +| Feet (distance) | × 364,173 | +| m² (area) | × 111,000² = × 12,321,000,000 | +| ft² (area) | × 111,000² × 10.7639 | +| Acres (area) | × 111,000² ÷ 4,047 | + +> **Area is especially rough.** Distance error grows with latitude; area error grows with its *square* because the `111000²` factor ignores the `cos(latitude)` longitude shrink entirely. Treat square-meter/acre figures as order-of-magnitude, not precise — re-project externally when you need accurate area. diff --git a/skills/hotdata-search/SKILL.md b/skills/hotdata-search/SKILL.md index 015c952..87a23e4 100644 --- a/skills/hotdata-search/SKILL.md +++ b/skills/hotdata-search/SKILL.md @@ -34,18 +34,21 @@ hotdata search "" --table [--type vector] [--co | **`vector`** | Pass plain-text query; name the **source text column** (e.g. `title`). Server embeds using the same provider/metric/dimensions as the index. SQL uses `vector_distance(col, 'text')`. Results sort by distance (ascending). | - **Inference:** when `--type` or `--column` are omitted, the CLI fetches the table's indexes and selects the only BM25/vector index. If multiple exist, you must specify both flags. -- **No vector index, or custom embedding model?** Use raw SQL via `hotdata query` (e.g. `cosine_distance(col, [])`). The removed `--model` / stdin-vector paths hardcoded `l2_distance` and are not supported. 
+- **Custom embedding model, raw query vector, or no vector index?** Use `hotdata query` directly (e.g. `cosine_distance(col, [])`) — `search` only auto-embeds the query text via the index's own provider. - **Before search:** create the right index (`indexes create --type bm25` or `--type vector`). See [references/INDEXES.md](references/INDEXES.md). - Default `--limit` is 10. +- **Managed databases:** set the database active (`hotdata databases set `) and reference the table by its **SQL catalog** — `..
`, usually `default.public.
`. Do **not** use the internal `__db_` label that `indexes list` displays, nor a connection id — `bm25_search`/`vector_distance` resolve a catalog attached to the active database, so a `__db_…` or `conn…` prefix errors with *catalog … is not attached*. --- ## Indexes (BM25 and vector) -Create attaches to a table via its `--catalog` alias (a managed-database catalog or a connection name). `list` and `delete` accept `--connection-id` (+ `--schema` + `--table`) for connection-scoped operations. +Create attaches to a table via its `--catalog` alias (a managed-database catalog or a connection name). `list` filters by any of `--connection-id` / `--schema` / `--table` (all optional). `delete` **requires all of** `--connection-id` + `--schema` + `--table` + `--name`. + +**Unscoped `hotdata indexes list` (no `--connection-id`) scans the whole workspace — both regular connections *and* managed databases** — so managed-database indexes appear without any flags. In that whole-workspace view the `table` column shows a managed database under its internal `__db_..
` label (a connection-scoped `indexes list --connection-id ` shows the same rows). ```bash -# List — workspace scan (filter by connection, schema, or table) +# List — whole-workspace scan, incl. managed databases (filter by connection, schema, or table) hotdata indexes list [--connection-id ] [--schema ] [--table
] [--workspace-id ] [--output table|json|yaml] # Create — by catalog alias (resolves a managed-database catalog or a connection name) @@ -54,11 +57,11 @@ hotdata indexes create --catalog --schema --table
\ [--name ] [--metric l2|cosine|dot] [--async] \ [--embedding-provider-id ] [--dimensions ] [--output-column ] [--description ] -# Delete — connection table (--connection-id + --schema + --table) +# Delete — requires --connection-id + --schema + --table + --name hotdata indexes delete --connection-id --schema --table
--name ``` -- **`--type` is required** on create: `bm25` (one text column) or `vector` (exactly one column; often embeddings or auto-embedded text). +- **`--type` is required** on create: `bm25` (one text column) or `vector` (exactly one column; often embeddings or auto-embedded text). (`sorted` is also a valid `--type`, covered in **`hotdata-analytics`**.) - **`sorted`** indexes (range/equality for OLAP filters) are documented in **`hotdata-analytics`** — this skill focuses on retrieval types. - **`--async`:** poll with `hotdata jobs ` (see **`hotdata`** skill **Jobs**). - **Auto-embedding:** `--type vector` on a **text** column generates embeddings server-side. Optional `--embedding-provider-id`; default output column `{column}_embedding` (override with `--output-column`). @@ -73,7 +76,7 @@ Full workflow (gather workload → compare existing → create → verify): [ref hotdata embedding-providers list [--workspace-id ] [--output table|json|yaml] hotdata embedding-providers get [--workspace-id ] [--output table|json|yaml] hotdata embedding-providers create --name --provider-type service|local \ - [--config ''] [--provider-api-key | --secret-name ] [--workspace-id ] + [--config ''] [--provider-api-key | --secret-name ] [--workspace-id ] [--output table|json|yaml] hotdata embedding-providers update [--name ] [--config ''] [--provider-api-key | --secret-name ] [--workspace-id ] [--output table|json|yaml] hotdata embedding-providers delete [--workspace-id ] ``` diff --git a/skills/hotdata-search/references/INDEXES.md b/skills/hotdata-search/references/INDEXES.md index 49844ce..8eff510 100644 --- a/skills/hotdata-search/references/INDEXES.md +++ b/skills/hotdata-search/references/INDEXES.md @@ -25,7 +25,7 @@ High-cardinality **text** (`title`, `body`, …) → **bm25**. **Embedding** / f hotdata indexes list [--connection-id ] [--schema ] [--table
] ``` -Skip duplicates (same table, column, and purpose). +With no `--connection-id`, this is a whole-workspace scan that **includes managed-database indexes** (shown under the internal `__db_..
` label). Skip duplicates (same table, column, and purpose). ## 3. Create indexes diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md index 7769637..778050c 100644 --- a/skills/hotdata/SKILL.md +++ b/skills/hotdata/SKILL.md @@ -48,41 +48,23 @@ If **`HOTDATA_WORKSPACE`** is set in the environment, the workspace is **locked* **Omit `--workspace-id` unless you need to target a specific workspace** (and it is not locked by env or session). -## Database context (API) - -**Notation `context:`:** In this skill, **`context:DATAMODEL`**, **`context:GLOSSARY`**, and **`context:`** mean the **authoritative Markdown document** stored on the server under that **stem** via the Hotdata **context API** (`/v1/databases/{database_id}/context`, `hotdata context …`). That is **not** the same as generic English (“a data model”, “a glossary”), and **not** the same as local `./DATAMODEL.md` except as **pull/push transport**. **CLI commands use the bare stem** (no `context:` prefix): e.g. `hotdata context show DATAMODEL`, `hotdata context push GLOSSARY`. - -Context is **scoped to the active database** (set via `hotdata databases set `). All context commands operate against the database returned by the active-database config unless you pass **`--database-id `** (short: **`-d`**) explicitly. The **authoritative** copy always lives on the server under the stem; common stems are **`context:DATAMODEL`** (semantic map) and **`context:GLOSSARY`** (glossary / runbooks). - -The CLI command **`hotdata context push`** reads **`./.md`** and **`pull`** writes that file in the **current working directory**—those files exist only as a **transport surface** for the API, not as a second source of truth. **`hotdata context show `** prints Markdown to stdout so agents can read **`context:`** without any local file. Stems follow SQL table–identifier rules (ASCII letters, digits, underscore; no dot in the API name; max 128 characters; SQL reserved words are not allowed). 
For **`show`**, **`pull`**, and **`push`**, the CLI accepts a trailing **`.md`** on the argument (e.g. **`USER.md`**) and treats it as stem **`USER`**—the database still stores **`USER`**, not `USER.md`. - -> **Agents: do not blindly run `hotdata context show DATAMODEL` on session start.** Run **`hotdata context list`** first (optional `--prefix DATAMODEL`). Call **`hotdata context show DATAMODEL` only if** the list includes the `DATAMODEL` stem. If **`show` exits 1** with *no context named …*, that is **normal** when nothing has been pushed yet—**not a hard failure**; do not retry in a loop, and **avoid speculative `show` in parallel** with other shell tools where one failure cancels sibling calls. Proceed without **context:DATAMODEL** until the user asks to create or load one. - -**Agents (Claude and similar):** use database context as the only durable store for **context:DATAMODEL**, **context:GLOSSARY**, and any other **`context:`** documents you introduce. Keep transient analysis notes in the conversation or local scratch until you **promote** them into **context:DATAMODEL** when they should guide the whole database ([details below](#analysis-modeling-vs-contextdatamodel)). - -1. **Before** planning non-trivial queries, explaining schema to others, or editing **context:DATAMODEL**, **discover** stored names with `hotdata context list` (and other stems such as **context:GLOSSARY** as needed). **Only if** `DATAMODEL` appears in the list, load it: `hotdata context show DATAMODEL`. If it is **absent**, skip `show` and treat **context:DATAMODEL** as unset—use [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) when the user wants to bootstrap, then `push` when ready. -2. **After** you change **context:DATAMODEL**, persist with **`hotdata context push DATAMODEL`**. The CLI requires a local `./DATAMODEL.md` for that step: write the body there (from `context show`, the template, or your edits), then run `push` from the project directory. -3. 
Use **`hotdata context pull DATAMODEL`** when you intentionally want a local `./DATAMODEL.md` copy (for example a human editor); it still reflects API state for **context:DATAMODEL**, not a parallel document. +### Cold starts (worker wake-up) -The standard stem for the database semantic map is **`DATAMODEL`** (skill notation **context:DATAMODEL**). Add other stems the same way (e.g. **`GLOSSARY`** → **context:GLOSSARY**) for glossary or runbooks. +A workspace's query worker scales to zero after inactivity. The **first** command against an idle workspace (e.g. `databases list`, `query`, `search`) blocks while it wakes — typically ~10s, up to ~20s — and the spinner upgrades to `waking up worker after inactivity (this can take ~20s)…`. **This is normal, not a hang:** don't kill the command, retry, or treat the pause as an error. Subsequent commands return promptly; warm workspaces are unaffected. -### Analysis modeling vs context:DATAMODEL - -Keep two layers separate: - -- **Analysis modeling (day to day)** — Understanding data *for the current task*: exploratory SQL, join checks, column semantics for one report, hypotheses, scratch notes. Often conversational or short-lived. **The conversation or local scratch notes** are the right home while you explore; keep them there until you decide they are worth promoting. +## Database context (API) -- **context:DATAMODEL (Hotdata database data model)** — A **durable, database-scoped** map stored only via the **context API**: entities and tables across connections, PK/FK relationships, how derived tables tie back to sources, naming and query conventions the **whole team** should rely on. This is **higher-level shared structure**, not a transcript of one investigation. +**`context:`** (e.g. 
**context:DATAMODEL**, **context:GLOSSARY**) is an authoritative Markdown document stored server-side under that stem via the context API — *not* generic English ("a data model"), and *not* a local `./DATAMODEL.md` (local files are only `push`/`pull` transport). CLI commands take the bare stem: `hotdata context show DATAMODEL`. Context is scoped to the **active database** (`hotdata databases set `); target another with `--database-id` / `-d`. Stems follow SQL identifier rules and accept a trailing `.md` (stored without it). Command reference: [Database context (named Markdown)](#database-context-named-markdown). -**Promotion:** When analysis findings should **outlive the current session** and **guide everyone**, merge them into **context:DATAMODEL** (`hotdata context list` → if `DATAMODEL` is listed, `hotdata context show DATAMODEL` → reconcile → `hotdata context push DATAMODEL`). You do **not** need to update **context:DATAMODEL** after every ad-hoc query—only when the database story or join graph meaningfully changes. +**Agents — list before show.** Run `hotdata context list` (optionally `--prefix DATAMODEL`) first; run `hotdata context show DATAMODEL` *only if* the stem is listed. A missing stem makes `show` exit 1 — normal for a fresh database, not a failure: don't retry in a loop or run speculative `show` in parallel with other tools. Proceed without context:DATAMODEL until one exists. -Use [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and [references/MODEL_BUILD.md](references/MODEL_BUILD.md) for **what to write inside** the Markdown you store under **context:** stems. Never put database-specific model text inside agent skill install paths—only in **database context** (and transient `./.md` for push/pull when needed). +**context:DATAMODEL is the durable, shared store** — entities, keys, cross-connection joins, and the naming/query conventions the whole team relies on. 
Keep task-scoped exploration (scratch SQL, hypotheses, one-off join checks) in the conversation or local notes; **promote** to context:DATAMODEL only when findings should outlive the session and guide everyone — reconcile against `context show DATAMODEL` (if listed), write `./DATAMODEL.md`, then `hotdata context push DATAMODEL`. No need to update it after every ad-hoc query. What to write inside the document: [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and [references/MODEL_BUILD.md](references/MODEL_BUILD.md). ## Multi-step workflows These are **patterns** built from the commands below—not separate CLI subcommands: -- **Model (`context:DATAMODEL`)** — The **shared** Markdown semantic map of the active database (entities, keys, joins across connections). **Store and read it only via database context** (`hotdata context list`, then `hotdata context show DATAMODEL` **only when listed**, `context push DATAMODEL`); refresh using `connections`, `connections refresh`, and `tables list`. For a **deep** pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). Contrast **analysis modeling** in the conversation or local scratch (see [Analysis modeling vs context:DATAMODEL](#analysis-modeling-vs-contextdatamodel)). +- **Model (`context:DATAMODEL`)** — The shared semantic map of the active database (entities, keys, joins across connections). Store and read it only via database context (`hotdata context list`, then `show DATAMODEL` **only when listed**, `push DATAMODEL`); refresh using `connections`, `connections refresh`, and `tables list`. For a deep pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). - **History / Chain / OLAP SQL** — See **`hotdata-analytics`** and [references/WORKFLOWS.md](references/WORKFLOWS.md). - **Search / retrieval indexes** — See **`hotdata-search`**. 
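The "list before show" rule above can be sketched as a small shell helper. This is a minimal illustration, not part of the CLI: the function name `show_context_if_listed` is hypothetical, and it assumes `hotdata context list` prints each stored stem as a whitespace-delimited token (adjust the match if the real output differs). Only `context list`, `--prefix`, and `context show` come from the text above.

```bash
# Sketch: read a context document only when `context list` reports its stem.
# ASSUMPTION: `hotdata context list` prints stems as whitespace-delimited
# tokens; the helper name below is hypothetical, not a CLI feature.
show_context_if_listed() {
  _stem="$1"

  # One list call, no retry loop. A failing `list` is treated as "nothing stored".
  _listing="$(hotdata context list --prefix "$_stem" 2>/dev/null)" || return 0

  # -w matches the stem as a whole word, so DATAMODEL does not match DATAMODEL_V2.
  if printf '%s\n' "$_listing" | grep -qw -- "$_stem"; then
    hotdata context show "$_stem"
  else
    # A missing stem is normal for a fresh database: report it and move on.
    echo "context:$_stem not set; proceeding without it" >&2
  fi
}

# Usage: show_context_if_listed DATAMODEL
```

Because the helper never calls `show` for an unlisted stem, it avoids the exit-1 case entirely, which matters when a failing command would cancel sibling tool calls.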
@@ -90,7 +72,7 @@ Catalog, skill decision tree, epic flows (onboard, chain, retrieval), and manage ## Available Commands -Top-level subcommands (each detailed below): **`auth`**, **`query`**, **`workspaces`**, **`connections`**, **`databases`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`context`**, **`completions`**. Search, indexes (bm25/vector), and embedding providers are documented in **`hotdata-search`**; query history, results, Chain, and OLAP patterns in **`hotdata-analytics`**. +Top-level subcommands (each detailed below): **`auth`**, **`query`**, **`workspaces`**, **`connections`**, **`databases`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`context`**, **`completions`**, **`update`**. Search, indexes (bm25/vector), and embedding providers are documented in **`hotdata-search`**; query history, results, Chain, and OLAP patterns in **`hotdata-analytics`**. Global CLI options: **`--api-key`**, **`-v` / `--version`**, **`-h` / `--help`**, **`--no-input`** (disable interactive prompts; commands that require input will error instead — useful in CI or non-TTY environments). Hidden developer flag: **`--debug`** (verbose HTTP logs). @@ -258,7 +240,7 @@ hotdata context push [--database-id ] [--dry-run] ``` hotdata query "" [--workspace-id ] [--database ] [--output table|json|csv] -hotdata query status [--output table|json|csv] +hotdata query status ``` - Default output is `table` (row count and execution time). @@ -308,62 +290,10 @@ hotdata auth status # Check current auth status hotdata auth logout # Remove saved auth for the default profile ``` -### Analysis notes and promotion to context:DATAMODEL - -Exploratory analysis notes (keys, joins, open questions for the current task) belong in **the conversation or local scratch notes**. 
When those findings should guide the whole database and be shared with everyone, promote them to **context:DATAMODEL**: save consolidated Markdown as `./DATAMODEL.md` and run `hotdata context push DATAMODEL` (merge with `hotdata context show DATAMODEL` first if `DATAMODEL` is already listed in `hotdata context list`). - -**Also available:** `hotdata connections new` — interactive connection wizard (no substitute for the programmatic **`connections create`** flow above). - -## Workflow: Running a Query - -0. (Recommended for agents) When the query depends on **workspace-wide** table relationships or naming conventions, run **`hotdata context list`** first; **only if** `DATAMODEL` is listed, run `hotdata context show DATAMODEL` to load **context:DATAMODEL**. If it is **not** listed, **do not** run `show`—ad-hoc analysis does not require populated **context:DATAMODEL**. -1. List connections: - ``` - hotdata connections list - ``` -2. Inspect available tables: - ``` - hotdata tables list - ``` -3. Inspect columns for a specific connection: - ``` - hotdata tables list --connection-id - ``` -4. Run SQL, quoting **mixed-case or upper-case** column names with **double quotes** (PostgreSQL treats unquoted identifiers as lowercased): - ``` - hotdata query "SELECT 1" - hotdata query "SELECT \"CustomerName\" FROM mydb.public.customers LIMIT 10" - ``` - -## Workflow: Creating a managed database (parquet) - -1. Create the database with a catalog alias: - ``` - hotdata databases create --catalog mydb - ``` -2. Load parquet per table (tables are auto-declared if needed): - ``` - hotdata databases load --catalog mydb --table events --file ./events.parquet - hotdata databases load --catalog mydb --table events --url https://example.com/events.parquet - ``` -3. Confirm tables and query: - ``` - hotdata databases tables list - hotdata query "SELECT * FROM mydb.public.events LIMIT 10" - ``` - -## Workflow: Creating a Connection - -1. 
List available connection types: - ``` - hotdata connections create list - ``` -2. Inspect the schema for the desired type: - ``` - hotdata connections create list --output json - ``` -3. Collect required config and auth field values from the user or environment. **Never hardcode credentials — use env vars or files.** -4. Create the connection: - ``` - hotdata connections create --name "my-connection" --type --config '' - ``` +### Interactive connection wizard + +`hotdata connections new` creates a connection interactively (human-friendly); agents should prefer the programmatic `connections create` flow above. + +## Workflows + +End-to-end recipes — onboard a workspace, run a query, build a managed database (parquet), chain/materialize, add retrieval indexes — live in [references/WORKFLOWS.md](references/WORKFLOWS.md). The command sections above are the per-command reference; the workflows stitch them into sequences. diff --git a/skills/hotdata/references/DATA_MODEL.template.md b/skills/hotdata/references/DATA_MODEL.template.md index 92f102c..df04008 100644 --- a/skills/hotdata/references/DATA_MODEL.template.md +++ b/skills/hotdata/references/DATA_MODEL.template.md @@ -1,6 +1,6 @@ # Data model — `` -> **Storage:** This Markdown structure is **context:DATAMODEL**—the document stored in the workspace **context API** under stem `DATAMODEL`. Use **`hotdata context list`** first; **only if** `DATAMODEL` appears, use `hotdata context show DATAMODEL` to read it (otherwise there is nothing to show yet). Maintain `./DATAMODEL.md` in your **project directory** (where you run `hotdata`) only when editing, then `hotdata context push DATAMODEL`. Do not use `docs/DATA_MODEL.md` or other repo paths as the source of truth. (**`context:DATAMODEL`** in skills means that API document, not generic “data model” prose.) 
+> **Storage:** This Markdown structure is **context:DATAMODEL**—the document stored in the database-scoped **context API** (the active database; `-d`/`--database-id` to target another) under stem `DATAMODEL`. Use **`hotdata context list`** first; **only if** `DATAMODEL` appears, use `hotdata context show DATAMODEL` to read it (otherwise there is nothing to show yet). Maintain `./DATAMODEL.md` in your **project directory** (where you run `hotdata`) only when editing, then `hotdata context push DATAMODEL`. Do not use `docs/DATA_MODEL.md` or other repo paths as the source of truth. (**`context:DATAMODEL`** in skills means that API document, not generic “data model” prose.) > Do not commit workspace-specific content into agent skill folders. > For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; in your project, open that path separately if the link 404s. @@ -58,7 +58,7 @@ Document safe join paths and caveats (fan-out, timing, different refresh cadence |-------|--------|--------------------------|--------------|-------| | | | | | | -_Use `hotdata indexes list` for connection tables (see **hotdata-search** skill). Record bm25/vector indexes here; sorted indexes for OLAP filters in **hotdata-analytics**._ +_Use `hotdata indexes list` (no flags — covers connection tables **and** managed databases; see **hotdata-search** skill). 
Record bm25/vector indexes here; sorted indexes for OLAP filters in **hotdata-analytics**._ ## Managed databases (uploaded) diff --git a/skills/hotdata/references/MODEL_BUILD.md b/skills/hotdata/references/MODEL_BUILD.md index c109b6d..a007d72 100644 --- a/skills/hotdata/references/MODEL_BUILD.md +++ b/skills/hotdata/references/MODEL_BUILD.md @@ -1,10 +1,10 @@ -# Building a workspace data model (advanced) +# Building a database data model (advanced) -Optional **deep pass** for a single authoritative markdown document stored as **`context:DATAMODEL`** (workspace **context API**). For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md). +Optional **deep pass** for a single authoritative markdown document stored as **`context:DATAMODEL`** (database-scoped **context API** — the active database). For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md). **Notation:** **`context:DATAMODEL`** is the live server document; **not** the same phrase as “building a data model” for a one-off analysis. **CLI** uses the bare stem: `hotdata context show DATAMODEL`. -**Output:** After **`hotdata context list`** confirms `DATAMODEL` exists, read **context:DATAMODEL** with `hotdata context show DATAMODEL`; edit `./DATAMODEL.md` in the **project directory** where you run `hotdata`, then **`hotdata context push DATAMODEL`**. Do not use `docs/`, `DATA_MODEL.md`, or other repo-only paths as the system of record. Never store workspace-specific model text inside agent skill folders. +**Output:** After **`hotdata context list`** confirms `DATAMODEL` exists, read **context:DATAMODEL** with `hotdata context show DATAMODEL`; edit `./DATAMODEL.md` in the **project directory** where you run `hotdata`, then **`hotdata context push DATAMODEL`**. Do not use `docs/`, `DATA_MODEL.md`, or other repo-only paths as the system of record. 
Never store database-specific model text inside agent skill folders. --- @@ -92,7 +92,7 @@ Per table when you only need one: hotdata indexes list -c --schema --table
[-w ] ``` -Managed-database tables (`--catalog`) are covered by the same `indexes list` scan; filter with `--connection-id` / `--schema` / `--table` as above. +Managed-database indexes are included in the no-flag whole-workspace `indexes list` (shown under the internal `__db_..
` label); narrow to one with `--connection-id` (the database's `default_connection_id`) / `--schema` / `--table` as above. Note: diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md index 9fdc415..0fe5b04 100644 --- a/skills/hotdata/references/WORKFLOWS.md +++ b/skills/hotdata/references/WORKFLOWS.md @@ -1,6 +1,6 @@ # Hotdata CLI workflows -**Notation:** **`context:`** (e.g. **`context:DATAMODEL`**) means the workspace document stored via the **context API**—CLI uses bare stems: `hotdata context show DATAMODEL`. +**Notation:** **`context:`** (e.g. **`context:DATAMODEL`**) means the database-scoped document stored via the **context API** (active database; `-d`/`--database-id` to target another)—CLI uses bare stems: `hotdata context show DATAMODEL`. ---