diff --git a/README.md b/README.md
index dc8290d..3470c79 100644
--- a/README.md
+++ b/README.md
@@ -67,12 +67,11 @@ API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var
| `connections` | `list`, `create`, `refresh`, `new` | Manage connections |
| `databases` | `list`, `create`, `delete`, `tables` | Managed databases (create and load tables via parquet) |
| `tables` | `list` | List tables and columns |
-| `datasets` | `list`, `create`, `update` | Manage uploaded datasets |
| `context` | `list`, `show`, `pull`, `push` | Workspace Markdown context (e.g. data model `DATAMODEL`) via the context API |
| `query` | | Execute a SQL query |
| `queries` | `list` | Inspect query run history |
| `search` | | Full-text search across a table column |
-| `indexes` | `list`, `create`, `delete` | Manage indexes on a table or dataset |
+| `indexes` | `list`, `create`, `delete` | Manage indexes on a table |
| `embedding-providers` | `list`, `get`, `create`, `update`, `delete` | Manage embedding providers used by vector indexes |
| `results` | `list` | Retrieve stored query results |
| `jobs` | `list` | Manage background jobs |
@@ -155,7 +154,7 @@ hotdata databases tables delete [--database ] [--schema publ
- `load` (top-level shorthand) — loads a parquet file into `--catalog.--schema.--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
- `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
- `run` mints a database-scoped JWT and execs `` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment.
-- For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`).
+- Managed table loads accept **parquet** only — convert CSV/JSON to parquet first.
Example:
@@ -176,26 +175,6 @@ hotdata tables list [--workspace-id ] [--connection-id ] [--schema ..
` — use this format in SQL queries.
-## Datasets
-
-```sh
-hotdata datasets list [--workspace-id ] [--limit ] [--offset ] [--format table|json|yaml]
-hotdata datasets [--workspace-id ] [--format table|json|yaml]
-hotdata datasets create --file data.csv [--label "My Dataset"] [--table-name my_dataset]
-hotdata datasets create --sql "SELECT ..." --label "My Dataset"
-hotdata datasets create --url "https://example.com/data.parquet" --label "My Dataset"
-hotdata datasets update [--label "New Label"] [--table-name new_table]
-hotdata datasets refresh [--workspace-id ] [--async]
-```
-
-- Datasets are queryable as `datasets.main.`.
-- `--file`, `--sql`, `--query-id`, and `--url` are mutually exclusive.
-- `--url` imports data directly from a URL (supports csv, json, parquet).
-- Format is auto-detected from file extension or content.
-- Piped stdin is supported: `cat data.csv | hotdata datasets create --label "My Dataset"`
-- `refresh` re-runs the dataset's source (URL fetch or saved query) and creates a new version. Not supported for upload-source datasets.
-- `--async` submits the refresh as a background job and returns a job ID; poll with `hotdata jobs `.
-
## Workspace context
Named Markdown documents for a workspace (data model, glossary, etc.) are stored in the **context API**. The CLI treats the server as the **source of truth**; local files are only used where the tool requires a path on disk.
@@ -260,25 +239,20 @@ hotdata search "" --table
--name
```
- `--type` is **required** — choose `sorted` (B-tree-like), `bm25` (full-text), or `vector` (similarity).
@@ -319,7 +293,7 @@ hotdata jobs [--workspace-id ] [--format table|json|yaml]
```
- `list` shows only active jobs (`pending` and `running`) by default. Use `--all` to see all jobs.
-- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `dataset_refresh`, `create_index`, `create_dataset_index`.
+- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `create_index`.
- `--status` accepts: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
## Configuration
diff --git a/skills/hotdata-analytics/SKILL.md b/skills/hotdata-analytics/SKILL.md
index 6e18249..da479ed 100644
--- a/skills/hotdata-analytics/SKILL.md
+++ b/skills/hotdata-analytics/SKILL.md
@@ -1,6 +1,6 @@
---
name: hotdata-analytics
-description: Use this skill when the user wants OLAP-style SQL analytics in Hotdata — aggregations, GROUP BY, JOINs, reporting, exploratory queries, query run history, stored results, or materialized follow-up tables (Chain via datasets or managed databases). Activate for "analyze", "aggregate", "rollup", "pivot", "report", "metrics", "GROUP BY", "query history", "past queries", "query runs", "stored results", "materialize", "chain", "intermediate table", or sorted indexes for filters/range scans. Do not load for BM25/vector search or geospatial SQL — use hotdata-search or hotdata-geospatial. Requires the core hotdata skill for connections, tables, datasets, and auth.
+description: Use this skill when the user wants OLAP-style SQL analytics in Hotdata — aggregations, GROUP BY, JOINs, reporting, exploratory queries, query run history, stored results, or materialized follow-up tables (Chain into managed databases). Activate for "analyze", "aggregate", "rollup", "pivot", "report", "metrics", "GROUP BY", "query history", "past queries", "query runs", "stored results", "materialize", "chain", "intermediate table", or sorted indexes for filters/range scans. Do not load for BM25/vector search or geospatial SQL — use hotdata-search or hotdata-geospatial. Requires the core hotdata skill for connections, tables, and auth.
version: 0.5.0
---
@@ -8,7 +8,7 @@ version: 0.5.0
**OLAP-style analytics** in Hotdata: PostgreSQL-dialect SQL, query execution, run history, stored results, **Chain** materializations, and **sorted** indexes for filters and joins.
-**Prerequisites:** Authenticate, workspace, and catalog discovery via the **`hotdata`** skill (`connections`, `tables`, `datasets`, `databases`).
+**Prerequisites:** Authentication, workspace, and catalog discovery via the **`hotdata`** skill (`connections`, `tables`, `databases`).
**Related skills:** **`hotdata-search`** (BM25, vector, retrieval indexes), **`hotdata-geospatial`** (spatial SQL).
@@ -23,7 +23,7 @@ hotdata query status [--output table|json|csv]
- **PostgreSQL dialect.** Quote mixed-case identifiers: `"CustomerName"`.
- Use **`hotdata tables list`** for schema discovery — not `information_schema` via `query`.
-- Fully qualified names: `..`, `datasets..`, `..`.
+- Fully qualified names: `..`, `..`.
- Long-running queries may return `query_run_id` → poll with **`query status`** (exit `2` = still running). Do not re-run identical heavy SQL while polling.
- For **workspace-wide** joins and naming, load **context:DATAMODEL** when listed (`hotdata context list` → `show DATAMODEL`) — see **`hotdata`** skill.
@@ -79,24 +79,16 @@ hotdata results [--workspace-id ] [--output table|json
hotdata query status # if async
```
-2. **Materialize** (pick one)
-
- ```bash
- hotdata datasets create --name chain_slice [--description "chain slice"] --sql "SELECT ..."
- hotdata datasets create --name chain_from_saved [--description "from saved"] --query-id
- ```
-
- Or managed parquet:
+2. **Materialize** into a managed database (parquet)
```bash
hotdata databases create --catalog analytics
hotdata databases load --catalog analytics --table slice --file ./slice.parquet
```
-3. **Chain query** — use printed **`full_name`** or `datasets list` **FULL NAME** column:
+3. **Chain query** — use the catalog-qualified name `.public.`:
```bash
- hotdata query "SELECT * FROM datasets.main.chain_slice WHERE ..."
hotdata query "SELECT * FROM analytics.public.slice WHERE ..."
```
@@ -111,7 +103,7 @@ Full procedure: [references/WORKFLOWS.md](references/WORKFLOWS.md).
For equality, range, and sort-heavy OLAP — not full-text or vector (see **`hotdata-search`**):
```bash
-hotdata indexes create --connection-id --schema --table
`):
+Land a smaller table in a **managed database** (parquet → `..`):
```bash
hotdata databases create --catalog chain_db
hotdata databases load --catalog chain_db --table revenue_slice --file ./revenue_slice.parquet
```
-Note the printed **`full_name`** (e.g. `datasets.main.chain_revenue_slice` or `chain_db.public.revenue_slice`). For datasets, **`FULL NAME`** from `datasets list` is authoritative.
+The table is then addressable as `chain_db.public.revenue_slice`. Confirm with `hotdata databases tables list`.
### 3. Chain query
-Query using the actual `full_name` from create or list — do not hardcode `datasets.main`; use whatever qualified name was printed:
+Query using the catalog-qualified name `.public.`:
```bash
-hotdata datasets list
-hotdata query "SELECT * FROM datasets.main.chain_revenue_slice WHERE ..."
-# Managed database:
-# hotdata query "SELECT * FROM chain_db.public.revenue_slice WHERE ..."
+hotdata databases tables list
+hotdata query "SELECT * FROM chain_db.public.revenue_slice WHERE ..."
```
### Naming and documentation
- Prefer predictable `--name` values: `chain__`.
-- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`datasets.…` or `database.schema.table`).
+- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`database.schema.table`).
- Promote join/grain findings to **context:DATAMODEL** when they should be shared or persisted (**`hotdata`** skill).
### Guardrails
- Materialize when the base scan is large and the follow-up runs many times.
- Keep Chain tables focused; avoid wide `SELECT *` materializations when a narrow projection suffices.
-- For upload format choice (datasets vs databases), see **`hotdata`** WORKFLOWS — [Datasets vs managed databases](../../hotdata/references/WORKFLOWS.md#datasets-vs-managed-databases).
+- For managed-database uploads, see **`hotdata`** WORKFLOWS — [Managed databases](../../hotdata/references/WORKFLOWS.md#managed-databases).
diff --git a/skills/hotdata-search/SKILL.md b/skills/hotdata-search/SKILL.md
index ef7a2ae..015c952 100644
--- a/skills/hotdata-search/SKILL.md
+++ b/skills/hotdata-search/SKILL.md
@@ -42,25 +42,20 @@ hotdata search "" --table [--type vector] [--co
## Indexes (BM25 and vector)
-Indexes attach to a **managed database table** (`--catalog`) or a **dataset** (`--dataset-id`). Create is not supported on raw connection tables via CLI. `list` and `delete` accept `--connection-id` for connection-scoped operations.
+`create` attaches an index to a table through its `--catalog` alias (a managed-database catalog or a connection name). `list` and `delete` accept `--connection-id` (plus `--schema` and `--table`) for connection-scoped operations.
```bash
-# List — workspace scan (filter by connection, schema, table, or dataset)
+# List — workspace scan (filter by connection, schema, or table)
hotdata indexes list [--connection-id ] [--schema ] [--table ] [--workspace-id ] [--output table|json|yaml]
-hotdata indexes list --dataset-id [--workspace-id ] [--output table|json|yaml]
-# Create — managed database table (catalog alias)
+# Create — by catalog alias (resolves a managed-database catalog or a connection name)
hotdata indexes create --catalog --schema --table \
--name idx_chunks_embedding --column embedding --type vector --metric cosine
```
diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
index ea2a2f4..7769637 100644
--- a/skills/hotdata/SKILL.md
+++ b/skills/hotdata/SKILL.md
@@ -1,6 +1,6 @@
---
name: hotdata
-description: Use this skill when the user wants to run core hotdata CLI commands — auth, workspaces, connections, managed databases, datasets, tables, basic SQL query, database context (context:DATAMODEL), jobs, and skill install. Activate for "run hotdata", "list workspaces", "list connections", "create a connection", "list databases", "managed database", "load parquet", "list tables", "list datasets", "create a dataset", "execute a query", "database context", "context:DATAMODEL", or general Hotdata CLI usage. For full-text/vector search and retrieval indexes use hotdata-search; for OLAP analytics, query history, stored results, and Chain materializations use hotdata-analytics; for geospatial/GIS use hotdata-geospatial.
+description: Use this skill when the user wants to run core hotdata CLI commands — auth, workspaces, connections, managed databases, tables, basic SQL query, database context (context:DATAMODEL), jobs, and skill install. Activate for "run hotdata", "list workspaces", "list connections", "create a connection", "list databases", "managed database", "load parquet", "list tables", "execute a query", "database context", "context:DATAMODEL", or general Hotdata CLI usage. For full-text/vector search and retrieval indexes use hotdata-search; for OLAP analytics, query history, stored results, and Chain materializations use hotdata-analytics; for geospatial/GIS use hotdata-geospatial.
version: 0.5.0
---
@@ -20,7 +20,7 @@ Install all skills with **`hotdata skills install`**. Load specialized skills on
| Skill | Use for |
|-------|---------|
-| **`hotdata`** (this file) | Auth, workspaces, connections, databases, datasets, tables, basic `query`, context, jobs |
+| **`hotdata`** (this file) | Auth, workspaces, connections, databases, tables, basic `query`, context, jobs |
| **`hotdata-search`** | BM25, vector search, `hotdata search`, bm25/vector indexes, embedding providers |
| **`hotdata-analytics`** | OLAP SQL, aggregations, query/results history, Chain materializations, sorted indexes |
| **`hotdata-geospatial`** | PostGIS-style `ST_*`, WKB, spatial joins |
@@ -72,7 +72,7 @@ Keep two layers separate:
- **Analysis modeling (day to day)** — Understanding data *for the current task*: exploratory SQL, join checks, column semantics for one report, hypotheses, scratch notes. Often conversational or short-lived. **The conversation or local scratch notes** are the right home while you explore; keep them there until you decide they are worth promoting.
-- **context:DATAMODEL (Hotdata database data model)** — A **durable, database-scoped** map stored only via the **context API**: entities and tables across connections, PK/FK relationships, how datasets tie back to sources, naming and query conventions the **whole team** should rely on. This is **higher-level shared structure**, not a transcript of one investigation.
+- **context:DATAMODEL (Hotdata database data model)** — A **durable, database-scoped** map stored only via the **context API**: entities and tables across connections, PK/FK relationships, how derived tables tie back to sources, naming and query conventions the **whole team** should rely on. This is **higher-level shared structure**, not a transcript of one investigation.
**Promotion:** When analysis findings should **outlive the current session** and **guide everyone**, merge them into **context:DATAMODEL** (`hotdata context list` → if `DATAMODEL` is listed, `hotdata context show DATAMODEL` → reconcile → `hotdata context push DATAMODEL`). You do **not** need to update **context:DATAMODEL** after every ad-hoc query—only when the database story or join graph meaningfully changes.
@@ -82,15 +82,15 @@ Use [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and [
These are **patterns** built from the commands below—not separate CLI subcommands:
-- **Model (`context:DATAMODEL`)** — The **shared** Markdown semantic map of the active database (entities, keys, joins across connections). **Store and read it only via database context** (`hotdata context list`, then `hotdata context show DATAMODEL` **only when listed**, `context push DATAMODEL`); refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). Contrast **analysis modeling** in the conversation or local scratch (see [Analysis modeling vs context:DATAMODEL](#analysis-modeling-vs-contextdatamodel)).
+- **Model (`context:DATAMODEL`)** — The **shared** Markdown semantic map of the active database (entities, keys, joins across connections). **Store and read it only via database context** (`hotdata context list`, then `hotdata context show DATAMODEL` **only when listed**, `context push DATAMODEL`); refresh using `connections`, `connections refresh`, and `tables list`. For a **deep** pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). Contrast **analysis modeling** in the conversation or local scratch (see [Analysis modeling vs context:DATAMODEL](#analysis-modeling-vs-contextdatamodel)).
- **History / Chain / OLAP SQL** — See **`hotdata-analytics`** and [references/WORKFLOWS.md](references/WORKFLOWS.md).
- **Search / retrieval indexes** — See **`hotdata-search`**.
-Catalog, skill decision tree, epic flows (onboard, chain, retrieval), and datasets vs databases: [references/WORKFLOWS.md](references/WORKFLOWS.md).
+Catalog, skill decision tree, epic flows (onboard, chain, retrieval), and managed databases: [references/WORKFLOWS.md](references/WORKFLOWS.md).
## Available Commands
-Top-level subcommands (each detailed below): **`auth`**, **`datasets`**, **`query`**, **`workspaces`**, **`connections`**, **`databases`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`context`**, **`completions`**. Search, indexes (bm25/vector), and embedding providers are documented in **`hotdata-search`**; query history, results, Chain, and OLAP patterns in **`hotdata-analytics`**.
+Top-level subcommands (each detailed below): **`auth`**, **`query`**, **`workspaces`**, **`connections`**, **`databases`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`context`**, **`completions`**. Search, indexes (bm25/vector), and embedding providers are documented in **`hotdata-search`**; query history, results, Chain, and OLAP patterns in **`hotdata-analytics`**.
Global CLI options: **`--api-key`**, **`-v` / `--version`**, **`-h` / `--help`**, **`--no-input`** (disable interactive prompts; commands that require input will error instead — useful in CI or non-TTY environments). Hidden developer flag: **`--debug`** (verbose HTTP logs).
@@ -181,7 +181,7 @@ hotdata connections create \
**Managed databases** are Hotdata-owned catalogs you create and populate yourself — no remote source to sync. Query them in SQL as **`..`**. Prefer **`hotdata databases`** for this workflow.
-**Parquet vs datasets:** `databases tables load` accepts **parquet only**. For SQL-query or saved-query materializations, use **`hotdata datasets create`**.
+**Parquet only:** `databases tables load` accepts **parquet** files (local `--file`, remote `--url`, or a pre-staged `--upload-id`).
**Active database:** `hotdata databases set ` saves the active database to config. All `databases tables` subcommands and all `context` commands default to the active database; pass **`--database `** to override per-command.
@@ -236,64 +236,6 @@ hotdata tables list [--workspace-id ] [--connection-id ] [--limit ] [--offset ] [--output table|json|yaml]
-```
-- Default format is `table`.
-- Returns `id`, `label`, and `created_at`; table output includes a **`FULL NAME`** column (`datasets..`).
-- Results are paginated (default 100). Use `--offset` to fetch further pages.
-- `datasets list` always returns **all** datasets in the workspace. Read **`FULL NAME`** to identify the schema: the middle segment is usually **`main`** (e.g. `datasets.main.my_table`) for ordinary uploads.
-
-#### Get dataset details
-```
-hotdata datasets [--workspace-id ] [--output table|json|yaml]
-```
-- Shows dataset metadata and a full column listing with `name`, `data_type`, `nullable`.
-- Use this to inspect schema before querying.
-- For the **qualified SQL name**, prefer **`FULL NAME` from `datasets list`** or the **`full_name` printed by `datasets create`**—do not assume `datasets.main`.
-
-#### Update a dataset
-```
-hotdata datasets update [--description