
feat(datasets)!: remove datasets commands and dataset feature surface #166

Merged: eddietejeda merged 3 commits into main from remove-datasets-commands on Jun 19, 2026



@eddietejeda


Summary

Removes the datasets feature from the CLI entirely (full feature removal across code + docs).

Net: 17 files changed, ~126 insertions, ~971 deletions. Clean build, clippy clean, 194 unit + 12 integration tests passing.

What changed

Code

  • Deleted src/datasets.rs (441 lines) and the datasets command group (command.rs enum + main.rs handler + module decl/import).
  • indexes — dropped the --dataset-id scope everywhere: removed IndexScope::Dataset, list_one_dataset, the dataset path arms, and its unit test; simplified the create/delete/list handlers to connection/catalog scope only.
  • jobs — removed dataset_refresh and create_dataset_index from parse_job_type and the --job-type value list.
  • sdk.rs — the datasets() client lives in the external hotdata crate (not editable here), so its one X-Workspace-Id-header regression test was repointed off datasets().list onto jobs().list (another generated-client workspace-scoped call) to preserve the assertion.
  • databases.rs — reworded the parquet-only error hint that pointed at hotdata datasets create.
  • tests/workspace_env.rs — repointed the workspace-lock test from datasets list to indexes list.
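
The effect of dropping `IndexScope::Dataset` can be pictured with a minimal sketch. Only the `IndexScope` name comes from this PR; the variant shapes, the `Workspace` fallback variant, the `list_path` helper, and the URL paths are assumptions for illustration and are not the actual contents of src/indexes.rs.

```rust
// Hypothetical sketch: everything except the `IndexScope` name is illustrative.
#[derive(Debug, Clone, PartialEq)]
enum IndexScope {
    /// Indexes on one connection or managed-database catalog.
    Connection { connection_id: String },
    /// No scope given: scan the whole workspace.
    Workspace,
    // A `Dataset { dataset_id: String }` variant would have lived here.
    // Once it is deleted, any match arm that still names it fails to
    // compile, so the compiler points at every handler to simplify.
}

/// Build an (assumed) API path for listing indexes in a scope.
fn list_path(scope: &IndexScope) -> String {
    // The match stays exhaustive with only the two remaining variants.
    match scope {
        IndexScope::Connection { connection_id } => {
            format!("/connections/{connection_id}/indexes")
        }
        IndexScope::Workspace => "/indexes".to_string(),
    }
}

fn main() {
    let scope = IndexScope::Connection {
        connection_id: "conn_123".to_string(),
    };
    println!("{}", list_path(&scope));
    println!("{}", list_path(&IndexScope::Workspace));
}
```

Because Rust requires `match` on an enum to be exhaustive, removing a variant turns every leftover dataset arm into a compile error rather than dead code, which is why the handlers had to be simplified in the same change.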

Docs

  • README + all skill docs scrubbed of the datasets command, --dataset-id, the removed job types, and datasets.main.* query examples (replaced with <catalog>.public.*). "Datasets vs managed databases" sections collapsed to managed-databases-only with consistent anchors.
  • Generic-English "dataset" usage in the geospatial skill was intentionally left untouched.

Notes for reviewers

  • CHANGELOG.md not hand-edited — the changelog is generated from conventional-commit messages at release time; this PR's feat(datasets)!: commit carries the breaking-change note.
  • Pre-existing doc bug also fixed: several index docs showed hotdata indexes create --connection-id ..., but create has no --connection-id flag (the binary rejects it). Index-on-a-connection-table is done via --catalog <connection-name-or-id> (resolve_connection_id resolves names/IDs/catalog aliases). Since the dataset removal forced rewriting the "index scopes" prose anyway, these were corrected to --catalog rather than shipping broken examples.

Test plan

  • cargo build — clean, no new warnings
  • cargo clippy --all-targets — clean (only pre-existing style nits in untouched code)
  • cargo test — 194 unit + 12 integration tests pass
  • Smoke test: hotdata datasets rejected as unrecognized; indexes list has no --dataset-id; jobs --job-type shows only data_refresh_table, data_refresh_connection, create_index
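
The narrowed `--job-type` list from the smoke test can be sketched roughly as follows. The three accepted strings are the ones listed above; the `JobType` enum, its variant names, the error type, and the error message are assumptions, and the real `parse_job_type` in this repository may differ in signature and behavior.

```rust
// Hypothetical sketch of a narrowed job-type parser; only the three
// accepted strings and the function name come from the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobType {
    DataRefreshTable,
    DataRefreshConnection,
    CreateIndex,
}

/// Parse a `--job-type` value, rejecting the removed dataset job types.
fn parse_job_type(s: &str) -> Result<JobType, String> {
    match s {
        "data_refresh_table" => Ok(JobType::DataRefreshTable),
        "data_refresh_connection" => Ok(JobType::DataRefreshConnection),
        "create_index" => Ok(JobType::CreateIndex),
        // `dataset_refresh` and `create_dataset_index` now land here.
        other => Err(format!(
            "unknown job type '{other}'; expected one of: \
             data_refresh_table, data_refresh_connection, create_index"
        )),
    }
}

fn main() {
    for s in ["create_index", "dataset_refresh"] {
        println!("{s} -> {:?}", parse_job_type(s));
    }
}
```

In this sketch the removed values are not special-cased; they fall through to the same error path as any other unrecognized string.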

🤖 Generated with Claude Code

Removes the datasets feature from the CLI entirely:

- Delete the `datasets` command group and src/datasets.rs
- Drop the `--dataset-id` index scope (IndexScope::Dataset,
  list_one_dataset, dataset path arms); indexes now scope to
  connection tables / managed-database catalogs only
- Remove `dataset_refresh` and `create_dataset_index` from
  `jobs --job-type`
- Repoint the SDK X-Workspace-Id header regression test off
  `datasets().list` onto `jobs().list`
- Scrub README and all skill docs of dataset commands,
  `--dataset-id`, removed job types, and `datasets.main.*`
  query examples

BREAKING CHANGE: `hotdata datasets ...`, `hotdata indexes --dataset-id`,
and the `dataset_refresh`/`create_dataset_index` job-type filters are
removed.

Also fixes pre-existing doc drift: `indexes create --connection-id`
(not a valid flag) corrected to `--catalog <connection-name-or-id>`.
claude Bot previously approved these changes Jun 19, 2026

@claude claude Bot left a comment


Clean feature removal. Code changes (command/main/indexes/jobs/sdk) are internally consistent — match arms stay exhaustive, error messages preserved, job display unaffected by the parse_job_type narrowing, and no dangling dataset references remain. Docs scrubbed thoroughly and the --catalog connection-scope correction matches the resolve_connection_id code path.

@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

❌ Patch coverage is 18.51852% with 22 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/main.rs | 0.00% | 22 Missing ⚠️ |


# Conflicts:
#	src/datasets.rs
#	src/indexes.rs

@claude claude Bot left a comment


Clean feature removal. Verified no leftover dataset references in src/ (module, command enum, handlers, IndexScope::Dataset, job types all removed), docs scrubbed consistently, and the repointed sdk.rs/workspace_env.rs tests preserve their original assertions. The jobs().list repoint matches the 4-arg signature, and indexes create/list retain prior connection-scope behavior.

@claude claude Bot left a comment


Clean, well-scoped removal of the datasets feature. Verified: no leftover dataset references in code/tests; indexes::list match remains exhaustive with the workspace-scan fallback intact; indexes create/delete collapse to connection scope with unchanged resolution behavior and the public schema default preserved; the sdk.rs workspace-header regression test was correctly repointed to another workspace-scoped call. Docs updated consistently.

@eddietejeda eddietejeda merged commit 714cd8d into main Jun 19, 2026
14 checks passed
@eddietejeda eddietejeda deleted the remove-datasets-commands branch June 19, 2026 21:44
eddietejeda added a commit that referenced this pull request Jun 19, 2026
0.4.0 removes the datasets API/models and adds a usage API. The CLI
already dropped its datasets surface in #166, so the removal is a no-op
here and the new usage API is simply unused. No source changes needed.

Verified: `cargo build`/`clippy` clean, full suite (206) green, and a
production smoke test (auth status, databases list, an arrow-decoded
query) against workspace AgentRyan succeeds.

Co-authored-by: Eddie A Tejeda <669988+eddietejeda@users.noreply.github.com>