Improve Unity Catalog, Structured Streaming, Vector Search skills; add Terraform skill #210

Closed
CheeYuTan wants to merge 3 commits into databricks-solutions:main from CheeYuTan:feat/skill-improvements-uc-ss-vs-tf

@CheeYuTan (Contributor) commented Mar 5, 2026

Summary

Major improvements to 3 existing skills and a brand new Terraform skill, adding 3,090 lines across 13 files. All MCP tool examples were validated against a live Databricks workspace.

Unity Catalog (addresses #103)

  • Expanded SKILL.md from 120 → 170+ lines with full MCP tool table and governance quick start
  • Added 4 new reference files documenting 7 previously undocumented MCP tools:
    • 1-objects-and-governance.md: manage_uc_objects, manage_uc_grants (catalog/schema/volume/function CRUD, permissions)
    • 2-tags-and-classification.md: manage_uc_tags (PII classification, data discovery, compliance tagging)
    • 3-security-policies.md: manage_uc_security_policies (row filters, column masks, security functions)
    • 4-sharing-and-federation.md: manage_uc_sharing, manage_uc_connections, manage_uc_storage (Delta Sharing, Lakehouse Federation, storage credentials)

Structured Streaming

  • Expanded SKILL.md from 66 → 247 lines — was just a table of contents despite having 8 good reference files
  • Added actionable quick starts: Kafka-to-Delta, foreachBatch MERGE, availableNow scheduled streaming
  • Added trigger selection guide, watermark essentials, stream join patterns, common issues table
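
For reviewers who want a feel for the foreachBatch MERGE quick start, here is a minimal sketch of the general pattern, not the exact code in SKILL.md. The helper names (`build_merge_sql`, `make_upsert_handler`), the table name, the key column, and the checkpoint path are all hypothetical. It assumes a Delta target table and PySpark 3.3+, where `DataFrame.sparkSession` is available.

```python
def build_merge_sql(target: str, source_view: str, key: str) -> str:
    """Return a Delta MERGE statement that upserts source_view into target on key."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def make_upsert_handler(target: str, key: str, view: str = "updates"):
    """Build a callback with the (batch_df, batch_id) signature foreachBatch expects."""

    def upsert_batch(batch_df, batch_id):
        # Expose the micro-batch to SQL, then run the MERGE on the batch's own session.
        batch_df.createOrReplaceTempView(view)
        batch_df.sparkSession.sql(build_merge_sql(target, view, key))

    return upsert_batch


# Wiring it into a stream (requires a Spark session; names are hypothetical):
# (stream_df.writeStream
#     .foreachBatch(make_upsert_handler("main.sales.orders", "order_id"))
#     .option("checkpointLocation", "/tmp/checkpoints/orders")
#     .trigger(availableNow=True)
#     .start())
```

If a micro-batch can carry several rows for the same key, deduplicate it before the MERGE, since Delta rejects a merge in which multiple source rows match one target row.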

Vector Search (addresses #106)

  • Fixed incorrect MCP tool names — replaced non-existent consolidated tools (create_or_update_vs_endpoint, manage_vs_data) with actual individual tools (create_vs_endpoint, list_vs_endpoints, sync_vs_index, upsert_vs_data, etc.)
  • Added end-to-end-rag.md — complete walkthrough from source table → endpoint → index → query → agent integration
  • Added columns_to_sync and filter syntax guidance (Standard vs Storage-Optimized)
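
To illustrate why the filter guidance is split by endpoint type, the sketch below assumes that Standard endpoints take filters as a dictionary while Storage-Optimized endpoints take a SQL-like string. That is my understanding of the distinction, so verify it against the reference file before relying on it. The helper `to_sql_filter` and the column names are hypothetical, and only simple equality filters are handled.

```python
def to_sql_filter(filters: dict) -> str:
    """Render simple equality filters as a SQL-like string, joined with AND."""

    def literal(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        if isinstance(value, (int, float)):
            return str(value)
        # Quote strings and escape embedded single quotes by doubling them.
        return "'" + str(value).replace("'", "''") + "'"

    return " AND ".join(f"{col} = {literal(val)}" for col, val in filters.items())


# Dictionary form (assumed to suit a Standard endpoint).
standard_filter = {"category": "docs", "year": 2024}

# String form (assumed to suit a Storage-Optimized endpoint).
storage_optimized_filter = to_sql_filter(standard_filter)
print(storage_optimized_filter)  # category = 'docs' AND year = 2024
```

Keeping one canonical dictionary and deriving the string form from it avoids maintaining two hand-written filters that can drift apart.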

Terraform (addresses #145)

  • Brand new skill with SKILL.md + 4 reference files (1,478 lines total)
  • 1-provider-and-auth.md — PAT, service principal, Azure/AWS/GCP auth, multi-provider config
  • 2-core-resources.md — clusters, jobs, SQL warehouses, DLT pipelines, model serving, secrets
  • 3-unity-catalog.md — catalogs, schemas, volumes, grants, storage credentials, Delta Sharing
  • 4-best-practices.md — project structure, modules, remote state, CI/CD, lifecycle management

Test Plan

  • All 7 Unity Catalog MCP tools tested against live workspace (list catalogs, get grants, query tags, list storage credentials, list shares, list connections)
  • Vector Search MCP tools tested (list endpoints, list indexes, get index, query index)
  • Verified actual MCP tool names match what's documented (fixed stale consolidated names)
  • Terraform provider version verified (v1.110.0 as of Feb 2026)
  • All cross-references between skills verified

…d Terraform skill

Unity Catalog (addresses databricks-solutions#103):
- Expand SKILL.md from 120 to 170+ lines with full MCP tool table and governance quick start
- Add 4 new reference files documenting 7 previously undocumented MCP tools:
  1-objects-and-governance.md (manage_uc_objects, manage_uc_grants)
  2-tags-and-classification.md (manage_uc_tags)
  3-security-policies.md (manage_uc_security_policies)
  4-sharing-and-federation.md (manage_uc_sharing, manage_uc_connections, manage_uc_storage)

Structured Streaming:
- Expand SKILL.md from 66 to 247 lines — was just a table of contents despite having 8 good reference files
- Add actionable quick starts (Kafka-to-Delta, foreachBatch MERGE, availableNow)
- Add trigger selection guide, watermark essentials, join patterns, common issues table

Vector Search (addresses databricks-solutions#106):
- Fix MCP tool names: replace non-existent consolidated tools with actual individual tools
  (create_vs_endpoint, list_vs_endpoints, sync_vs_index, upsert_vs_data, etc.)
- Add end-to-end-rag.md: complete walkthrough from source table to agent integration
- Add columns_to_sync and filter syntax guidance

Terraform (addresses databricks-solutions#145):
- New skill with SKILL.md + 4 reference files (1,478 lines total)
- Covers provider auth (AWS/Azure/GCP), core resources, Unity Catalog IaC, best practices
- Includes modules, CI/CD patterns, state management, and multi-environment structure

All MCP tool examples were validated against a live Databricks workspace.
- 4-sharing-and-federation.md: Rewrite Delta Sharing examples to use correct
  MCP tool actions (add_table, remove_table, grant_to_recipient) and flat
  parameters instead of incorrect nested updates/changes arrays. Fix
  manage_uc_connections to use options dict instead of top-level host/port/user.
  Add create_foreign_catalog action documentation.

- 3-security-policies.md: Correct misleading claim that ALL_PRIVILEGES bypasses
  row filters/column masks. Only metastore admins and account admins bypass them.

- Structured Streaming SKILL.md: Fix realTime trigger syntax from realTime=True
  (invalid) to realTime="5 minutes" (correct, DBR 16.4+). Fix production
  checklist wording about default trigger behavior.

- end-to-end-rag.md: Replace non-existent VectorSearchRetrieverUDF with
  VectorSearchRetrieverTool. Clarify filter example context (Standard vs
  Storage-Optimized endpoint). Fix ChatAgent message access to use attribute
  style.

- Terraform 2-core-resources.md: Replace deprecated pipeline target attribute
  with schema. Terraform SKILL.md: Remove invalid serverless cluster profile
  spark_conf.
- Add databricks-terraform to DATABRICKS_SKILLS list
- Add databricks-terraform description and extra files
- Update databricks-unity-catalog extra files (add 4 new reference files + 6-volumes.md + 7-data-profiling.md)
- Update databricks-vector-search extra files (add end-to-end-rag.md)
- Update databricks-unity-catalog description

@CheeYuTan (Contributor, Author)

Splitting into 4 smaller PRs for easier review:

All content is identical to this PR, just separated by skill.

@CheeYuTan closed this Mar 5, 2026