Improve Unity Catalog, Structured Streaming, Vector Search skills; add Terraform skill #210
Closed
CheeYuTan wants to merge 3 commits into databricks-solutions:main from
…d Terraform skill

Unity Catalog (addresses databricks-solutions#103):
- Expand SKILL.md from 120 to 170+ lines with full MCP tool table and governance quick start
- Add 4 new reference files documenting 7 previously undocumented MCP tools:
  - 1-objects-and-governance.md (manage_uc_objects, manage_uc_grants)
  - 2-tags-and-classification.md (manage_uc_tags)
  - 3-security-policies.md (manage_uc_security_policies)
  - 4-sharing-and-federation.md (manage_uc_sharing, manage_uc_connections, manage_uc_storage)

Structured Streaming:
- Expand SKILL.md from 66 to 247 lines; it was just a table of contents despite having 8 good reference files
- Add actionable quick starts (Kafka-to-Delta, foreachBatch MERGE, availableNow)
- Add trigger selection guide, watermark essentials, join patterns, common issues table

Vector Search (addresses databricks-solutions#106):
- Fix MCP tool names: replace non-existent consolidated tools with actual individual tools (create_vs_endpoint, list_vs_endpoints, sync_vs_index, upsert_vs_data, etc.)
- Add end-to-end-rag.md: complete walkthrough from source table to agent integration
- Add columns_to_sync and filter syntax guidance

Terraform (addresses databricks-solutions#145):
- New skill with SKILL.md + 4 reference files (1,478 lines total)
- Covers provider auth (AWS/Azure/GCP), core resources, Unity Catalog IaC, best practices
- Includes modules, CI/CD patterns, state management, and multi-environment structure

All MCP tool examples were validated against a live Databricks workspace.
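For readers unfamiliar with the foreachBatch MERGE and availableNow quick starts named in this commit, the pattern can be sketched roughly as below. This is an illustrative sketch, not the content of the skill files: the helper names (`merge_condition`, `make_upsert`, `start_upsert_stream`) and the `t`/`s` aliases are assumptions made for this example, and it presumes a runtime where `pyspark` and `delta` are installed and the target Delta table already exists.

```python
from typing import Sequence


def merge_condition(keys: Sequence[str], target: str = "t", source: str = "s") -> str:
    """Build the ON clause for a Delta MERGE from a list of key columns."""
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def make_upsert(target_table: str, keys: Sequence[str]):
    """Return a foreachBatch callback that MERGEs each micro-batch into target_table."""
    condition = merge_condition(keys)

    def upsert(batch_df, batch_id: int) -> None:
        # Imported lazily so this module can be loaded without a Spark runtime.
        from delta.tables import DeltaTable

        # Keep one (arbitrary) row per key so a batch cannot match a target row twice.
        deduped = batch_df.dropDuplicates(list(keys))
        (
            DeltaTable.forName(batch_df.sparkSession, target_table)
            .alias("t")
            .merge(deduped.alias("s"), condition)
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
        )

    return upsert


def start_upsert_stream(source_df, target_table: str, keys: Sequence[str], checkpoint: str):
    """Process all input available at start time, then stop (availableNow trigger)."""
    return (
        source_df.writeStream
        .foreachBatch(make_upsert(target_table, keys))
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .start()
    )
```

A caller would pass a streaming DataFrame, for example `start_upsert_stream(stream_df, "main.demo.events", ["id"], "/tmp/checkpoints/events")`, where the table name and checkpoint path are placeholders.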
- 4-sharing-and-federation.md: Rewrite Delta Sharing examples to use the correct MCP tool actions (add_table, remove_table, grant_to_recipient) and flat parameters instead of the incorrect nested updates/changes arrays. Fix manage_uc_connections to use an options dict instead of top-level host/port/user. Add create_foreign_catalog action documentation.
- 3-security-policies.md: Correct the misleading claim that ALL_PRIVILEGES bypasses row filters/column masks. Only metastore admins and account admins bypass them.
- Structured Streaming SKILL.md: Fix realTime trigger syntax from realTime=True (invalid) to realTime="5 minutes" (correct, DBR 16.4+). Fix production checklist wording about default trigger behavior.
- end-to-end-rag.md: Replace non-existent VectorSearchRetrieverUDF with VectorSearchRetrieverTool. Clarify filter example context (Standard vs Storage-Optimized endpoint). Fix ChatAgent message access to use attribute style.
- Terraform 2-core-resources.md: Replace deprecated pipeline target attribute with schema. SKILL.md: Remove invalid serverless cluster profile spark_conf.
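The realTime trigger fix in this commit comes down to passing a duration string rather than a boolean. A small sketch of how the trigger choices differ is below; the function name `trigger_kwargs` and the mode labels are hypothetical and exist only for this illustration, while the keyword names follow the syntax stated in the commit message and the standard `DataStreamWriter.trigger()` arguments.

```python
def trigger_kwargs(mode: str, interval: str = "1 minute") -> dict:
    """Return keyword arguments intended for DataStreamWriter.trigger()."""
    if mode == "available_now":
        # Process everything available at start, then stop; suits scheduled jobs.
        return {"availableNow": True}
    if mode == "processing_time":
        # Run a micro-batch on a fixed interval.
        return {"processingTime": interval}
    if mode == "real_time":
        # Per the commit message: a duration string, not a boolean (DBR 16.4+).
        return {"realTime": interval}
    raise ValueError(f"unknown trigger mode: {mode!r}")
```

On a suitable runtime this would be used as `df.writeStream.trigger(**trigger_kwargs("real_time", "5 minutes"))`; whether the realTime keyword is accepted depends on the Databricks Runtime version noted above.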
- Add databricks-terraform to DATABRICKS_SKILLS list
- Add databricks-terraform description and extra files
- Update databricks-unity-catalog extra files (add 4 new reference files + 6-volumes.md + 7-data-profiling.md)
- Update databricks-vector-search extra files (add end-to-end-rag.md)
- Update databricks-unity-catalog description
Splitting into 4 smaller PRs for easier review:
All content is identical to this PR, just separated by skill.
Summary
Major improvements to 3 existing skills and a brand new Terraform skill, adding 3,090 lines across 13 files. All MCP tool examples were validated against a live Databricks workspace.
Unity Catalog (addresses #103)
- Expand SKILL.md from 120 → 170+ lines with full MCP tool table and governance quick start
- Add 4 new reference files documenting 7 previously undocumented MCP tools:
  - 1-objects-and-governance.md: manage_uc_objects, manage_uc_grants (catalog/schema/volume/function CRUD, permissions)
  - 2-tags-and-classification.md: manage_uc_tags (PII classification, data discovery, compliance tagging)
  - 3-security-policies.md: manage_uc_security_policies (row filters, column masks, security functions)
  - 4-sharing-and-federation.md: manage_uc_sharing, manage_uc_connections, manage_uc_storage (Delta Sharing, Lakehouse Federation, storage credentials)

Structured Streaming
- Expand SKILL.md from 66 → 247 lines; it was just a table of contents despite having 8 good reference files

Vector Search (addresses #106)
- Fix MCP tool names: replace non-existent consolidated tools (create_or_update_vs_endpoint, manage_vs_data) with actual individual tools (create_vs_endpoint, list_vs_endpoints, sync_vs_index, upsert_vs_data, etc.)
- Add end-to-end-rag.md: complete walkthrough from source table → endpoint → index → query → agent integration
- Add columns_to_sync and filter syntax guidance (Standard vs Storage-Optimized)

Terraform (addresses #145)
- New skill with SKILL.md + 4 reference files (1,478 lines total):
  - 1-provider-and-auth.md: PAT, service principal, Azure/AWS/GCP auth, multi-provider config
  - 2-core-resources.md: clusters, jobs, SQL warehouses, DLT pipelines, model serving, secrets
  - 3-unity-catalog.md: catalogs, schemas, volumes, grants, storage credentials, Delta Sharing
  - 4-best-practices.md: project structure, modules, remote state, CI/CD, lifecycle management

Test Plan