feat: generalized ABAC Terraform module for Genie Space onboarding #641
louiscsq wants to merge 35 commits into databrickslabs:main from
All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
- Add genie_space_acls.tf to run set-acls via null_resource
- Update genie_space.sh to support Service Principal OAuth M2M auth
- Add null provider for local-exec provisioner
- Add genie_space_id variable to trigger ACL setup
- Update outputs with genie_space_acls_applied and groups
Co-authored-by: Cursor <cursoragent@cursor.com>
… depends_on
- Remove warehouse_grants.tf; Genie embeds on warehouse, no end-user CAN_USE needed
- Update docs (README, GENIE_SPACE_PERMISSIONS, IMPORT_EXISTING, variables)
- demo_user_junior_us_id/senior_eu_id -> demo_user_junior_us_ids/senior_eu_ids (list)
- group_members.tf: for_each over IDs, add depends_on for groups and assignments
Co-authored-by: Cursor <cursoragent@cursor.com>
- entity_tag_assignments.tf: apply finance ABAC tags to tables/columns (from 3.ApplyFinanceSetTags.sql)
- fgac_policies.tf: catalog-level ABAC policies for PII, PCI, AML, US/EU region (from 4.CreateFinanceABACPolicies.sql)
Co-authored-by: Cursor <cursoragent@cursor.com>
- Switch from databricks_grants (declarative, overwrites all) to databricks_grant (additive, per-principal) to avoid stripping existing catalog permissions
- Grant the Terraform SP explicit USE_CATALOG, USE_SCHEMA, EXECUTE, MANAGE on the catalog so it can create FGAC policies referencing masking UDFs
- Add depends_on for mws_permission_assignment and grant resources to all policy_info resources to fix race conditions
- Add missing "Global" value to data_residency tag policy
Co-authored-by: Cursor <cursoragent@cursor.com>
…functions
Refactor the finance-specific ABAC module into a generic, variable-driven design that supports any domain. Users can now bring their own tables, masking functions, groups, tag policies, and FGAC policies via terraform.tfvars.
Key changes:
- Drive all resources (groups, tag policies, tag assignments, FGAC policies) from input variables with for_each
- Use additive databricks_grant to avoid clobbering existing permissions
- Auto-prefix entity_name and function_name with catalog.schema so tfvars use short, relative names
- Add masking_functions_library.sql with reusable UDF templates
- Add ABAC_PROMPT.md for AI-assisted tfvars generation
- Add examples/ with finance.tfvars and the original SQL demo files
- Rewrite README with Quick Start, Pick-and-Mix, and AI-Assisted workflows
Co-authored-by: Cursor <cursoragent@cursor.com>
…form apply
Python validator that cross-checks terraform.tfvars and masking SQL:
- Groups, tag keys/values, entity_name format, policy_type validity
- fgac_policies principals reference existing groups
- Tag conditions reference defined tag_policies and allowed values
- function_name is relative (no catalog.schema prefix)
- SQL functions match fgac_policies references
- Warns about unused functions and empty auth fields
Also documents the validation step in README.md and ABAC_PROMPT.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
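Two of the cross-checks listed in this commit (principals must reference existing groups, and function_name must be relative) can be sketched as follows. This is only an illustration, not the code in validate_abac.py: the dictionary layout and the field names `groups`, `name`, and `to_principals` are assumptions made for the example.

```python
from typing import Any


def check_fgac_policies(config: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means both checks passed."""
    errors: list[str] = []
    # Hypothetical layout: "groups" maps group name -> settings.
    group_names = set(config.get("groups", {}))
    for policy in config.get("fgac_policies", []):
        name = policy.get("name", "<unnamed>")
        # Check 1: every principal must be one of the declared groups.
        for principal in policy.get("to_principals", []):
            if principal not in group_names:
                errors.append(f"{name}: unknown principal {principal!r}")
        # Check 2: function_name must be relative (no catalog.schema prefix).
        function_name = policy.get("function_name", "")
        if "." in function_name:
            errors.append(f"{name}: function_name {function_name!r} must be relative")
    return errors


# Illustrative input: the second policy violates both rules.
config = {
    "groups": {"analysts": {}, "auditors": {}},
    "fgac_policies": [
        {"name": "mask_ssn", "to_principals": ["analysts"], "function_name": "mask_ssn"},
        {"name": "mask_card", "to_principals": ["interns"], "function_name": "main.fin.mask_card"},
    ],
}
for error in check_fgac_policies(config):
    print(error)
```

Collecting every error in a list, rather than raising on the first one, lets a user fix all problems in a single pass before running terraform apply.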
End-to-end Tier 3 example with 4 healthcare tables (Patients, Encounters, Prescriptions, Billing). DDL and generated SQL use <YOUR_CATALOG> placeholder so users substitute their own catalog. Also adds a "MY CATALOG AND SCHEMA" input section to ABAC_PROMPT.md so the AI knows which catalog/schema to use in its output. Co-authored-by: Cursor <cursoragent@cursor.com>
Reorganize flat examples/ folder into industry-specific subdirectories. Add time_sleep for tag policy eventual consistency and healthcare example files (masking_functions.sql, ABAC prompt). Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
… validation
- Add generate_abac.py for automated LLM-driven ABAC config generation (Databricks FMAPI, Anthropic, OpenAI providers)
- Add auth.auto.tfvars.example to separate credentials from ABAC config
- Add ddl/ and generated/ folders for AI-assisted workflow
- Add healthcare DDL examples (patients, encounters, prescriptions, billing)
- Update ABAC_PROMPT.md with valid condition syntax rules (forbid columnName/tableName)
- Add condition syntax validation in validate_abac.py
- Increase tag propagation wait from 10s to 30s for eventual consistency
- Update README with visual flow chart and three-tier workflow docs
Co-authored-by: Cursor <cursoragent@cursor.com>
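The rule that conditions must not use columnName or tableName lends itself to a simple lexical check. The sketch below is one possible way to do it and is not taken from validate_abac.py; the function name `forbidden_identifiers` and the sample condition strings are illustrative assumptions.

```python
import re

# Identifiers that, per the commit message, conditions must not use.
FORBIDDEN = ("columnName", "tableName")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b")


def forbidden_identifiers(condition: str) -> list[str]:
    """Return the forbidden identifiers found in a condition, in order of appearance."""
    return _FORBIDDEN_RE.findall(condition)


# Illustrative condition strings, not taken from the PR.
print(forbidden_identifiers("hasTagValue('pii', 'high')"))
print(forbidden_identifiers("columnName = 'ssn' AND tableName = 'patients'"))
```

The check is purely textual, so it would also flag a forbidden word that appears inside a quoted literal; a stricter validator would need to tokenize the condition first.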
Generalize Genie Space ACLs to use configured groups, harden ignores for user/state files, and add onboarding helpers (retrying generator, import script, e2e test, and Make targets). Co-authored-by: Cursor <cursoragent@cursor.com>
- Makefile apply target uses -parallelism=1 to avoid tag policy race conditions
- All user-facing instructions (README, TUNING.md, generate_abac.py output) updated to show terraform apply -parallelism=1
- validate_abac.py auto-discovers auth.auto.tfvars from module root when validating files in generated/
- Align generated output and documentation with validate-then-copy workflow
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…prompt consistency
- Remove hardcoded when_condition = match_condition for column masks in fgac_policies.tf; when_condition is optional per Databricks provider docs
- Add CRITICAL Internal Consistency section to ABAC_PROMPT.md to prevent tag value mismatches between tag_policies, tag_assignments, and fgac_policies
- Show table-level tags as optional example in prompt and generated output
- Clarify when_condition is optional for both column masks and row filters
- Remove when_condition from ROW_FILTER_REQUIRED in validator
Co-authored-by: Cursor <cursoragent@cursor.com>
…estroy support
- Auto-fetch DDLs from Databricks SDK (uc_tables in auth.auto.tfvars)
- Multi-catalog/schema support with per-policy catalog/function_schema
- Schema-aware UDF generation (functions deployed only where needed)
- Auto-deploy masking functions via Terraform (sql_warehouse_id opt-in)
- Destroy-time provisioner to drop UDFs on terraform destroy
- Simplified workflow: generate → tune → make apply (3 steps)
- Add --promote flag to generate_abac.py for 2-step workflow
- Makefile: promote, validate, apply, destroy targets with -auto-approve
- Remove uc_catalog_name/uc_schema_name (derived from fully-qualified names)
- Fix deploy_masking_functions.py parsing of USE CATALOG/SCHEMA directives
- Add CREATE_FUNCTION to SP grants for end-to-end lifecycle
- Remove ignore_changes on tag policy values to allow updates
- Industry-agnostic ABAC prompt (not limited to healthcare/finance)
- Add roadmap: multi Genie Space, multi steward, AI tuning, policy import
Made-with: Cursor
…ame terraform.tfvars to abac.auto.tfvars
- Add dual-mode Genie Space management (auto-create or ACLs-only) in genie_space.tf, replacing separate genie_space_acls.tf and genie_warehouse.tf
- Add auto-created SQL warehouse support in warehouse.tf
- Rename terraform.tfvars to abac.auto.tfvars for git tracking (secrets stay in auth.auto.tfvars)
- Fix Databricks provider tag policy ordering bug with lifecycle ignore_changes and auto-import retry in Makefile
- Fix SQL parsing error in deploy_masking_functions.py for inline comments after semicolons
- Simplify README with quick-start-first structure and consistent make targets
- Improve ABAC_PROMPT.md with SQL formatting rules and cross-tag-policy consistency guidance
- Update all references across docs, scripts, and examples
- Remove stale DDL files and old genie_warehouse/genie_space_acls modules
Made-with: Cursor
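The parsing fix for inline comments after semicolons concerns input such as `USE CATALOG main; -- switch catalog`. A minimal splitter that tolerates this could look like the sketch below. It is an assumption about the general approach, not the code in deploy_masking_functions.py, and `split_sql_statements` is a hypothetical name. It handles `--` line comments and single-quoted strings only; block comments and backslash escapes are out of scope.

```python
def split_sql_statements(sql: str) -> list[str]:
    """Split SQL on semicolons, dropping `--` comments and keeping quoted semicolons."""
    statements: list[str] = []
    current: list[str] = []
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_string:
            # Inside a single-quoted literal: copy verbatim until the closing quote.
            current.append(ch)
            if ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
            current.append(ch)
        elif sql.startswith("--", i):
            # Line comment: jump to the end of the line without copying it.
            newline = sql.find("\n", i)
            i = len(sql) if newline == -1 else newline
            continue
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


sample = "USE CATALOG main; -- switch catalog\nCREATE FUNCTION f() RETURNS STRING RETURN 'a;b'; -- done\n"
for statement in split_sql_statements(sample):
    print(statement)
```

A doubled quote inside a literal (`'it''s'`) is read as one literal closing and another opening immediately, which yields the same split points as proper escape handling.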
…in README Made-with: Cursor
- Add AI-generated Genie Space config (sample questions, instructions, benchmarks, title, description) via serialized_space API
- Split auth.auto.tfvars into auth (secrets, gitignored) and env (tables/warehouse/genie, checked in) for safe git tracking
- Rebuild genie_space.sh with Python-based JSON builder for proper serialized_space construction (version 2, sorted IDs, 32-char hex)
- Simplify README: remove reference tables, trim troubleshooting, clean up Advanced Usage, update flowchart with two-box layout
- Improve generate_abac.py output with clickable file paths and clearer next-step guidance
- Update all docs and examples for three-file config pattern
Made-with: Cursor
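The commit mentions three constraints on the serialized_space payload: version 2, sorted IDs, and 32-character hex IDs. The sketch below shows one way to satisfy those constraints in Python. The payload shape is not documented in this PR, so the keys `sample_questions`, `id`, and `question` are placeholders chosen for illustration, not the real schema.

```python
import json
import uuid


def new_id() -> str:
    """Return a random 32-character lowercase hex string."""
    return uuid.uuid4().hex


def build_sample_questions(questions: list[str]) -> list[dict]:
    """Attach a fresh ID to each question and return the items sorted by ID."""
    items = [{"id": new_id(), "question": q} for q in questions]
    return sorted(items, key=lambda item: item["id"])


# Hypothetical payload: only "version": 2 comes from the commit message.
payload = {
    "version": 2,
    "sample_questions": build_sample_questions(
        ["Total revenue by region?", "Top 10 customers?"]
    ),
}
print(json.dumps(payload, indent=2))
```

`uuid.uuid4().hex` is used here because it yields exactly 32 hex characters without dashes; whether the real builder derives IDs this way is not stated in the source.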
- Add genie_sql_filters, genie_sql_measures, genie_sql_expressions, and genie_join_specs to serialized_space for better Genie SQL generation
- Update ABAC_PROMPT.md with unambiguous benchmark rules, business default instructions, and domain-adaptive generation guidance
- Use two-step create-then-patch for Genie Space (CREATE endpoint doesn't support sql_snippets/join_specs)
- Restructure TUNING.md to prioritize Genie accuracy review checklist
- Increase Databricks FMAPI timeout to 600s for larger prompt responses
- Remove abac.auto.tfvars from make setup (generated by make generate)
Made-with: Cursor
3eec755 to cfbae68
…policies Add explicit "One Mask Per Column Per Group" rule to ABAC_PROMPT.md with concrete anti-patterns (e.g., tagging names, emails, and account IDs with the same tag value then creating separate policies). Update tag_policies.tf comment to clarify the provider value-ordering behavior.
Why not put the TF part into https://github.com/databricks/terraform-databricks-examples? Also, given the code size, maybe it makes sense to put it in a separate repo in databricks-solutions? (You can request repo creation via FEIP Jira.)
Thanks for the advice, @alexott. I will contribute the TF part to terraform-databricks-examples. I also submitted an FEIP two weeks ago, but I am still waiting for a reply. I agree that it makes sense to move this project to a separate repo, but I need advice on whether it's a better fit for databricks-solutions or databrickslabs. I do want to track usage stats to measure the impact, and I'm not sure whether we can do that for repos in databricks-solutions.
Set product identifier (genie-abac-quickstart/0.1.0) in Python SDK Config, Makefile env var (DATABRICKS_USER_AGENT_EXTRA), and curl User-Agent headers. Auto-upgrade databricks-sdk if version is too old for databricks.sdk.config.
Switch from per-client Config(product=...) to global ua.with_extra() and ua.with_product() calls, matching the pattern used by DQX and other Databricks Labs projects. Simplifies WorkspaceClient creation and ensures telemetry is set once at module load time.
…liability
- Add ignore_changes=[values] to tag_policies.tf with sync-tags workflow to permanently fix Databricks provider value reordering bug
- Add scripts/sync_tag_policies.py to update tag policy values via SDK
- Add autofix_tag_policies() to generate_abac.py to auto-add missing tag values the LLM forgets to declare in tag_policies
- Update Makefile: sync-tags + reimport before apply, remove broken retry
- Improve ABAC_PROMPT.md with common mistake warnings and final checks
- Gitignore generated/ folder and promoted masking_functions.sql
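The idea behind auto-adding missing tag values can be sketched as follows. This is not the PR's autofix_tag_policies() implementation: the function name `add_missing_tag_values` and the record fields (`key`, `values`, `tag_key`, `tag_value`) are assumptions made for the example. Values used by an assignment but absent from the matching policy are appended; tag keys with no policy at all are reported rather than invented.

```python
def add_missing_tag_values(tag_policies, tag_assignments):
    """Return (patched policies, tag keys that no policy declares)."""
    # Copy the value lists so the caller's input is not mutated.
    allowed = {p["key"]: list(p["values"]) for p in tag_policies}
    unknown_keys = []
    for assignment in tag_assignments:
        key, value = assignment["tag_key"], assignment["tag_value"]
        if key not in allowed:
            # No policy for this key: report it instead of creating one.
            if key not in unknown_keys:
                unknown_keys.append(key)
            continue
        if value not in allowed[key]:
            allowed[key].append(value)
    patched = [{"key": key, "values": values} for key, values in allowed.items()]
    return patched, unknown_keys


# Illustrative data: "low" is used but never declared; "region" has no policy.
policies = [{"key": "pii", "values": ["high"]}]
assignments = [
    {"tag_key": "pii", "tag_value": "low"},
    {"tag_key": "pii", "tag_value": "high"},
    {"tag_key": "region", "tag_value": "eu"},
]
patched, unknown = add_missing_tag_values(policies, assignments)
print(patched)
print(unknown)
```

Appending new values at the end keeps the declared order stable, which matters here because the commit notes that the provider is sensitive to value ordering.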
Prerequisites now appear before Quick Start since they apply to the entire tool, not just advanced usage. Added detailed Metastore Admin privileges, improved troubleshooting with bulk reimport script, and added "Import existing groups" to roadmap.
Update product name, User-Agent strings, and README intro across all files to reflect the new GenieRails identity — guardrails for Genie onboarding at scale.
…lemetry
WorkspaceClient defaults product="unknown", which overrides the global ua.with_product() call. Passing product/product_version explicitly to each constructor ensures the User-Agent header starts with genie-abac-quickstart/0.1.0 instead of unknown/0.0.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep genierails product name from generalize-abac-module branch, with the WorkspaceClient(product=...) fix for control plane telemetry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…plicate User-Agent entries
product= on WorkspaceClient constructor is the only mechanism needed. The ua.with_extra() and ua.with_product() calls were redundant and caused genierails/0.1.0 to appear twice in the User-Agent header.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
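The end state described by the last three commits, where the product identifier is passed only through the WorkspaceClient constructor, might look like the sketch below. The helper names `client_kwargs` and `make_client` are hypothetical. The import is deferred so that the module still loads where databricks-sdk is not installed, and authentication is assumed to come from the environment or from keyword arguments supplied by the caller.

```python
PRODUCT = "genierails"
PRODUCT_VERSION = "0.1.0"


def client_kwargs() -> dict[str, str]:
    """Keyword arguments that identify this tool in the User-Agent header."""
    return {"product": PRODUCT, "product_version": PRODUCT_VERSION}


def make_client(**auth):
    """Create a WorkspaceClient tagged with the product identifier.

    Extra keyword arguments (for example host or token) are passed through
    unchanged; with none given, the SDK resolves credentials on its own.
    """
    # Deferred import: only needed when a client is actually created.
    from databricks.sdk import WorkspaceClient

    return WorkspaceClient(**auth, **client_kwargs())


print(client_kwargs())
```

Routing every client through one factory keeps the identifier in a single place, so no module-level ua.with_product() or ua.with_extra() call is needed.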
Summary
- … uc-quickstart/utils/genie/aws/) that automates the full lifecycle of Genie Space provisioning with Attribute-Based Access Control (ABAC), including group creation, tag policies, FGAC (row filters & column masks), entity tag assignments, masking functions, and UC grants.
- generate_abac.py which uses an LLM to auto-generate abac.auto.tfvars configurations from DDL schemas, along with validate_abac.py to verify correctness before terraform apply.
Changes
- … uc-quickstart/utils/generate_abac.py), validator (validate_abac.py), masking function deployer
- make plan, make apply, make destroy, make generate-abac, make validate
Test plan
- make validate against example tfvars to verify config correctness
- terraform plan with finance example to confirm no errors
- terraform plan with healthcare example to confirm no errors
- generate_abac.py with sample DDL to confirm valid output
Made with Cursor