feat: generalized ABAC Terraform module for Genie Space onboarding #641

Open
louiscsq wants to merge 35 commits into databrickslabs:main from louiscsq:feature/generalize-abac-module

Conversation

@louiscsq
Contributor

Summary

  • Genie Space ABAC automation: Adds a reusable Terraform module (uc-quickstart/utils/genie/aws/) that automates the full lifecycle of Genie Space provisioning with Attribute-Based Access Control (ABAC) — including group creation, tag policies, FGAC (row filters & column masks), entity tag assignments, masking functions, and UC grants.
  • AI-assisted config generation: Includes generate_abac.py which uses an LLM to auto-generate abac.auto.tfvars configurations from DDL schemas, along with validate_abac.py to verify correctness before terraform apply.
  • Multi-catalog support with examples: Supports multi-catalog ABAC deployments with deploy/destroy workflows. Includes fully worked examples for finance and healthcare domains with walkthrough documentation.

Changes

  • 55 new files, ~10,000 lines added under uc-quickstart/utils/
  • Terraform resources: Genie Space, groups, tag policies, FGAC policies, entity tags, masking functions, warehouse, UC grants
  • AI workflow: ABAC prompt template, config generator (generate_abac.py), validator (validate_abac.py), masking function deployer
  • Makefile-driven workflow: make plan, make apply, make destroy, make generate-abac, make validate
  • Domain examples: finance (ABAC functions, schema, groups, tag policies, test scripts) and healthcare (DDL, masking functions, walkthrough)
  • Documentation: README, ABAC_PROMPT, GENIE_SPACE_PERMISSIONS, IMPORT_EXISTING guides

Test plan

  • Run make validate against example tfvars to verify config correctness
  • Run terraform plan with finance example to confirm no errors
  • Run terraform plan with healthcare example to confirm no errors
  • Deploy to a test workspace and verify Genie Space is created with correct ABAC policies
  • Verify masking functions are deployed and applied correctly
  • Test generate_abac.py with sample DDL to confirm valid output

Made with Cursor

@github-actions

All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

louiscsq and others added 25 commits February 27, 2026 15:12
- Add genie_space_acls.tf to run set-acls via null_resource
- Update genie_space.sh to support Service Principal OAuth M2M auth
- Add null provider for local-exec provisioner
- Add genie_space_id variable to trigger ACL setup
- Update outputs with genie_space_acls_applied and groups

Co-authored-by: Cursor <cursoragent@cursor.com>
… depends_on

- Remove warehouse_grants.tf; Genie embeds on warehouse, no end-user CAN_USE needed
- Update docs (README, GENIE_SPACE_PERMISSIONS, IMPORT_EXISTING, variables)
- demo_user_junior_us_id/senior_eu_id -> demo_user_junior_us_ids/senior_eu_ids (list)
- group_members.tf: for_each over IDs, add depends_on for groups and assignments

Co-authored-by: Cursor <cursoragent@cursor.com>
- entity_tag_assignments.tf: apply finance ABAC tags to tables/columns (from 3.ApplyFinanceSetTags.sql)
- fgac_policies.tf: catalog-level ABAC policies for PII, PCI, AML, US/EU region (from 4.CreateFinanceABACPolicies.sql)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Switch from databricks_grants (declarative, overwrites all) to
  databricks_grant (additive, per-principal) to avoid stripping
  existing catalog permissions
- Grant the Terraform SP explicit USE_CATALOG, USE_SCHEMA, EXECUTE,
  MANAGE on the catalog so it can create FGAC policies referencing
  masking UDFs
- Add depends_on for mws_permission_assignment and grant resources
  to all policy_info resources to fix race conditions
- Add missing "Global" value to data_residency tag policy

Co-authored-by: Cursor <cursoragent@cursor.com>
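
The difference between the two grant resources named in the commit above can be illustrated with a toy model. This is only a sketch in plain Python, not provider code: grants on one securable are modeled as a dict from principal to a set of privileges, and the function names and principals (`data_engineers`, `terraform_sp`) are hypothetical.

```python
# Toy model of grants on a single securable: {principal: {privileges}}.
# Function names and principals are hypothetical, for illustration only.

def apply_grants_declarative(existing: dict, desired: dict) -> dict:
    """Model of databricks_grants: the declared set replaces everything."""
    return {principal: set(privs) for principal, privs in desired.items()}


def apply_grant_additive(existing: dict, principal: str, privileges: set) -> dict:
    """Model of databricks_grant: only the named principal is managed."""
    merged = {p: set(privs) for p, privs in existing.items()}
    merged[principal] = set(privileges)
    return merged


existing = {"data_engineers": {"USE_CATALOG", "SELECT"}}
sp_privs = {"USE_CATALOG", "USE_SCHEMA", "EXECUTE", "MANAGE"}

overwritten = apply_grants_declarative(existing, {"terraform_sp": sp_privs})
merged = apply_grant_additive(existing, "terraform_sp", sp_privs)

print(sorted(overwritten))  # the pre-existing principal is gone
print(sorted(merged))       # the pre-existing principal is preserved
```

In the declarative model the pre-existing `data_engineers` entry disappears, which is the "stripping existing catalog permissions" behavior the commit avoids.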
…functions

Refactor the finance-specific ABAC module into a generic, variable-driven
design that supports any domain. Users can now bring their own tables,
masking functions, groups, tag policies, and FGAC policies via terraform.tfvars.

Key changes:
- Drive all resources (groups, tag policies, tag assignments, FGAC policies)
  from input variables with for_each
- Use additive databricks_grant to avoid clobbering existing permissions
- Auto-prefix entity_name and function_name with catalog.schema so tfvars
  use short, relative names
- Add masking_functions_library.sql with reusable UDF templates
- Add ABAC_PROMPT.md for AI-assisted tfvars generation
- Add examples/ with finance.tfvars and the original SQL demo files
- Rewrite README with Quick Start, Pick-and-Mix, and AI-Assisted workflows

Co-authored-by: Cursor <cursoragent@cursor.com>
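
The auto-prefixing described in the commit above happens in Terraform, but the idea can be sketched in a few lines of Python. The helper name `qualify_name` and the catalog, schema, and object names are assumptions made for illustration; they are not taken from the module.

```python
# Sketch of the "short, relative names" idea: tfvars carry relative names,
# and the catalog.schema prefix is added in one place.
# qualify_name and all names below are hypothetical.

def qualify_name(relative_name: str, catalog: str, schema: str) -> str:
    """Return the fully qualified name for a relative entity or function name."""
    return f"{catalog}.{schema}.{relative_name}"


print(qualify_name("mask_ssn", "main", "finance"))
print(qualify_name("customers.ssn", "main", "finance"))
```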
…form apply

Python validator that cross-checks terraform.tfvars and masking SQL:
- Groups, tag keys/values, entity_name format, policy_type validity
- fgac_policies principals reference existing groups
- Tag conditions reference defined tag_policies and allowed values
- function_name is relative (no catalog.schema prefix)
- SQL functions match fgac_policies references
- Warns about unused functions and empty auth fields

Also documents the validation step in README.md and ABAC_PROMPT.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
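
A few of the cross-checks listed in the commit above could look roughly like the following. This is a minimal sketch, not the code in validate_abac.py: it assumes the tfvars have already been parsed into a Python dict, and the key names (`to_principals`, `tag_key`, `tag_value`, and so on) are assumptions chosen for the example.

```python
# Minimal sketch of cross-checking an already-parsed ABAC config.
# The dict layout and key names are hypothetical.

def validate(config: dict) -> list:
    errors = []
    groups = set(config.get("groups", []))
    allowed = {p["key"]: set(p["values"]) for p in config.get("tag_policies", [])}

    # Tag assignments must use a declared tag key and an allowed value.
    for a in config.get("tag_assignments", []):
        key, value = a["tag_key"], a["tag_value"]
        if key not in allowed:
            errors.append(f"tag key '{key}' has no tag policy")
        elif value not in allowed[key]:
            errors.append(f"tag value '{value}' is not allowed for '{key}'")

    for policy in config.get("fgac_policies", []):
        name = policy["name"]
        # Principals must reference defined groups.
        for principal in policy.get("to_principals", []):
            if principal not in groups:
                errors.append(f"{name}: unknown group '{principal}'")
        # function_name must be relative (no catalog.schema prefix).
        if "." in policy.get("function_name", ""):
            errors.append(f"{name}: function_name must be relative")
    return errors


config = {
    "groups": ["analysts"],
    "tag_policies": [{"key": "pii", "values": ["ssn", "email"]}],
    "tag_assignments": [
        {"entity_name": "customers.phone", "tag_key": "pii", "tag_value": "phone"}
    ],
    "fgac_policies": [
        {
            "name": "mask_pii",
            "to_principals": ["analysts", "auditors"],
            "function_name": "main.finance.mask_ssn",
        }
    ],
}

errors = validate(config)
for message in errors:
    print(message)
```

Collecting every error before returning, rather than failing on the first one, lets a user fix a generated config in a single pass.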
End-to-end Tier 3 example with 4 healthcare tables (Patients,
Encounters, Prescriptions, Billing). DDL and generated SQL use
<YOUR_CATALOG> placeholder so users substitute their own catalog.

Also adds a "MY CATALOG AND SCHEMA" input section to ABAC_PROMPT.md
so the AI knows which catalog/schema to use in its output.

Co-authored-by: Cursor <cursoragent@cursor.com>
Reorganize flat examples/ folder into industry-specific subdirectories.
Add time_sleep for tag policy eventual consistency and healthcare
example files (masking_functions.sql, ABAC prompt).

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
… validation

- Add generate_abac.py for automated LLM-driven ABAC config generation
  (Databricks FMAPI, Anthropic, OpenAI providers)
- Add auth.auto.tfvars.example to separate credentials from ABAC config
- Add ddl/ and generated/ folders for AI-assisted workflow
- Add healthcare DDL examples (patients, encounters, prescriptions, billing)
- Update ABAC_PROMPT.md with valid condition syntax rules (forbid columnName/tableName)
- Add condition syntax validation in validate_abac.py
- Increase tag propagation wait from 10s to 30s for eventual consistency
- Update README with visual flow chart and three-tier workflow docs

Co-authored-by: Cursor <cursoragent@cursor.com>
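
The condition syntax rule mentioned in the commit above (forbidding `columnName` and `tableName`) could be checked along these lines. This is a sketch under assumptions: the helper name is hypothetical, the sample condition strings are illustrative only, and the real validator may apply further rules that are not shown here.

```python
import re

# Identifiers the commit says must not appear in a policy condition.
FORBIDDEN = re.compile(r"\b(columnName|tableName)\b")


def forbidden_identifiers(condition: str) -> list:
    """Return the forbidden identifiers found in a condition, sorted and unique."""
    return sorted(set(FORBIDDEN.findall(condition)))


# Sample conditions are illustrative, not taken from the PR.
print(forbidden_identifiers("hasTagValue('pii', 'ssn')"))
print(forbidden_identifiers("columnName = 'ssn' AND tableName = 'customers'"))
```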
Generalize Genie Space ACLs to use configured groups, harden ignores for user/state files, and add onboarding helpers (retrying generator, import script, e2e test, and Make targets).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Makefile apply target uses -parallelism=1 to avoid tag policy race conditions
- All user-facing instructions (README, TUNING.md, generate_abac.py output)
  updated to show terraform apply -parallelism=1
- validate_abac.py auto-discovers auth.auto.tfvars from module root when
  validating files in generated/
- Align generated output and documentation with validate-then-copy workflow

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…prompt consistency

- Remove hardcoded when_condition = match_condition for column masks in
  fgac_policies.tf; when_condition is optional per Databricks provider docs
- Add CRITICAL Internal Consistency section to ABAC_PROMPT.md to prevent
  tag value mismatches between tag_policies, tag_assignments, and fgac_policies
- Show table-level tags as optional example in prompt and generated output
- Clarify when_condition is optional for both column masks and row filters
- Remove when_condition from ROW_FILTER_REQUIRED in validator

Co-authored-by: Cursor <cursoragent@cursor.com>
…estroy support

- Auto-fetch DDLs from Databricks SDK (uc_tables in auth.auto.tfvars)
- Multi-catalog/schema support with per-policy catalog/function_schema
- Schema-aware UDF generation (functions deployed only where needed)
- Auto-deploy masking functions via Terraform (sql_warehouse_id opt-in)
- Destroy-time provisioner to drop UDFs on terraform destroy
- Simplified workflow: generate → tune → make apply (3 steps)
- Add --promote flag to generate_abac.py for 2-step workflow
- Makefile: promote, validate, apply, destroy targets with -auto-approve
- Remove uc_catalog_name/uc_schema_name (derived from fully-qualified names)
- Fix deploy_masking_functions.py parsing of USE CATALOG/SCHEMA directives
- Add CREATE_FUNCTION to SP grants for end-to-end lifecycle
- Remove ignore_changes on tag policy values to allow updates
- Industry-agnostic ABAC prompt (not limited to healthcare/finance)
- Add roadmap: multi Genie Space, multi steward, AI tuning, policy import

Made-with: Cursor
…ame terraform.tfvars to abac.auto.tfvars

- Add dual-mode Genie Space management (auto-create or ACLs-only) in
  genie_space.tf, replacing separate genie_space_acls.tf and genie_warehouse.tf
- Add auto-created SQL warehouse support in warehouse.tf
- Rename terraform.tfvars to abac.auto.tfvars for git tracking (secrets
  stay in auth.auto.tfvars)
- Fix Databricks provider tag policy ordering bug with lifecycle
  ignore_changes and auto-import retry in Makefile
- Fix SQL parsing error in deploy_masking_functions.py for inline comments
  after semicolons
- Simplify README with quick-start-first structure and consistent make
  targets
- Improve ABAC_PROMPT.md with SQL formatting rules and cross-tag-policy
  consistency guidance
- Update all references across docs, scripts, and examples
- Remove stale DDL files and old genie_warehouse/genie_space_acls modules

Made-with: Cursor
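
The inline-comment parsing issue mentioned in the commit above is a common pitfall when splitting a SQL file on semicolons. The following is a minimal sketch of one way to handle it, not the code in deploy_masking_functions.py: it drops `--` line comments and ignores semicolons inside single-quoted strings, and it does not attempt block comments or backtick-quoted identifiers.

```python
# Sketch: split SQL text into statements, tolerating "-- comment" after ";".
# split_sql_statements is a hypothetical helper, not the PR's implementation.

def split_sql_statements(sql: str) -> list:
    statements, current = [], []
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_string:
            current.append(ch)
            if ch == "'":
                in_string = False  # a doubled '' simply reopens the string
        elif ch == "'":
            in_string = True
            current.append(ch)
        elif sql.startswith("--", i):
            # Skip the rest of the line; keep the newline as whitespace.
            newline = sql.find("\n", i)
            i = len(sql) if newline == -1 else newline
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


sample = (
    "USE CATALOG main; -- switch catalog\n"
    "CREATE FUNCTION f(x STRING) RETURNS STRING RETURN 'a;b'; -- trailing note\n"
)
for statement in split_sql_statements(sample):
    print(statement)
```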
- Add AI-generated Genie Space config (sample questions, instructions,
  benchmarks, title, description) via serialized_space API
- Split auth.auto.tfvars into auth (secrets, gitignored) and env
  (tables/warehouse/genie, checked in) for safe git tracking
- Rebuild genie_space.sh with Python-based JSON builder for proper
  serialized_space construction (version 2, sorted IDs, 32-char hex)
- Simplify README: remove reference tables, trim troubleshooting,
  clean up Advanced Usage, update flowchart with two-box layout
- Improve generate_abac.py output with clickable file paths and
  clearer next-step guidance
- Update all docs and examples for three-file config pattern

Made-with: Cursor
- Add genie_sql_filters, genie_sql_measures, genie_sql_expressions, and
  genie_join_specs to serialized_space for better Genie SQL generation
- Update ABAC_PROMPT.md with unambiguous benchmark rules, business default
  instructions, and domain-adaptive generation guidance
- Use two-step create-then-patch for Genie Space (CREATE endpoint doesn't
  support sql_snippets/join_specs)
- Restructure TUNING.md to prioritize Genie accuracy review checklist
- Increase Databricks FMAPI timeout to 600s for larger prompt responses
- Remove abac.auto.tfvars from make setup (generated by make generate)

Made-with: Cursor
louiscsq force-pushed the feature/generalize-abac-module branch from 3eec755 to cfbae68 on February 27, 2026 04:12
…policies

Add explicit "One Mask Per Column Per Group" rule to ABAC_PROMPT.md with
concrete anti-patterns (e.g., tagging names, emails, and account IDs with
the same tag value then creating separate policies). Update tag_policies.tf
comment to clarify the provider value-ordering behavior.
@alexott
Collaborator

alexott commented Feb 27, 2026

Why not put the TF part into https://github.com/databricks/terraform-databricks-examples? Also, given the code size, it may make sense to put it in a separate repo under databricks-solutions (you can request repo creation via FEIP Jira).

@louiscsq
Contributor Author

louiscsq commented Mar 1, 2026

Thanks for the advice, @alexott. I will contribute the TF part to terraform-databricks-examples. I also submitted an FEIP two weeks ago, but I am still waiting for a reply. I agree that it makes sense to move this project to a separate repo, but I need advice on whether it is a better fit for databricks-solutions or databrickslabs. I do want to track usage stats to measure the impact, and I am not sure whether we can do that for repos in databricks-solutions.

louiscsq and others added 9 commits March 2, 2026 21:13
Set product identifier (genie-abac-quickstart/0.1.0) in Python SDK
Config, Makefile env var (DATABRICKS_USER_AGENT_EXTRA), and curl
User-Agent headers. Auto-upgrade databricks-sdk if version is too
old for databricks.sdk.config.
Switch from per-client Config(product=...) to global ua.with_extra()
and ua.with_product() calls, matching the pattern used by DQX and
other Databricks Labs projects. Simplifies WorkspaceClient creation
and ensures telemetry is set once at module load time.
…liability

- Add ignore_changes=[values] to tag_policies.tf with sync-tags workflow
  to permanently fix Databricks provider value reordering bug
- Add scripts/sync_tag_policies.py to update tag policy values via SDK
- Add autofix_tag_policies() to generate_abac.py to auto-add missing
  tag values the LLM forgets to declare in tag_policies
- Update Makefile: sync-tags + reimport before apply, remove broken retry
- Improve ABAC_PROMPT.md with common mistake warnings and final checks
- Gitignore generated/ folder and promoted masking_functions.sql
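
The auto-fix described in the commit above (adding tag values that are used but not declared in tag_policies) can be sketched as follows. This is not the PR's `autofix_tag_policies()`; the helper name `add_missing_tag_values` and the dict layout are assumptions made for illustration.

```python
# Sketch: extend tag policies with values that assignments use but policies omit.
# add_missing_tag_values and the key names are hypothetical.

def add_missing_tag_values(tag_policies: list, tag_assignments: list):
    allowed = {p["key"]: list(p["values"]) for p in tag_policies}
    added = []
    for a in tag_assignments:
        key, value = a["tag_key"], a["tag_value"]
        # Only extend policies that exist; unknown keys are left for validation.
        if key in allowed and value not in allowed[key]:
            allowed[key].append(value)
            added.append((key, value))
    fixed = [{**p, "values": allowed[p["key"]]} for p in tag_policies]
    return fixed, added


policies = [{"key": "data_residency", "values": ["US", "EU"]}]
assignments = [
    {"entity_name": "accounts", "tag_key": "data_residency", "tag_value": "Global"},
    {"entity_name": "customers", "tag_key": "data_residency", "tag_value": "US"},
]

fixed, added = add_missing_tag_values(policies, assignments)
print(fixed)
print(added)
```

Returning the list of added values alongside the fixed policies makes it easy to report what was changed so a reviewer can confirm the additions are intended.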
Prerequisites now appear before Quick Start since they apply to the
entire tool, not just advanced usage. Added detailed Metastore Admin
privileges, improved troubleshooting with bulk reimport script, and
added "Import existing groups" to roadmap.
Update product name, User-Agent strings, and README intro across all
files to reflect the new GenieRails identity — guardrails for Genie
onboarding at scale.
…lemetry

WorkspaceClient defaults product="unknown", which overrides the global
ua.with_product() call. Passing product/product_version explicitly to
each constructor ensures the User-Agent header starts with
genie-abac-quickstart/0.1.0 instead of unknown/0.0.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep genierails product name from generalize-abac-module branch,
with the WorkspaceClient(product=...) fix for control plane telemetry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…plicate User-Agent entries

product= on WorkspaceClient constructor is the only mechanism needed.
The ua.with_extra() and ua.with_product() calls were redundant and
caused genierails/0.1.0 to appear twice in the User-Agent header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3 participants