diff --git a/README.md b/README.md
index fb759ff..ba3affc 100644
--- a/README.md
+++ b/README.md
@@ -1,201 +1,94 @@
-# Databricks Document Intelligence Agent — Reference Implementation
+# Databricks Document Intelligence Agent

 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
-[![Databricks CLI ≥0.298](https://img.shields.io/badge/Databricks_CLI-%E2%89%A50.298-orange)](https://docs.databricks.com/aws/en/dev-tools/cli/install)
+[![Databricks CLI >=0.298](https://img.shields.io/badge/Databricks_CLI-%3E%3D0.298-orange)](https://docs.databricks.com/aws/en/dev-tools/cli/install)
 [![Status: reference](https://img.shields.io/badge/status-reference%20implementation-informational)](./PRODUCTION_READINESS.md)
-[![Built with Spec-Kit](https://img.shields.io/badge/built%20with-Spec--Kit-purple)](https://github.com/github/spec-kit)
-[![Built with Claude Code](https://img.shields.io/badge/built%20with-Claude%20Code-D97757)](https://claude.com/claude-code)

-A **Databricks-native document intelligence + agent** stack: parse PDFs once with `ai_parse_document`, classify and extract structured KPIs with `ai_classify` / `ai_extract`, score quality on a 5-dimension rubric, index high-quality summaries into Mosaic AI Vector Search, and serve a cited-answer agent through a Streamlit app on Databricks Apps. **Demonstrated on synthetic SEC 10-K filings**, but the architecture works for any structured document corpus (contracts, invoices, research reports, regulatory filings).
+A Databricks-native reference implementation for document intelligence agents. It parses PDFs with Databricks AI Functions, extracts structured KPIs into governed Delta tables, indexes high-quality document summaries with Mosaic AI Vector Search, and serves cited answers through Agent Bricks and Databricks Apps.

-> [!IMPORTANT]
-> Open-source **reference implementation** for production-grade Databricks patterns. Read [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md), [`SECURITY.md`](./SECURITY.md), and [`VALIDATION.md`](./VALIDATION.md) before pointing real users at it.
+The demo corpus is synthetic SEC 10-K filings, but the pattern applies to other enterprise document sets such as contracts, invoices, research reports, and regulatory filings.

 ```
-  SEC 10-K PDF                          Analyst's question
-  (e.g., ACME_10K_2024.pdf)             "What were ACME's top 3 risks in FY24?"
-          │                                       │
-          ▼                                       ▼
-  ┌─────────────────────┐              ┌──────────────────────┐
-  │ Pipeline (offline)  │ ───────────▶ │   Agent (online)     │
-  │ Parse → KPIs        │   indexed    │ Retrieve → Answer    │
-  │ Quality scoring     │   knowledge  │ with citations       │
-  └─────────────────────┘              └──────────────────────┘
-                                                  │
-                                                  ▼
-                                       "ACME cited supply-chain
-                                        risk [1], AI competition
-                                        [2], regulation [3]…"
+PDFs in UC volume
+  |
+  v
+Lakeflow pipeline: parse -> classify/extract -> quality score
+  |
+  +--> Gold KPI tables
+  |
+  +--> Vector Search index
+          |
+          v
+Agent Bricks Knowledge Assistant + Supervisor Agent
+  |
+  v
+Databricks App with citations, feedback, and Lakebase history
 ```

-For architecture and deploy ordering, see [**`docs/design.md`**](./docs/design.md). For operations, validation, and troubleshooting, see [**`docs/runbook.md`**](./docs/runbook.md).
+## Where To Read

----
-
-## Table of contents
-
-- [Features](#features)
-- [Readiness levels](#readiness-levels)
-- [How Agent Bricks is used](#how-agent-bricks-is-used)
-- [Prerequisites](#prerequisites)
-- [Getting started](#getting-started)
-- [CLEARS quality gate](#clears-quality-gate)
-- [Configuration](#configuration)
-- [Testing & validation](#testing--validation)
-- [Deployment](#deployment)
-- [Repo layout](#repo-layout)
-- [Limitations](#limitations)
-- [Contributing](#contributing)
-- [Security](#security)
-- [License](#license)
-- [Acknowledgments](#acknowledgments)
-
----
-
-## Features
-
-- **End-to-end document intelligence pipeline** — Auto Loader ingest → `ai_parse_document` → section explosion → `ai_classify` + `ai_extract` → 5-dim quality rubric → Vector Search Delta-Sync index (the endpoint is DAB-managed; the index is created/synced by `jobs/index_refresh/sync_index.py`). SQL-only pipeline (Lakeflow Spark Declarative Pipelines).
-- **Cited-answer agent** — Agent Bricks Knowledge Assistant for cited document Q&A, Supervisor Agent for cross-company orchestration, and a deterministic KPI tool for structured comparisons.
-- **Streamlit chat UI on Databricks Apps** — citation chips, thumbs feedback, conversation history persisted to Lakebase Postgres.
-- **Eval-gated promotion** — `mlflow.evaluate(model_type="databricks-agent")` against a 30-question set with thresholds for Correctness, Adherence, Relevance, Execution, Safety, Latency p95.
-- **Reproducible synthetic corpus** — `samples/synthesize.py` generates ACME / BETA / GAMMA 10-Ks plus a deliberately-low-quality `garbage_10K_2024.pdf` for the rubric-exclusion test (SC-006). No EDGAR dependency in CI.
-- **Staged deploy with chicken-egg resolution** — `scripts/bootstrap-demo.sh` orchestrates foundation → data production → consumers so a fresh workspace deploys cleanly with no "errors tolerated."
-- **Lakehouse Monitoring + AI/BI dashboard** — drift on extraction confidence, p95 latency by company, ungrounded-answer rate.
-
-## Readiness levels
-
-| Level | Meaning | Required evidence |
-|---|---|---|
-| Reference-ready | Synthetic corpus deploys and demonstrates the architecture end-to-end | Demo bundle validates, bootstrap succeeds, synthetic CLEARS passes |
-| Pilot-ready | Real 10-K filings validate parse/extract/cited-answer behavior | Reference-ready + small real EDGAR corpus + reviewed costs/latency |
-| Production-ready | Analysts can use it under governed identity and operational SLOs | Pilot-ready + app-level OBO enabled, audit proof, alerts/dashboards, rollback tested |
-
-Full checklists in [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md).
-
-> Latest demo status, 2026-04-26: Agent Bricks bootstrap, Databricks App deploy, direct Supervisor endpoint smoke, and Vector Search index-refresh smoke passed. Reference-ready remains blocked by CLEARS thresholds. Prod readiness still requires user-token passthrough/OBO evidence. See [`VALIDATION.md`](./VALIDATION.md).
-
----
-
-## How Agent Bricks is used
-
-Databricks creation path: [Create an AI agent](https://docs.databricks.com/aws/en/generative-ai/agent-framework/create-agent) → Knowledge Assistant for document Q&A, with Supervisor Agent coordinating hosted tools.
-
-The Agent Bricks path is:
-
-1. `jobs/index_refresh/sync_index.py` creates/syncs the Mosaic AI Vector Search Delta-Sync index over `gold_filing_sections_indexable`.
-2. `agent/document_intelligence_agent.py` creates or updates the Agent Bricks Knowledge Assistant with that Vector Search index as its knowledge source. The source uses `summary` as the searchable text column and `filename` as the document URI column.
-3. `agent/document_intelligence_agent.py` creates or updates the UC SQL function `lookup_10k_kpis`.
-4. `agent/document_intelligence_agent.py` creates or updates the Agent Bricks Supervisor Agent with two tools: the Knowledge Assistant for cited document Q&A and the UC function for deterministic KPI lookups.
-5. Agent Bricks generates concrete serving endpoint names. Resolve the live Supervisor endpoint with `./scripts/resolve-agent-endpoint.sh `.
-6. The Databricks App receives the resolved endpoint through the `agent_endpoint_name` bundle variable as `DOCINTEL_AGENT_ENDPOINT`.
-7. The app invokes `POST /serving-endpoints/{endpoint}/invocations` directly. Prod uses each user's OBO token. Demo uses the App service principal when `DOCINTEL_OBO_REQUIRED=false`. `WorkspaceClient.serving_endpoints.query()` is not used for Agent Bricks invocation because validation showed it did not preserve the needed Agent Bricks response shape.
-8. Knowledge Assistant citations currently arrive as markdown footnotes in Agent Bricks output messages. `app/agent_bricks_response.py` normalizes the final answer and extracts citation chips from those footnotes.
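
Step 8 of the removed list describes turning markdown footnotes into citation chips. The snippet below is only a rough sketch of that idea, not the code in `app/agent_bricks_response.py`. It assumes footnote definitions look like `[^1]: ACME_10K_2024.pdf` on their own line, which the diff does not confirm, and the names `Citation` and `split_answer_and_citations` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed shape of a footnote definition line: "[^1]: ACME_10K_2024.pdf".
# The real Agent Bricks output may use a different footnote layout.
_FOOTNOTE_DEF = re.compile(
    r"^\[\^(?P<label>[^\]]+)\]:[ \t]*(?P<source>.+?)[ \t]*$", re.MULTILINE
)
# Inline reference such as "[^1]" inside the answer text.
_FOOTNOTE_REF = re.compile(r"\[\^([^\]]+)\]")


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation chip: a footnote label plus its source text."""

    label: str
    source: str


def split_answer_and_citations(markdown: str) -> tuple[str, list[Citation]]:
    """Return the answer without footnote definitions, plus the citations."""
    citations = [
        Citation(m.group("label"), m.group("source"))
        for m in _FOOTNOTE_DEF.finditer(markdown)
    ]
    # Drop the definition lines first so only inline references remain.
    body = _FOOTNOTE_DEF.sub("", markdown)
    # Rewrite "[^1]" as "[1]" so the answer reads like a numbered citation.
    body = _FOOTNOTE_REF.sub(lambda m: f"[{m.group(1)}]", body)
    return body.strip(), citations
```

Keeping the parser as a pure function over a string makes it easy to unit test without a live serving endpoint.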
+| Need | Doc |
+|---|---|
+| Architecture and Agent Bricks design | [`docs/design.md`](./docs/design.md) |
+| Setup, deploy, operations, troubleshooting | [`docs/runbook.md`](./docs/runbook.md) |
+| Validation commands and latest workspace evidence | [`VALIDATION.md`](./VALIDATION.md) |
+| Production readiness gates | [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md) |
+| Identity, OBO, grants, and secrets | [`SECURITY.md`](./SECURITY.md) |
+| Streamlit app local development | [`app/README.md`](./app/README.md) |

-Useful Databricks references:
+## What This Builds

-- [Create an AI agent](https://docs.databricks.com/aws/en/generative-ai/agent-framework/create-agent)
-- [Knowledge Assistant](https://docs.databricks.com/aws/en/generative-ai/agent-bricks/knowledge-assistant)
-- [Supervisor Agent](https://docs.databricks.com/aws/en/generative-ai/agent-bricks/multi-agent-supervisor)
+- Lakeflow Spark Declarative Pipeline over raw PDF filings.
+- Gold tables for parsed sections, structured KPIs, and quality scoring.
+- Mosaic AI Vector Search Delta-Sync index over high-quality section summaries.
+- Agent Bricks Knowledge Assistant grounded in the Vector Search index.
+- Agent Bricks Supervisor Agent with Knowledge Assistant and UC SQL KPI tools.
+- Streamlit Databricks App with citation chips, thumbs feedback, and Lakebase Postgres persistence.
+- MLflow CLEARS eval gate for correctness, latency, execution, adherence, relevance, and safety.

----
+The Agent Bricks code artifact is [`agent/document_intelligence_agent.py`](./agent/document_intelligence_agent.py). It creates or updates the Knowledge Assistant, UC SQL function, Supervisor Agent, and serving-endpoint permissions.

 ## Prerequisites

-### Software
-
-| Tool | Version | Why |
-|---|---|---|
-| Python | 3.11 or 3.12 | Agent + app runtime; tests; eval gate |
-| Databricks CLI | ≥ 0.298 | DAB `--strict` validation, `bundle run` for apps, UC permissions API, Lakebase + serving-endpoint resource schemas |
-| Git | any recent | Repo + Spec-Kit commit hooks |
-| `jq` | any recent | Workspace ID discovery in step 2 of Getting Started (CLI-only fallback shown inline if you don't have it) |
-| `make` (optional) | any | Convenience targets if you choose to add them |
-
-macOS install:
-
-```bash
-brew install python@3.12 jq
-brew install databricks/tap/databricks
-```
-
-Linux: see [Databricks CLI install docs](https://docs.databricks.com/aws/en/dev-tools/cli/install).
-
-### Databricks workspace
-
-You need a workspace with **all** of the following enabled:
-
-- Serverless SQL warehouse (AI Functions GA — `ai_parse_document`, `ai_classify`, `ai_extract`, `ai_query`)
-- Mosaic AI Vector Search (endpoint + Delta-Sync index)
-- Agent Bricks Knowledge Assistant and Supervisor Agent
-- AI Gateway with OBO / identity enforcement
-- Lakebase Postgres (preview / GA depending on region)
-- Databricks Apps (Streamlit runtime)
-- Lakehouse Monitoring
-- Unity Catalog with permission to create catalogs/schemas/volumes (or an existing schema you can write to)
-
-**Required for production identity:**
-
-- Databricks Apps **user token passthrough** (workspace admin setting). Prod requires user-scoped Agent Bricks calls — see [`SECURITY.md`](./SECURITY.md).
+- Python 3.11 or 3.12.
+- Databricks CLI >= 0.298.
+- A Databricks workspace with Serverless SQL, Unity Catalog, Document Intelligence AI Functions, Mosaic AI Vector Search, Agent Bricks, Databricks Apps, Lakebase, and Lakehouse Monitoring enabled.
+- A serverless SQL warehouse ID.

-### Free trial signup
+For production identity, Databricks Apps user token passthrough must be enabled. Demo can run with App service-principal fallback; prod must use OBO. See [`SECURITY.md`](./SECURITY.md).

-Don't have a workspace? The fastest path is the **14-day Premium trial** at . Verify each entitlement above is enabled in your trial workspace and region — Mosaic AI Vector Search, Lakebase, Databricks Apps, Agent Bricks, and AI Gateway rollout varies by cloud and region, so a Premium tier doesn't automatically guarantee every feature is on. Workspace settings → Previews / Compute → Mosaic AI is the place to check.
+## Quickstart

-> Note: **Free Edition** at databricks.com/learn/free-edition does not include the required governed agent services and **cannot run this implementation**. Use the Premium trial.
-
-After signup:
-
-```bash
-databricks auth login --host https://.cloud.databricks.com
-databricks auth profiles # verify the DEFAULT profile is configured
-```
-
----
-
-## Getting started
-
-### 1. Clone and install
+Install local dependencies:

 ```bash
-git clone https://github.com//databricks-document-intelligence-agent.git
-cd databricks-document-intelligence-agent
 python -m venv .venv
 .venv/bin/pip install -r agent/requirements.txt -r evals/requirements.txt pytest
 ```

-### 2. Discover your workspace IDs
+Find a serverless warehouse:

 ```bash
-# With jq:
-databricks warehouses list --output json | jq '.[] | {id, name, state}'
-
-# Without jq (CLI-only fallback):
 databricks warehouses list
 ```

-Pick the ID of a serverless warehouse (state can be `STOPPED` — it auto-starts). You'll need it as `DOCINTEL_WAREHOUSE_ID`.
-
-### 3. Validate the bundle
+Validate the bundle:

 ```bash
 databricks bundle validate --strict -t demo
 ```

-If this prints `Validation OK!`, every YAML resource is schema-correct.
-
-### 4. First-time stand-up (staged bootstrap, ~15–25 min)
+Bring up a demo workspace:

 ```bash
 DOCINTEL_CATALOG=workspace \
 DOCINTEL_SCHEMA=docintel_10k_demo \
-DOCINTEL_WAREHOUSE_ID= \
+DOCINTEL_WAREHOUSE_ID= \
 ./scripts/bootstrap-demo.sh
 ```

-The script handles the chicken-egg ordering automatically — see [`docs/design.md` § Deploy ordering](./docs/design.md#deploy-ordering-foundation--consumers).
-
-### 5. Run the eval gate
+Run the eval gate:

 ```bash
 DOCINTEL_CATALOG=workspace DOCINTEL_SCHEMA=docintel_10k_demo \
@@ -204,212 +97,63 @@ DOCINTEL_CATALOG=workspace DOCINTEL_SCHEMA=docintel_10k_demo \
   --dataset evals/dataset.jsonl
 ```

-Exit 0 means every CLEARS axis met its threshold.
-
-### 6. Open the app
-
-In the workspace UI: **Apps → `doc-intel-analyst-demo`**. Ask:
+Open **Apps -> `doc-intel-analyst-demo`** in the Databricks workspace and ask:

-> What were the top 3 risk factors disclosed by ACME in their FY24 10-K?
-
-You should see a grounded answer with citation chips linking to `ACME_10K_2024.pdf` / `Risk`.
+> What was ACME's revenue in fiscal year 2024?

 Example deployed Databricks App validation:

 ![Deployed 10-K Analyst app showing an ACME revenue answer with a structured KPI citation chip](./docs/databricks-app-dogfood.png)

-### 7. Steady-state deploys
-
-After the first bring-up, iteration depends on what changed:
+## Common Commands

 ```bash
-# YAML / pipeline / job / app config changes
-AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
-databricks bundle deploy -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
-databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app
-
-# Agent Bricks configuration / tool glue changes
-DOCINTEL_CATALOG=workspace \
-DOCINTEL_SCHEMA=docintel_10k_demo \
-DOCINTEL_WAREHOUSE_ID= \
-python -m agent.document_intelligence_agent --target demo
-AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
-databricks bundle deploy -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
-databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app
-
-# Pipeline SQL changes that need to re-process existing filings
-databricks bundle run -t demo doc_intel_pipeline
-```
-
-You can also re-run `./scripts/bootstrap-demo.sh` — it auto-detects steady-state and does the full cycle (deploy → refresh data → update Agent Bricks → app run → grants → smoke) in one command.
-
-For a guided 30-minute tour, see [`specs/001-doc-intel-10k/quickstart.md`](./specs/001-doc-intel-10k/quickstart.md).
-
----
-
-## CLEARS quality gate
-
-Before any deploy reaches production, an evaluation must pass (constitution principle V — eval-gated agents).
-
-```
-  evals/dataset.jsonl  (30 questions: 20 single-filing P2 + 10 cross-company P3)
-          │
-          ▼
-  evals/clears_eval.py ──▶ hits the demo endpoint, scores 6 axes:
-
-  ┌─────────────────────────────────────────────────────┐
-  │  C - Correctness  ≥ 0.80     (factual accuracy)     │
-  │  L - Latency p95  ≤ 8000 ms  (responsiveness)       │
-  │  E - Execution    ≥ 0.95     (no crashes)           │
-  │  A - Adherence    ≥ 0.90     (cites sources)        │
-  │  R - Relevance    ≥ 0.80     (retrieved good docs)  │
-  │  S - Safety       ≥ 0.99     (no harmful output)    │
-  └─────────────────────────────────────────────────────┘
-
-  Any axis fails ▶ exit 1 ▶ deploy blocked.
-```
-
-The bar is hard-coded; changing it requires editing `.specify/memory/constitution.md`, which is its own small ceremony (PR + version bump + Sync Impact Report).
-
-Implementation uses `mlflow.evaluate(model_type="databricks-agent")` for the LLM-judged axes; Execution and Latency are computed from the raw response stream. When the active MLflow/databricks-agents version exposes per-row correctness in `result.tables['eval_results']`, the runner also logs SC-002/SC-003 P2 vs P3 slices. Current 1.x aggregate outputs may omit those slice columns, so the aggregate CLEARS gate remains the required pass/fail signal.
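
The threshold box in the removed CLEARS section maps naturally to a small pass/fail check. The sketch below is illustrative only and is not taken from `evals/clears_eval.py`. The metric key names (`correctness`, `latency_p95_ms`, and so on) and the function names are assumptions; only the threshold values come from the removed text.

```python
# Thresholds copied from the removed CLEARS box; the key names are hypothetical.
# Each entry is (comparison, limit): ">=" means higher is better, "<=" lower.
THRESHOLDS: dict[str, tuple[str, float]] = {
    "correctness": (">=", 0.80),
    "latency_p95_ms": ("<=", 8000.0),
    "execution": (">=", 0.95),
    "adherence": (">=", 0.90),
    "relevance": (">=", 0.80),
    "safety": (">=", 0.99),
}


def failing_axes(metrics: dict[str, float]) -> list[str]:
    """List every axis that is missing or on the wrong side of its limit."""
    failures = []
    for axis, (op, limit) in THRESHOLDS.items():
        value = metrics.get(axis)
        if value is None:
            # A missing metric counts as a failure rather than a silent pass.
            failures.append(f"{axis}: missing")
        elif (op == ">=" and value < limit) or (op == "<=" and value > limit):
            failures.append(f"{axis}: {value} violates {op} {limit}")
    return failures


def gate_exit_code(metrics: dict[str, float]) -> int:
    """Return 0 when every axis passes, 1 when any axis fails."""
    return 1 if failing_axes(metrics) else 0
```

Treating a missing metric as a failure matches the "any axis fails, deploy blocked" wording, since an axis that was never scored cannot be shown to pass.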
-
----
-
-## Configuration
-
-### Bundle variables (`databricks.yml`)
-
-| Variable | Default | Purpose |
-|---|---|---|
-| `catalog` | `workspace` | UC catalog for all resources |
-| `schema` | `docintel_10k` (prod) / `docintel_10k_demo` (demo) | Schema under the catalog |
-| `lakebase_instance` | per-target | Lakebase database instance name |
-| `lakebase_stopped` | `false` | Flip to `true` only after instance exists |
-| `service_principal_id` | `""` | **Required** for `-t prod`; `bundle validate -t prod` fails loudly without it |
-| `warehouse_id` | looked up from `Serverless Starter Warehouse` | Used by index-refresh + dashboards |
-| `embedding_model_endpoint_name` | `databricks-bge-large-en` | Vector Search embeddings |
-| `quality_threshold` | `22` | Section quality cutoff (0-30) for index inclusion |
-| `max_pdf_bytes` | `52428800` (50 MB) | Reject filings larger than this |
-| `analyst_group` | `account users` | UC group granted SELECT/USE on schema, READ/WRITE on volume |
-| `agent_endpoint_name` | `UNSET_AGENT_BRICKS_ENDPOINT` | Generated Agent Bricks Supervisor endpoint resolved by `scripts/resolve-agent-endpoint.sh`; pass it on deploy/app-run commands after bootstrap |
-| `app_obo_required` | `true` (prod) / `false` (demo) | Controls Databricks Apps user-token passthrough. Demo can use the App SP when passthrough is unavailable; prod requires OBO. |
-
-Override via `--var name=value` on any `bundle` command.
-
-### Environment variables (bootstrap + CI)
-
-| Variable | Required | Used by |
-|---|---|---|
-| `DOCINTEL_CATALOG` | yes | Bootstrap, CI, eval |
-| `DOCINTEL_SCHEMA` | yes | Same |
-| `DOCINTEL_WAREHOUSE_ID` | yes | Bootstrap (passed to bundle as `--var warehouse_id`, used by kpi-poll + smoke); `agent/tools.py` structured KPI tool |
-| `DOCINTEL_TARGET` | no (default `demo`) | Bootstrap |
-| `DOCINTEL_ANALYST_GROUP` | no (default `account users`) | UC grants in bootstrap + CI |
-| `DOCINTEL_WAIT_SECONDS` | no (default 600) | Bootstrap KPI-table poll timeout |
-| `DOCINTEL_LAKEBASE_TIMEOUT` | no (default 600) | Bootstrap Lakebase-AVAILABLE poll |
-| `DATABRICKS_HOST` / `DATABRICKS_TOKEN` | yes (CI only) | GitHub Actions auth |
-
----
-
-## Testing & validation
-
-```bash
-# Unit tests for Agent Bricks tool glue and app helpers
+# Unit tests
 .venv/bin/python -m pytest agent/tests/ -q

-# Bundle schema + interpolation
-databricks bundle validate --strict -t demo
-databricks bundle validate --strict -t prod # expected to FAIL without --var service_principal_id (intended safety)
-
-# Bash syntax
+# Static checks
 bash -n scripts/bootstrap-demo.sh
-
-# Compile checks for all modified Python
 .venv/bin/python -m py_compile \
   agent/document_intelligence_agent.py agent/tools.py \
   app/app.py app/agent_bricks_client.py app/agent_bricks_response.py app/lakebase_client.py \
-  evals/clears_eval.py \
-  scripts/wait_for_kpis.py samples/synthesize.py
-```
-
-End-to-end is exercised by [`./scripts/bootstrap-demo.sh`](./scripts/bootstrap-demo.sh) against a real workspace; see [`VALIDATION.md`](./VALIDATION.md) for the full procedure with expected outputs.
-
----
+  evals/clears_eval.py scripts/wait_for_kpis.py samples/synthesize.py

-## Deployment
-
-| Path | When |
-|---|---|
-| `./scripts/bootstrap-demo.sh` | Fresh-workspace bring-up (or after `bundle destroy`). Auto-detects FIRST-DEPLOY vs STEADY-STATE; handles staged deploy + data production + UC grants in either mode. |
-| `databricks bundle deploy -t demo --var "agent_endpoint_name=$(./scripts/resolve-agent-endpoint.sh demo)"` | YAML / pipeline / job / app config changes after the first bring-up. |
-| `databricks bundle run -t demo --var "agent_endpoint_name=$(./scripts/resolve-agent-endpoint.sh demo)" analyst_app` | After any change to `app/` or `resources/consumers/analyst.app.yml` — required to apply runtime config + restart the app. |
-| `databricks bundle deploy -t prod --var service_principal_id= --var "agent_endpoint_name=$(./scripts/resolve-agent-endpoint.sh prod)"` | Production deploy, run as the prod SP after prod Agent Bricks bootstrap. |
-| GitHub Actions on push to `main` | Steady-state CI: full `bundle deploy` → wait for Lakebase AVAILABLE → upload samples + run pipeline → Agent Bricks / AI Gateway validation → UC grants → `bundle run analyst_app` → CLEARS eval gate. (The first-ever bring-up of a workspace must be done locally with `./scripts/bootstrap-demo.sh`.) |
+# Deploy app/config changes after first bring-up
+AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
+databricks bundle deploy -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
+databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app

-For day-2 ops (Agent Bricks configuration validation, debugging low quality scores, inspecting CLEARS metrics in MLflow), see [`docs/runbook.md`](./docs/runbook.md). For the production-readiness checklist, see [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md).
+# Apply Agent Bricks definition changes
+DOCINTEL_CATALOG=workspace \
+DOCINTEL_SCHEMA=docintel_10k_demo \
+DOCINTEL_WAREHOUSE_ID= \
+python -m agent.document_intelligence_agent --target demo
+```

----
+More deploy paths and failure handling live in [`docs/runbook.md`](./docs/runbook.md).

-## Repo layout
+## Repo Layout

 ```
 databricks/
-├── databricks.yml               # Bundle root — variables + demo/prod targets
-├── pipelines/sql/               # Lakeflow SDP — Bronze → Silver → Gold (SQL only)
-├── agent/                       # Agent Bricks deterministic tool glue
-├── app/                         # Streamlit on Databricks Apps + Lakebase client
-├── evals/                       # MLflow CLEARS eval gate (dataset + runner)
-├── jobs/                        # Lakeflow Jobs (retention, index refresh)
-├── resources/foundation/        # DAB resources with no data deps
-├── resources/consumers/         # DAB resources that depend on foundation data
-├── scripts/                     # bootstrap-demo.sh + helpers
-├── samples/                     # Synthetic 10-K PDFs (regenerable)
-├── specs/001-doc-intel-10k/     # Spec-Kit artifacts (spec, plan, tasks, etc.)
-├── docs/                        # design.md (this repo's "why") + runbook.md (day-2 ops)
-└── .specify/                    # Spec-Kit machinery (constitution, hooks)
+├── databricks.yml               # Bundle root: variables and demo/prod targets
+├── pipelines/sql/               # Bronze, Silver, Gold Lakeflow SQL
+├── agent/                       # Agent Bricks definition and tool glue
+├── app/                         # Streamlit Databricks App and Lakebase client
+├── evals/                       # CLEARS dataset and MLflow eval runner
+├── jobs/                        # Retention and Vector Search index refresh jobs
+├── resources/foundation/        # DAB resources without data dependencies
+├── resources/consumers/         # App, jobs, monitors, dashboards
+├── scripts/                     # Bootstrap and workspace helper scripts
+├── samples/                     # Regenerable synthetic 10-K PDFs
+├── specs/001-doc-intel-10k/     # Spec-Kit artifacts
+└── docs/                        # Design and runbook
 ```

-Top-level docs: [`CLAUDE.md`](./CLAUDE.md) (runtime guidance for Claude Code), [`CONTRIBUTING.md`](./CONTRIBUTING.md), [`SECURITY.md`](./SECURITY.md), [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md), [`VALIDATION.md`](./VALIDATION.md), [`REAL_10K_PILOT.md`](./REAL_10K_PILOT.md), [`LICENSE`](./LICENSE).
+## Current Status

----
-
-## Limitations
-
-This is a production-oriented reference implementation with conservative scale defaults:
-
-| Limit | Value | Source |
-|---|---|---|
-| Filings in demo | ~500 | spec.md scale |
-| Filings in prod | ~5,000 | spec.md scale |
-| Concurrent app users | ~20 | spec.md scale |
-| PDF size cap | 50 MB | FR / `bronze_filings_rejected` |
-| Raw retention | 90 days | spec clarification |
-| Compute | CPU only | constitution add'l constraints |
-| Languages | English filings | implicit (foundation model) |
-| Eval set size | 30 questions | spec clarification |
-| Prod OBO end-to-end | Requires workspace-level `Databricks Apps - user token passthrough` feature | [`SECURITY.md`](./SECURITY.md) |
-
-Latency SLOs: P95 ≤ 8s for single-filing, ≤ 20s for cross-company. End-to-end pipeline ≤ 10 min P95 on a 30 MB PDF.
-
----
-
-## Contributing
-
-Bug reports, doc fixes, and pattern improvements are welcome. The constitution at [`.specify/memory/constitution.md`](./.specify/memory/constitution.md) defines what the project will and won't accept; PRs that conflict need a constitution amendment first.
-
-See [`CONTRIBUTING.md`](./CONTRIBUTING.md) for local setup, the spec-kit workflow, skill alignment expectations, and the deploy-ordering gotchas reviewers will check for.
-
-## Security
-
-See [`SECURITY.md`](./SECURITY.md) for the target-specific identity model, required UC grants, secrets-handling guidance, and how to report security issues in a fork or deployment.
+Latest demo evidence is tracked in [`VALIDATION.md`](./VALIDATION.md#latest-demo-snapshot). Readiness gates are tracked in [`PRODUCTION_READINESS.md`](./PRODUCTION_READINESS.md).

 ## License

-Released under the [**MIT License**](./LICENSE) — Copyright (c) 2026 Sathish Krishnan. Use it, fork it, learn from it; just keep the copyright notice.
-
-## Acknowledgments
-
-- [**Spec-Kit**](https://github.com/github/spec-kit) — spec-driven development workflow for AI coding agents.
-- [**Claude Code**](https://claude.com/claude-code) — Anthropic's CLI for AI-assisted development.
-- [**Agent Skills**](https://github.com/anthropics/skills) — general-purpose Claude Code skill bundles.
-- [**Databricks**](https://www.databricks.com/) — Unity Catalog, Document Intelligence AI Functions, Lakeflow Spark Declarative Pipelines, Mosaic AI Vector Search, Agent Bricks, AI Gateway, Databricks Apps, Lakebase, Lakehouse Monitoring.
+Released under the [MIT License](./LICENSE).
diff --git a/app/README.md b/app/README.md
index 774d9ff..211cf63 100644
--- a/app/README.md
+++ b/app/README.md
@@ -11,7 +11,7 @@ Source for the Databricks App `doc-intel-analyst-${target}`. Streamlit chat UI o
 | `lakebase_client.py` | psycopg-based persistence to Lakebase Postgres. |
 | `requirements.txt` | Python deps installed by the Apps runtime. |

-## Running deployed (canonical)
+## Running deployed

 ```bash
 AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
@@ -22,6 +22,8 @@ databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"

 The first request creates the `conversation_history`, `query_logs`, and `feedback` tables in Lakebase. Tables are owned by the App's bound service principal (auto-granted `CAN_CONNECT_AND_CREATE` per `resources/consumers/analyst.app.yml`).

+For full deploy paths and generated endpoint handling, see [`../docs/runbook.md`](../docs/runbook.md#deploy-paths).
+
 ## Running locally

 For iteration speed you may run the Streamlit app on your laptop against a deployed demo workspace for Lakebase UI work. Authenticate as the App's bound service principal so Lakebase schema init produces the same ownership as the deployed App:
@@ -58,14 +60,10 @@ DROP SCHEMA IF EXISTS docintel_app CASCADE;
 -- next streamlit run will re-init under the App SP
 ```

-## OBO (on-behalf-of) flow
-
-When `DOCINTEL_OBO_REQUIRED=true`, the app builds a `WorkspaceClient(token=...)` from each user's `x-forwarded-access-token` header (`app.py:_user_client`) and invokes the Agent Bricks Supervisor endpoint through `POST /serving-endpoints/{endpoint}/invocations` (`agent_bricks_client.py`). Agent Bricks, Knowledge Assistant, and the UC KPI function run under the invoking user's identity.
-
-When `DOCINTEL_OBO_REQUIRED=false`, the app uses its App service principal client. This is for demo workspaces that do not have Databricks Apps user-token passthrough enabled.
+## Auth flow

-The endpoint name is generated by Agent Bricks, resolved with `scripts/resolve-agent-endpoint.sh`, and injected into the app as `DOCINTEL_AGENT_ENDPOINT` by `resources/consumers/analyst.app.yml`.
+When `DOCINTEL_OBO_REQUIRED=true`, `app.py` builds a user-scoped `WorkspaceClient` from the Databricks Apps `x-forwarded-access-token` header. When false, it uses the App service principal client for demo workspaces.

-`user_api_scopes` is declared only on the prod target in `databricks.yml` (`serving.serving-endpoints`, `sql`, `iam.access-control:read`, `iam.current-user:read`) and requires the workspace-level "Databricks Apps - user token passthrough" feature. Demo leaves scopes unset and grants the App service principal `CAN_QUERY` on the generated Supervisor endpoint after deploy.
+The app invokes the generated Agent Bricks Supervisor endpoint through `POST /serving-endpoints/{endpoint}/invocations` in `agent_bricks_client.py`. Security policy, required scopes, and prod verification are owned by [`../SECURITY.md`](../SECURITY.md) and [`../docs/runbook.md`](../docs/runbook.md#verifying-end-to-end-obo).
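
The auth flow above comes down to one decision: use the forwarded user token when OBO is required, otherwise fall back to the App service principal. The sketch below shows that branch as a pure function and is not the code in `app.py`. The function name `select_bearer_token` and the idea of passing the service-principal credential as a plain string are simplifying assumptions; the real app builds `WorkspaceClient` objects instead.

```python
from typing import Mapping, Optional

# Header name taken from the auth-flow text above.
FORWARDED_TOKEN_HEADER = "x-forwarded-access-token"


def select_bearer_token(
    headers: Mapping[str, str],
    obo_required: bool,
    service_principal_token: Optional[str] = None,
) -> str:
    """Pick the credential for the Supervisor endpoint call (hypothetical helper)."""
    if obo_required:
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        token = lowered.get(FORWARDED_TOKEN_HEADER)
        if not token:
            # Fail closed: never fall back to the service principal when OBO is on.
            raise PermissionError(
                "OBO is required but no forwarded user token was found"
            )
        return token
    if not service_principal_token:
        raise RuntimeError("service principal credential is not configured")
    return service_principal_token
```

Failing closed in the OBO branch keeps a missing header from quietly widening access to whatever the service principal can read.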
**Streamlit gotcha** (per the [Databricks Apps runtime docs](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/app-runtime)): the OBO token is captured at the initial HTTP request; the connection then upgrades to WebSocket and the token never refreshes. If a user's UC permissions change mid-session, ask them to reload the page. diff --git a/docs/design.md b/docs/design.md index 0b2c5e7..0a80db6 100644 --- a/docs/design.md +++ b/docs/design.md @@ -1,6 +1,6 @@ # Design — Databricks Document Intelligence Agent -This document covers the *why*, the architecture, and the build workflow behind the repo. For setup and day-to-day use, see [`README.md`](../README.md). For day-2 ops, see [`runbook.md`](./runbook.md). +This document covers the why and architecture behind the repo. For setup, deployment commands, and troubleshooting, see [`runbook.md`](./runbook.md). ## Table of contents @@ -261,7 +261,7 @@ AI-driven here means Claude carries the boring parts (boilerplate YAML, retry-lo ## Deploy ordering: foundation → consumers -DABs deploy *everything in one shot*. But our resources have a chicken-and-egg problem on a fresh workspace: +DABs reconcile workspace resources from YAML, but a fresh workspace has real data dependencies that cannot all be satisfied in one pass: ``` ┌────────────────────────────────────────────────┐ @@ -282,11 +282,10 @@ DABs deploy *everything in one shot*. But our resources have a chicken-and-egg p Knowledge Assistant needs the Vector Search index. Monitor needs the table populated. Table needs the pipeline to run. - - ▶ Single `bundle deploy` → 4+ errors on a fresh workspace. + ▶ Single `bundle deploy` cannot create the whole stack from scratch. ``` -The fix is a **staged deploy** orchestrated by `scripts/bootstrap-demo.sh`. 
Resources are split into two directories by data dependency: +The repo keeps this ordering explicit by splitting resources by dependency: ``` resources/ @@ -305,49 +304,9 @@ The fix is a **staged deploy** orchestrated by `scripts/bootstrap-demo.sh`. Reso └── lakebase_catalog.yml (needs instance AVAILABLE) ``` -**The bootstrap script auto-detects which mode to run** by checking whether the Agent Bricks Supervisor exists and has generated a serving endpoint: - -``` - does doc-intel-supervisor-${target} have endpoint_name? - │ - no ◀───────┴───────▶ yes - │ │ - ▼ ▼ - ┌──────────────────┐ ┌──────────────────┐ - │ FIRST-DEPLOY │ │ STEADY-STATE │ - │ (staged) │ │ (full deploy) │ - ├──────────────────┤ ├──────────────────┤ - │ 1. temp-rename │ │ 1. bundle deploy │ - │ consumers/* │ │ (full bundle) │ - │ .yml.skip │ │ │ - │ 2. bundle deploy │ │ 2. refresh data: │ - │ (foundation) │ │ upload, run │ - │ 3. produce data: │ │ pipeline, │ - │ upload, run, │ │ sync index, │ - │ sync index, │ │ update Agent │ - │ Agent Bricks │ │ Bricks │ - │ 4. wait Lakebase │ │ serving in- │ - │ AVAILABLE │ │ place │ - │ 5. restore yamls │ │ │ - │ 6. bundle deploy │ │ │ - │ (full bundle) │ │ │ - └────────┬─────────┘ └────────┬─────────┘ - │ │ - └───────────┬───────────┘ - ▼ - ┌──────────────────────────┐ - │ Common to both: │ - │ • bundle run analyst_app│ - │ • UC grants chain │ - │ • smoke check │ - └──────────────────────────┘ -``` - -**Why two modes?** DAB tracks resource state; if you run the temp-rename trick against an existing deployment, DAB sees the consumer YAMLs as removed and plans to delete the app, monitor, dashboard, etc. Use FIRST-DEPLOY only for a fresh workspace; use STEADY-STATE after resources exist. - -CI (`.github/workflows/deploy.yml`) assumes steady-state — the first-ever bring-up of a workspace must be done locally with `./scripts/bootstrap-demo.sh`. 
After that, every push to `main` runs the steady-state path: full `bundle deploy` → refresh data → sync index → update Agent Bricks → grants → CLEARS gate.
+`scripts/bootstrap-demo.sh` is the operational entry point for first bring-up and steady-state demo deploys. It stages foundation resources, materializes the data/index/Agent Bricks dependencies, restores consumer resources, restarts the app, grants access, and runs a smoke query.
 
-For the per-step procedure and known failure modes, see [`runbook.md` § Known deploy ordering gaps](./runbook.md#known-deploy-ordering-gaps).
+The design point is not the script itself; it is that resource dependencies are explicit and repeatable. The exact command flow and failure modes are owned by [`runbook.md` § Deploy paths](./runbook.md#deploy-paths) and [`runbook.md` § Known deploy ordering gaps](./runbook.md#known-deploy-ordering-gaps).
 
 ---
 
diff --git a/docs/runbook.md b/docs/runbook.md
index fb3c065..b46bfaa 100644
--- a/docs/runbook.md
+++ b/docs/runbook.md
@@ -1,6 +1,101 @@
 # Operating Runbook — 10-K Analyst
 
-This runbook covers day-2 operations for the deployed demo/prod stacks. For first-time setup follow [`specs/001-doc-intel-10k/quickstart.md`](../specs/001-doc-intel-10k/quickstart.md).
+This runbook owns setup commands, deploy paths, configuration reference, and day-2 operations for demo/prod stacks. Architecture belongs in [`design.md`](./design.md); validation evidence belongs in [`../VALIDATION.md`](../VALIDATION.md).
+
+## Prerequisites
+
+- Python 3.11 or 3.12.
+- Databricks CLI >= 0.298.
+- A Databricks workspace with Serverless SQL, Unity Catalog, Document Intelligence AI Functions, Mosaic AI Vector Search, Agent Bricks, Databricks Apps, Lakebase, and Lakehouse Monitoring enabled.
+- A serverless SQL warehouse ID.
+
+Prod requires Databricks Apps user token passthrough. Demo can run with `app_obo_required=false` and App service-principal endpoint access.
+
+## Deploy paths
+
+Fresh demo workspace:
+
+```bash
+DOCINTEL_CATALOG= \
+DOCINTEL_SCHEMA= \
+DOCINTEL_WAREHOUSE_ID= \
+./scripts/bootstrap-demo.sh
+```
+
+The same script also handles steady-state demo deploys. It auto-detects whether the Agent Bricks Supervisor already exists and avoids deleting consumer resources on existing deployments.
+
+App, YAML, pipeline, or job config changes after first bring-up:
+
+```bash
+AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
+databricks bundle deploy -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
+databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app
+```
+
+Agent Bricks definition changes:
+
+```bash
+DOCINTEL_CATALOG= \
+DOCINTEL_SCHEMA= \
+DOCINTEL_WAREHOUSE_ID= \
+python -m agent.document_intelligence_agent --target demo
+
+AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh demo)"
+databricks bundle deploy -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
+databricks bundle run -t demo --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app
+```
+
+Pipeline SQL changes that must re-process existing filings:
+
+```bash
+databricks bundle run -t demo doc_intel_pipeline
+```
+
+Prod deploys must pass `service_principal_id` and use an OBO-enabled target:
+
+```bash
+TARGET=prod
+AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh "$TARGET")"
+databricks bundle deploy -t "$TARGET" \
+  --var service_principal_id= \
+  --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}"
+databricks bundle run -t "$TARGET" \
+  --var service_principal_id= \
+  --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" \
+  analyst_app
+```
+
+## Configuration reference
+
+Bundle variables in `databricks.yml`:
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `catalog` | `workspace` | UC catalog for all resources |
+| `schema` | `docintel_10k` / `docintel_10k_demo` | Schema under the catalog |
+| `lakebase_instance` | per-target | Lakebase database instance name |
+| `lakebase_stopped` | `false` | Set to `true` only after the instance exists |
+| `service_principal_id` | `""` | Required for prod deploys |
+| `warehouse_id` | lookup by name | Used by index refresh, dashboards, and KPI tool |
+| `embedding_model_endpoint_name` | `databricks-bge-large-en` | Vector Search embeddings |
+| `quality_threshold` | `22` | Section quality cutoff for index inclusion |
+| `max_pdf_bytes` | `52428800` | Reject filings larger than 50 MB |
+| `analyst_group` | `account users` | UC group for demo grants |
+| `agent_endpoint_name` | `UNSET_AGENT_BRICKS_ENDPOINT` | Generated Supervisor endpoint resolved by `scripts/resolve-agent-endpoint.sh` |
+| `app_obo_required` | `true` prod, `false` demo | Controls user-token passthrough requirement |
+
+Bootstrap and CI environment variables:
+
+| Variable | Required | Used by |
+|---|---|---|
+| `DOCINTEL_CATALOG` | yes | Bootstrap, CI, eval |
+| `DOCINTEL_SCHEMA` | yes | Bootstrap, CI, eval |
+| `DOCINTEL_WAREHOUSE_ID` | yes | Bootstrap, KPI polling, structured KPI tool |
+| `DOCINTEL_TARGET` | no | Bootstrap target, defaults to `demo` |
+| `DOCINTEL_ANALYST_GROUP` | no | UC grants, defaults to `account users` |
+| `DOCINTEL_WAIT_SECONDS` | no | KPI-table poll timeout |
+| `DOCINTEL_LAKEBASE_TIMEOUT` | no | Lakebase `AVAILABLE` poll timeout |
+| `DATABRICKS_HOST` / `DATABRICKS_TOKEN` | CI only | GitHub Actions auth |
 
 ## Add a sample filing
 
@@ -174,15 +269,6 @@ resolve on a fresh workspace. Each needs a phase-2 step after a prior side effec
 - **Fix**: bootstrap waits for Lakebase to reach `AVAILABLE` before the full consumer deploy.
 
-A clean fresh-workspace bring-up is a single command:
-
-```bash
-DOCINTEL_CATALOG= \
-DOCINTEL_SCHEMA= \
-DOCINTEL_WAREHOUSE_ID= \
-./scripts/bootstrap-demo.sh
-```
-
 The script implements a **staged deploy**: resources are split into `resources/foundation/` (no data deps) and `resources/consumers/` (need data).
Stage 1 temporarily renames consumer YAMLs to `*.yml.skip` so the