docs(skill/databricks-metric-views): add Update section + UDF gotcha #528
Conversation
Two additive doc fixes for non-MCP metric view authoring on `experimental`:

1. New "Update an Existing Metric View" subsection under SQL Operations. Metric views don't support `ALTER VIEW ... ADD MEASURE`; the only path is `CREATE OR REPLACE VIEW` with the complete updated YAML. Includes a worked example (adding Average Order Value to `orders_metrics`) with line-by-line `← unchanged` / `← new` annotations so the full-replacement requirement is visually obvious.
2. New row in Common Issues: Python UDFs are not supported in measure expressions (use built-in SQL aggregates or SQL UDFs; for custom logic, push it into the source or wrap it as a SQL UDF in UC).

`stf compare` A/B vs `origin/experimental` (L3 static + L5 output, agent model `claude-sonnet-4-6` via Anthropic OAuth):

- Composite: A=0.641 vs B=0.617 (+0.024)
- L3 static: A=0.8753 vs B=0.8752 (≈0)
- L5 output: A=0.407 vs B=0.359 (+0.047)
- Feedback pass rate: 166/250 vs 146/249 (66% vs 59%, +8 pts)

Both targeted test cases improved:

- `metric-views_alter_010`: 4/8 vs 2/8 (+25 pts)
- `metric-views_udf_not_supported_021`: 9/10 vs 7/10 (+20 pts)

Judge verdict: TIE (low confidence 0.20); the judge sampled a single test case where the artifacts were near-identical. The aggregated pass rate across all 250 feedback rows shows the small but consistent positive lift above.

The composite sits below the 0.7 gate on both branches due to (a) an install-banner hook polluting agent subprocess responses (separate bug worth filing) and (b) `ground_truth` scaffold staleness, not the fix itself.

Co-authored-by: Isaac
Summary
Two additive doc fixes for non-MCP metric view authoring (matching the `experimental` branch's CLI-only posture):

1. New "Update an Existing Metric View" subsection under SQL Operations. Metric views don't support `ALTER VIEW … ADD MEASURE`; the only path is `CREATE OR REPLACE VIEW` with the complete updated YAML. The worked example annotates each line `← unchanged, repeated verbatim` / `← new` to make the full-replacement requirement visually obvious, and cross-links to `SHOW CREATE TABLE` so an agent fetches the current YAML before editing. (A sketch of the pattern follows this summary.)
2. New row in Common Issues: Python UDFs are not supported in measure expressions. Workaround: use built-in SQL aggregates (`SUM`, `COUNT`, `AVG`) or SQL UDFs; for custom logic, push the transformation into the source table or wrap it as a UC-registered SQL UDF. (A second sketch follows this summary.)

Targets two known footguns surfaced by the metric-views `ground_truth.yaml`:

- `metric-views_alter_010`: "Add a new measure 'Average Order Value' to my existing orders_metrics metric view"
- `metric-views_udf_not_supported_021`: "Can I use a Python UDF inside a metric view measure expression?"
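For concreteness, here is a minimal sketch of the update pattern the new subsection teaches. It assumes the documented `CREATE OR REPLACE VIEW … WITH METRICS LANGUAGE YAML` form; the catalog, schema, column, and measure names are hypothetical, and the YAML body is a skeleton rather than the full spec.

```sql
-- Step 1 (cross-linked in the subsection): fetch the current YAML before editing.
SHOW CREATE TABLE main.analytics.orders_metrics;  -- hypothetical name

-- Step 2: re-issue the FULL definition. There is no ALTER VIEW ... ADD MEASURE,
-- so every existing line is repeated verbatim alongside the new measure.
CREATE OR REPLACE VIEW main.analytics.orders_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1                                       # ← unchanged, repeated verbatim
source: main.analytics.orders                      # ← unchanged, repeated verbatim
dimensions:
  - name: order_date                               # ← unchanged, repeated verbatim
    expr: order_date
measures:
  - name: total_revenue                            # ← unchanged, repeated verbatim
    expr: SUM(amount)
  - name: avg_order_value                          # ← new ("Average Order Value")
    expr: SUM(amount) / COUNT(DISTINCT order_id)   # ← new
$$
```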
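And a sketch of the Common Issues workaround: custom logic lives in a UC-registered SQL UDF rather than a Python UDF, and the measure expression wraps it in a built-in aggregate. Function and column names here are again hypothetical.

```sql
-- A SQL UDF registered in Unity Catalog (hypothetical names).
CREATE OR REPLACE FUNCTION main.analytics.net_amount(amount DOUBLE, discount DOUBLE)
RETURNS DOUBLE
RETURN amount - COALESCE(discount, 0);

-- A measure expression can then call it inside a built-in aggregate:
--   measures:
--     - name: net_revenue
--       expr: SUM(main.analytics.net_amount(amount, discount))
--
-- A Python UDF in the same position is not supported; push that logic into
-- the source table or rewrite it as a SQL UDF like the one above.
```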
Evaluation

Ran `stf compare` (L3 static + L5 output) against `origin/experimental`. Agent model `claude-sonnet-4-6` via Anthropic OAuth (had to unset `llm.ai_gateway_host`: the `fevm-jss-sandbox` workspace's `/anthropic` endpoint returns 404, breaking the gateway agent path; L3 judges still ran fine via the gateway).

Both target test cases hit:
- `metric-views_alter_010`
- `metric-views_udf_not_supported_021`

`alter_010`'s expected response includes the MCP `manage_metric_views(action="alter", …)` call, which is intentionally not added on `experimental` (CLI-only posture). The fix still scores higher because it teaches the agent to keep all existing measures when updating, i.e. the full-replacement requirement.

Other notable L5 swings (probably mostly noise; non-target cases vary across runs):
- +62 pts on `metric-views_window_rolling_avg_018`
- +100 pts on `metric-views_yaml_spec_005`
- +27 pts on `metric-views_conversational_support_tickets_020`
- −88 pts on `metric-views_filtered_measure_013` (regression, but not a target; likely noise)
- −86 pts on `metric-views_star_schema_006` (same)

The L3 judge winner was TIE because the judge sampled a single test case where both artifacts were near-identical. The aggregated pass rate across all 250 feedback rows is what shows the small but consistent positive lift.
Caveats: independent bugs surfaced during the eval (worth filing separately)
1. Install-banner contamination of agent responses. Every L5 agent's `with_response`/`without_response` starts with the "Databricks AI Dev Kit — update available!" banner. The `SessionStart` install-banner hook is firing inside agent subprocesses and getting captured as part of the agent's response. This explains why composite scores stay well below 0.7 on both branches: half of every captured response is install-banner garbage. The +0.05 L5 lift is real through this noise.
2. Gateway routing broken on `fevm-jss-sandbox`. `llm.ai_gateway_host: https://fevm-jss-sandbox.cloud.databricks.com` causes L5 agent subprocesses to inject `ANTHROPIC_BASE_URL=<host>/anthropic`, which 404s and surfaces as the misleading "model may not exist or you may not have access to it" error on every agent run. L3 judges still work via the gateway. Workaround: unset `ai_gateway_host` and pass `--agent-model claude-sonnet-4-6` to `stf compare`.
3. `skill-evaluator` SKILL.md doesn't document `stf compare`. The reference table lists `evaluate` (full pyramid) and `audit` (L3 only) but not the dedicated A/B command. Discovered only because @jacksandom flagged it.

How to reproduce
Test plan
- `stf lint databricks-skills/databricks-metric-views` clean
- `stf compare` against `origin/experimental` shows positive L5 lift
- `claude/skills/` synced locally (gitignored, not in commit)