
Commit 0228fe8

Feature/improve data gen (#184)
* Rewrite synthetic-data-generation for improved performance and features

* Fix databricks-connect version requirements for Python compatibility

  The serverless() method requires databricks-connect 15.1.0+, but version 17.x only supports Python 3.12. Updated documentation to specify:
  - Python 3.10/3.11: use >=15.1,<16.2
  - Python 3.12: use >=16.2

* Improve synthetic-data-generation skill with Spark preference and catalog management

  - Strongly recommend Spark + Faker for all data generation (default approach)
  - Only use Polars for <10K rows if the user explicitly prefers local generation
  - Add volume upload instructions using databricks fs commands
  - Remove CREATE CATALOG statements - assume catalogs already exist
  - Update decision guides and examples to reflect the Spark-first approach
  - Consolidate and simplify execution options and installation instructions
  - Update best practices and common issues sections

* Cleanup data gen skill
* Add stronger guidance to use Databricks Connect
* Update data gen for different run modes
* Small updates to databricks-connect and environments
* Updates to improve serverless dbconnect and polars local for data gen
* Add guidance on cache with serverless
* Update data gen for better cluster/job guidance
* Update classic library install
* Suggest uv and improve python task job payload
* Add new data gen tests (first 3)
* Update data gen ground_truth and baseline
* Remove default catalog setting
* Add window syntax common issue
* Rename and overhaul data gen skill, tests, and timeouts

* Fix skill name mismatch and add missing skills to install scripts

  - Rename databricks-synthetic-data-generation to databricks-synthetic-data-gen across all install scripts, documentation, and cross-references to match the actual skill directory name
  - Add missing skills (databricks-iceberg, databricks-parsing) to install.sh and install.ps1

* Fix PR review issues for databricks-synthetic-data-gen skill

  Bugs:
  - Remove .cache()/.unpersist() in generate_synthetic_data.py (serverless incompatible)
  - Fix .gitignore formatting (restore blank line separator)

  Design:
  - Refactor ground_truth.yaml to use external response files (1127 → 347 lines)
  - Change timeout from 480s to 240s with explanatory comment
  - Add Windows timeout warning in mlflow_eval.py

  Nits:
  - Fix hardcoded catalog name (dustin_vannoy_catalog → my_catalog)
  - Fix DatabricksEnv import path (databricks.connect.session → databricks.connect)
  - Add EOF newline to 1-setup-and-execution.md
  - Remove unused imports in evaluate.py

* Simplify serverless job config in test response

  Remove the new_cluster section and use environment_key at the task level for a cleaner serverless job definition.

* Add Python 3.12+ requirement to run instructions

  DatabricksEnv requires databricks-connect>=16.4, which requires Python 3.12+.

* Remove commented out lines from manifest.yaml

* Update databricks-connect version range and fix version detection

  - Expand the version constraint from >=16.4,<17.0 to >=16.4,<17.4 to support databricks-connect 17.x versions
  - Fix get_databricks_connect_version() to use importlib.metadata.version() instead of the non-existent databricks.connect.__version__ attribute

* Reduce guidelines for faster tests with MLflow

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
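
As context for the version-detection fix listed above, here is a minimal sketch of how importlib.metadata can replace the non-existent databricks.connect.__version__ attribute. The function name comes from the commit message; everything else is illustrative and not taken from the repository.

```python
from importlib.metadata import PackageNotFoundError, version


def get_databricks_connect_version() -> str | None:
    """Return the installed databricks-connect version, or None if not installed.

    importlib.metadata reads the installed package metadata, which works even
    though the databricks.connect module exposes no __version__ attribute.
    """
    try:
        return version("databricks-connect")
    except PackageNotFoundError:
        return None
```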
1 parent 6930632 commit 0228fe8

43 files changed

Lines changed: 3831 additions & 699 deletions

Some content is hidden: large commits hide part of their content by default, so only a subset of the 43 changed files appears below.

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Databricks AI Dev Kit
 .ai-dev-kit/
 .claude/
-
+.local
 
 # Python
 __pycache__/

.test/README.md

Lines changed: 14 additions & 0 deletions
@@ -233,3 +233,17 @@ uv pip install -e ".test/"
 uv run pytest .test/tests/
 uv run python .test/scripts/regression.py <skill-name>
 ```
+
+---
+
+## Troubleshooting
+
+### MLflow evaluation not returning results
+
+If `/skill-test <skill-name> mlflow` hangs or doesn't return results, run manually with debug logging:
+
+```bash
+MLFLOW_LOG_LEVEL=DEBUG uv run python .test/scripts/mlflow_eval.py <skill-name>
+```
+
+This will show detailed MLflow API calls and help identify connection or authentication issues.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+run_id: '20260303_071721'
+created_at: '2026-03-03T07:17:21.838623'
+skill_name: databricks-synthetic-data-gen
+metrics:
+  pass_rate: 1.0
+  total_tests: 4
+  passed_tests: 4
+  failed_tests: 0
+test_results:
+- id: grp_20260302_113344
+  passed: true
+  execution_mode: local
+- id: gen_serverless_job_catalog_json_002
+  passed: true
+  execution_mode: local
+- id: grp_20260302_retail_csv_3tables_003
+  passed: true
+  execution_mode: local
+- id: grp_20260303_manufacturing_delta_streaming_004
+  passed: true
+  execution_mode: local

.test/scripts/mlflow_eval.py

Lines changed: 53 additions & 1 deletion
@@ -2,29 +2,65 @@
 """Run MLflow evaluation for a skill.
 
 Usage:
-    python mlflow_eval.py <skill_name> [--filter-category <category>] [--run-name <name>]
+    python mlflow_eval.py <skill_name> [--filter-category <category>] [--run-name <name>] [--timeout <seconds>]
 
 Environment Variables:
     DATABRICKS_CONFIG_PROFILE - Databricks CLI profile (default: "DEFAULT")
     MLFLOW_TRACKING_URI - Set to "databricks" for Databricks MLflow
    MLFLOW_EXPERIMENT_NAME - Experiment path (e.g., "/Users/{user}/skill-test")
+    MLFLOW_LLM_JUDGE_TIMEOUT - Timeout in seconds for LLM judge evaluation (default: 120)
 """
+import os
 import sys
+import signal
 import argparse
 
+# Close stdin and disable tqdm progress bars when run non-interactively
+# This fixes hanging issues with tqdm/MLflow progress bars in background tasks
+if not sys.stdin.isatty():
+    try:
+        sys.stdin.close()
+        sys.stdin = open(os.devnull, 'r')
+    except Exception:
+        pass
+    # Disable tqdm progress bars
+    os.environ.setdefault("TQDM_DISABLE", "1")
+
 # Import common utilities
 from _common import setup_path, print_result, handle_error
 
 
+class TimeoutException(Exception):
+    pass
+
+
+def timeout_handler(signum, frame):
+    raise TimeoutException("MLflow evaluation timed out")
+
+
 def main():
     parser = argparse.ArgumentParser(description="Run MLflow evaluation for a skill")
     parser.add_argument("skill_name", help="Name of skill to evaluate")
     parser.add_argument("--filter-category", help="Filter by test category")
     parser.add_argument("--run-name", help="Custom MLflow run name")
+    parser.add_argument(
+        "--timeout",
+        type=int,
+        default=120,
+        help="Timeout in seconds for evaluation (default: 120)",
+    )
     args = parser.parse_args()
 
     setup_path()
 
+    # Set up signal-based timeout (Unix only)
+    if hasattr(signal, 'SIGALRM'):
+        signal.signal(signal.SIGALRM, timeout_handler)
+        signal.alarm(args.timeout)
+    else:
+        # Windows: SIGALRM not available - no timeout enforcement
+        print("WARNING: Timeout not supported on Windows - test may run indefinitely", file=sys.stderr)
+
     try:
         from skill_test.runners import evaluate_skill
 
@@ -34,6 +70,10 @@ def main():
             run_name=args.run_name,
         )
 
+        # Cancel the alarm if we succeeded
+        if hasattr(signal, 'SIGALRM'):
+            signal.alarm(0)
+
         # Convert to standard result format
         if result.get("run_id"):
             result["success"] = True
@@ -42,7 +82,19 @@
 
         sys.exit(print_result(result))
 
+    except TimeoutException as e:
+        result = {
+            "success": False,
+            "skill_name": args.skill_name,
+            "error": f"Evaluation timed out after {args.timeout} seconds. This may indicate LLM judge endpoint issues.",
+            "error_type": "timeout",
+        }
+        sys.exit(print_result(result))
+
     except Exception as e:
+        # Cancel alarm on any exception
+        if hasattr(signal, 'SIGALRM'):
+            signal.alarm(0)
         sys.exit(handle_error(e, args.skill_name))
 
 
.test/skills/_routing/ground_truth.yaml

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ test_cases:
     prompt: "Generate synthetic customer data and evaluate the agent quality with MLflow scorers"
     expectations:
       expected_skills:
-        - "databricks-synthetic-data-generation"
+        - "databricks-synthetic-data-gen"
         - "databricks-mlflow-evaluation"
       is_multi_skill: true
     metadata:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Candidates for databricks-synthetic-data-gen skill
+# Test cases pending review before promotion to ground_truth.yaml
+#
+# Use `/skill-test databricks-synthetic-data-gen add` to create new candidates
+# Use `/skill-test databricks-synthetic-data-gen review` to promote candidates to ground truth
+
+candidates: []
