Complete command-line interface reference for the Agentic Data Engineering Platform.
The Healthcare Data Platform CLI provides a unified interface for managing pipelines, monitoring data quality, and controlling platform operations. Built with Click, it offers intuitive commands with comprehensive help and validation.
The CLI is automatically available after installing the platform:
# Verify installation
python -m src.cli --help
# Or use the shorthand (if configured)
healthcare-cli --helpAll commands support these global options:
python -m src.cli [GLOBAL_OPTIONS] COMMAND [ARGS]
Global Options:
-c, --config PATH Configuration file path (default: config/config.yaml)
-v, --verbose Enable verbose output
-q, --quiet Suppress non-error output
--dry-run Show what would be done without executing
--format FORMAT Output format: table, json, yaml (default: table)
--no-color Disable colored output
--help Show help message and exitManage Unity Catalog structure and metadata.
Initialize Unity Catalog structure for healthcare data.
python -m src.cli catalog init [OPTIONS]
Options:
--catalog-name TEXT Catalog name (default: healthcare_data)
--force Force recreation if catalog exists
--skip-permissions Skip permission setup
--dry-run Show what would be created
Examples:
# Initialize default catalog structure
python -m src.cli catalog init
# Initialize with custom catalog name
python -m src.cli catalog init --catalog-name my_healthcare_data
# Force recreation of existing catalog
python -m src.cli catalog init --forceWhat it creates:
- Catalog:
healthcare_data - Schemas:
bronze,silver,gold,monitoring - Tables: Monitoring and metadata tables
- Permissions: Default access controls
Check Unity Catalog health and permissions.
python -m src.cli catalog health [CATALOG_NAME] [OPTIONS]
Arguments:
CATALOG_NAME Catalog to check (default: healthcare_data)
Options:
--detailed Show detailed health information
--check-permissions Verify user permissions
--format FORMAT Output format: table, json
Examples:
# Basic health check
python -m src.cli catalog health
# Detailed health check with permissions
python -m src.cli catalog health --detailed --check-permissions
# JSON format output
python -m src.cli catalog health --format jsonList catalogs, schemas, and tables.
python -m src.cli catalog list [OPTIONS]
Options:
--type TYPE What to list: catalogs, schemas, tables
--catalog TEXT Catalog to list schemas/tables from
--schema TEXT Schema to list tables from
--pattern TEXT Filter by name pattern
Examples:
# List all catalogs
python -m src.cli catalog list --type catalogs
# List schemas in healthcare catalog
python -m src.cli catalog list --type schemas --catalog healthcare_data
# List tables with pattern
python -m src.cli catalog list --type tables --pattern "*claims*"Create, manage, and monitor data processing pipelines.
Create a new healthcare data pipeline.
python -m src.cli pipeline create PIPELINE_NAME [OPTIONS]
Arguments:
PIPELINE_NAME Name of the pipeline
Options:
--source-path PATH Source data path (required)
--target-table NAME Target Unity Catalog table (required)
--schedule TEXT Cron schedule expression
--description TEXT Pipeline description
--data-type TYPE Healthcare data type: medicaid_claims, medicare_claims, provider_data
--state TEXT State code for state-specific processing
--cluster-profile Cluster profile: small, medium, large, custom
Examples:
# Basic Medicare claims pipeline
python -m src.cli pipeline create medicare_claims \
--source-path s3://healthcare-raw/medicare/claims/ \
--target-table healthcare_data.silver.medicare_claims \
--schedule "0 */4 * * *"
# California Medicaid pipeline with state-specific validation
python -m src.cli pipeline create ca_medicaid_claims \
--source-path s3://data/ca/medicaid/ \
--target-table healthcare_data.silver.ca_medicaid_claims \
--data-type medicaid_claims \
--state CA \
--cluster-profile medium
# Provider data pipeline with description
python -m src.cli pipeline create provider_updates \
--source-path s3://npi-registry/updates/ \
--target-table healthcare_data.silver.providers \
--description "Daily NPI registry updates" \
--schedule "0 6 * * *"List all configured pipelines.
python -m src.cli pipeline list [OPTIONS]
Options:
--status STATUS Filter by status: running, stopped, failed, all
--format FORMAT Output format: table, json, yaml
--sort-by FIELD Sort by: name, status, last_run, quality_score
--limit INTEGER Limit number of results
Examples:
# List all pipelines
python -m src.cli pipeline list
# List only running pipelines
python -m src.cli pipeline list --status running
# List with JSON output, sorted by quality score
python -m src.cli pipeline list --format json --sort-by quality_scoreTrigger a pipeline run manually.
python -m src.cli pipeline run PIPELINE_NAME [OPTIONS]
Arguments:
PIPELINE_NAME Name of the pipeline to run
Options:
--wait Wait for run completion
--timeout INTEGER Timeout in minutes (default: 60)
--parameters JSON Runtime parameters as JSON
--force Force run even if another run is active
Examples:
# Simple pipeline run
python -m src.cli pipeline run medicare_claims
# Run with wait and timeout
python -m src.cli pipeline run medicare_claims --wait --timeout 30
# Run with custom parameters
python -m src.cli pipeline run medicare_claims \
--parameters '{"start_date": "2024-01-01", "end_date": "2024-01-31"}'
# Force run (stop existing run first)
python -m src.cli pipeline run medicare_claims --forceStop a running pipeline.
python -m src.cli pipeline stop PIPELINE_NAME [OPTIONS]
Options:
--force Force stop without waiting for graceful shutdown
--wait Wait for stop completion
Examples:
# Graceful stop
python -m src.cli pipeline stop medicare_claims
# Force stop with wait
python -m src.cli pipeline stop medicare_claims --force --waitGet pipeline health and performance metrics.
python -m src.cli pipeline metrics PIPELINE_NAME [OPTIONS]
Options:
--days INTEGER Number of days of history (default: 7)
--granularity TEXT Data granularity: hour, day, week
--metrics LIST Specific metrics: runtime, quality, cost, throughput
Examples:
# Basic metrics for last 7 days
python -m src.cli pipeline metrics medicare_claims
# Detailed metrics for last 30 days
python -m src.cli pipeline metrics medicare_claims --days 30
# Only quality and cost metrics
python -m src.cli pipeline metrics medicare_claims \
--metrics quality,cost --granularity dayView pipeline execution logs.
python -m src.cli pipeline logs PIPELINE_NAME [OPTIONS]
Options:
--lines INTEGER Number of recent log lines (default: 100)
--follow Follow log output (like tail -f)
--run-id TEXT Specific run ID to view
--level LEVEL Log level: DEBUG, INFO, WARNING, ERROR
--since TIMESTAMP Show logs since timestamp
Examples:
# Recent logs
python -m src.cli pipeline logs medicare_claims
# Follow live logs
python -m src.cli pipeline logs medicare_claims --follow
# Logs for specific run
python -m src.cli pipeline logs medicare_claims --run-id run_12345
# Only error logs from last 24 hours
python -m src.cli pipeline logs medicare_claims \
--level ERROR --since "24 hours ago"Delete a pipeline and its configuration.
python -m src.cli pipeline delete PIPELINE_NAME [OPTIONS]
Options:
--force Force deletion without confirmation
--keep-data Keep processed data tables
--backup Create backup before deletion
Examples:
# Delete with confirmation
python -m src.cli pipeline delete old_pipeline
# Force delete, keep data
python -m src.cli pipeline delete old_pipeline --force --keep-data
# Delete with backup
python -m src.cli pipeline delete old_pipeline --backupMonitor and manage data quality across healthcare datasets.
Run comprehensive data quality assessment.
python -m src.cli quality check TABLE_NAME [OPTIONS]
Arguments:
TABLE_NAME Unity Catalog table name
Options:
--assessment-type Type: quick, standard, comprehensive
--focus-areas LIST Areas: completeness, validity, consistency, accuracy
--custom-rules FILE Custom validation rules file
--output-file PATH Save results to file
--threshold FLOAT Quality score threshold for success
Examples:
# Quick quality check
python -m src.cli quality check healthcare_data.silver.medicaid_claims
# Comprehensive assessment with focus areas
python -m src.cli quality check healthcare_data.silver.medicare_claims \
--assessment-type comprehensive \
--focus-areas completeness,validity,healthcare_codes
# Quality check with custom rules
python -m src.cli quality check healthcare_data.silver.providers \
--custom-rules config/custom_provider_rules.yaml \
--output-file quality_report.jsonView data quality trends over time.
python -m src.cli quality trends [OPTIONS]
Options:
--table TEXT Table name to analyze
--days INTEGER Number of days (default: 30)
--dimensions LIST Quality dimensions to show
--format FORMAT Output format: table, chart, json
Examples:
# Quality trends for claims table
python -m src.cli quality trends \
--table healthcare_data.silver.medicaid_claims \
--days 14
# Specific dimensions with chart output
python -m src.cli quality trends \
--table healthcare_data.silver.providers \
--dimensions completeness,validity \
--format chartManage data quality alerts and thresholds.
python -m src.cli quality alerts TABLE_NAME [OPTIONS]
Options:
--min-quality FLOAT Minimum quality score threshold
--max-anomaly FLOAT Maximum anomaly percentage
--max-stale FLOAT Maximum stale data percentage
--list List existing alerts
--create Create new alert rules
--delete ALERT_ID Delete specific alert
Examples:
# Create quality alerts
python -m src.cli quality alerts healthcare_data.silver.claims \
--min-quality 0.85 \
--max-anomaly 5.0 \
--max-stale 10.0
# List existing alerts
python -m src.cli quality alerts --list
# Create alerts with custom thresholds
python -m src.cli quality alerts healthcare_data.silver.providers \
--min-quality 0.95 \
--max-anomaly 2.0Generate comprehensive quality reports.
python -m src.cli quality report [OPTIONS]
Options:
--tables LIST Specific tables to include
--output-dir PATH Output directory for reports
--format FORMAT Report format: html, pdf, json
--include-charts Include visualizations
--email-recipients Email report to recipients
Examples:
# Generate report for all tables
python -m src.cli quality report --format html --include-charts
# Report for specific tables with PDF output
python -m src.cli quality report \
--tables healthcare_data.silver.claims,healthcare_data.silver.providers \
--format pdf \
--output-dir reports/
# Generate and email report
python -m src.cli quality report \
--format html \
--email-recipients durai@infinidatum.netMonitor platform health, performance, and costs.
Create a lakehouse monitor for automated monitoring.
python -m src.cli monitor create TABLE_NAME [OPTIONS]
Arguments:
TABLE_NAME Table to monitor
Options:
--monitor-name TEXT Custom monitor name
--baseline-table Baseline table for comparison
--schedule TEXT Monitoring schedule (cron format)
--metrics LIST Metrics to monitor
Examples:
# Create basic monitor
python -m src.cli monitor create healthcare_data.silver.claims
# Monitor with baseline and custom schedule
python -m src.cli monitor create healthcare_data.silver.claims \
--baseline-table healthcare_data.gold.claims_baseline \
--schedule "0 */2 * * *" \
--monitor-name claims_quality_monitorRefresh monitoring data and generate new insights.
python -m src.cli monitor refresh MONITOR_NAME [OPTIONS]
Options:
--wait Wait for refresh completion
--full-refresh Force full data refresh
Examples:
# Refresh specific monitor
python -m src.cli monitor refresh claims_quality_monitor
# Full refresh with wait
python -m src.cli monitor refresh claims_quality_monitor \
--full-refresh --waitList all configured monitors and their status.
python -m src.cli monitor list [OPTIONS]
Options:
--status STATUS Filter by status
--table TEXT Filter by table name
Examples:
# List all monitors
python -m src.cli monitor list
# List only active monitors
python -m src.cli monitor list --status activeManage and optimize Databricks clusters for cost and performance.
Create an adaptive cluster with optimization.
python -m src.cli cluster create CLUSTER_NAME WORKLOAD_TYPE [OPTIONS]
Arguments:
CLUSTER_NAME Name for the new cluster
WORKLOAD_TYPE Workload type: ingestion, quality, analytics
Options:
--cost-level LEVEL Cost optimization level: aggressive, balanced, performance
--node-type TYPE Databricks node type
--min-workers INT Minimum number of workers
--max-workers INT Maximum number of workers
--auto-terminate Auto-termination timeout in minutes
Examples:
# Create ingestion cluster with balanced cost optimization
python -m src.cli cluster create medicaid-ingestion-cluster ingestion \
--cost-level balanced
# Create high-performance analytics cluster
python -m src.cli cluster create analytics-cluster analytics \
--cost-level performance \
--node-type i3.2xlarge \
--min-workers 4 \
--max-workers 20
# Create cost-optimized cluster with auto-termination
python -m src.cli cluster create temp-cluster ingestion \
--cost-level aggressive \
--auto-terminate 30Monitor cluster utilization and get optimization recommendations.
python -m src.cli cluster monitor CLUSTER_NAME [OPTIONS]
Arguments:
CLUSTER_NAME Cluster to monitor
Options:
--days INTEGER Days of historical data
--recommendations Show optimization recommendations
--cost-analysis Include detailed cost analysis
Examples:
# Basic cluster monitoring
python -m src.cli cluster monitor medicaid-processing-cluster
# Monitor with recommendations and cost analysis
python -m src.cli cluster monitor analytics-cluster \
--days 14 \
--recommendations \
--cost-analysisApply optimization recommendations to clusters.
python -m src.cli cluster optimize CLUSTER_NAME [OPTIONS]
Options:
--apply-recommendations Apply all recommended optimizations
--schedule-time TIME Schedule optimization for specific time
--dry-run Show what would be optimized
Examples:
# Show optimization recommendations
python -m src.cli cluster optimize analytics-cluster --dry-run
# Apply optimizations immediately
python -m src.cli cluster optimize analytics-cluster \
--apply-recommendations
# Schedule optimization for low-usage hours
python -m src.cli cluster optimize analytics-cluster \
--apply-recommendations \
--schedule-time "02:00"List all clusters with their status and utilization.
python -m src.cli cluster list [OPTIONS]
Options:
--status STATUS Filter by status: running, terminated, pending
--sort-by FIELD Sort by: name, status, cost, utilization
--show-costs Include cost information
Examples:
# List all clusters
python -m src.cli cluster list
# List running clusters with costs
python -m src.cli cluster list --status running --show-costs
# List sorted by utilization
python -m src.cli cluster list --sort-by utilizationCommands for healthcare data validation and compliance.
Validate healthcare codes (NPI, ICD-10, CPT, HCPCS).
python -m src.cli healthcare validate-codes [OPTIONS]
Options:
--file PATH File containing codes to validate
--code-type TYPE Code type: npi, icd10, cpt, hcpcs
--codes LIST Comma-separated list of codes
--output-file PATH Save validation results
Examples:
# Validate NPI codes from file
python -m src.cli healthcare validate-codes \
--file npi_codes.txt \
--code-type npi
# Validate specific CPT codes
python -m src.cli healthcare validate-codes \
--codes "99213,99214,80053" \
--code-type cpt
# Validate mixed codes with output file
python -m src.cli healthcare validate-codes \
--file healthcare_codes.csv \
--output-file validation_results.jsonCheck member eligibility for specific services.
python -m src.cli healthcare check-eligibility [OPTIONS]
Options:
--member-id TEXT Member ID to check
--service-date DATE Date of service (YYYY-MM-DD)
--provider-npi TEXT Provider NPI
--service-type TYPE Service type: medical, pharmacy, dental
Examples:
# Check member eligibility
python -m src.cli healthcare check-eligibility \
--member-id "M123456789" \
--service-date "2024-01-15" \
--provider-npi "1234567893"
# Check pharmacy eligibility
python -m src.cli healthcare check-eligibility \
--member-id "CA123456789" \
--service-date "2024-01-15" \
--service-type pharmacyRun HIPAA and regulatory compliance checks.
python -m src.cli healthcare compliance-check TABLE_NAME [OPTIONS]
Arguments:
TABLE_NAME Table to check for compliance
Options:
--compliance-type Type: hipaa, state_medicaid, medicare
--detailed-report Generate detailed compliance report
--fix-issues Automatically fix common issues
Examples:
# HIPAA compliance check
python -m src.cli healthcare compliance-check \
healthcare_data.silver.claims \
--compliance-type hipaa \
--detailed-report
# State Medicaid compliance with auto-fix
python -m src.cli healthcare compliance-check \
healthcare_data.silver.ca_medicaid_claims \
--compliance-type state_medicaid \
--fix-issuesGeneral platform utilities and administration.
Show overall platform status and health.
python -m src.cli status [OPTIONS]
Options:
--detailed Show detailed component status
--format FORMAT Output format: table, json, yaml
Examples:
# Basic status
python -m src.cli status
# Detailed status in JSON
python -m src.cli status --detailed --format jsonSample Output:
π₯ Healthcare Data Platform Status
====================================
β
System Health: 98.5%
β
Data Quality: 96.2%
β
Pipelines Running: 12/12
β
Cost Optimization: 12.3% savings
β
Self-Healing Actions: 47 today
π Recent Activity:
β
medicaid_claims pipeline completed (5 min ago)
π§ Auto-scaled cluster down for cost savings (1 hour ago)
β οΈ Schema drift detected and resolved (3 hours ago)
β
Quality alert auto-resolved (6 hours ago)
Start the Streamlit monitoring dashboard.
python -m src.cli dashboard [OPTIONS]
Options:
-p, --port INTEGER Dashboard port (default: 8501)
-h, --host TEXT Dashboard host (default: localhost)
--dev-mode Enable development mode with auto-reload
Examples:
# Start dashboard on default port
python -m src.cli dashboard
# Start on custom port and host
python -m src.cli dashboard --port 8080 --host 0.0.0.0
# Development mode with auto-reload
python -m src.cli dashboard --dev-modeConfiguration management utilities.
python -m src.cli config SUBCOMMAND [OPTIONS]
Subcommands:
validate Validate configuration files
show Show current configuration
encrypt Encrypt sensitive configuration values
template Generate configuration templates
Examples:
# Validate all configuration
python -m src.cli config validate
# Show current configuration (sensitive values masked)
python -m src.cli config show
# Generate configuration template for new deployment
python -m src.cli config template --environment productionBackup platform configuration and metadata.
python -m src.cli backup [OPTIONS]
Options:
--output-dir PATH Backup output directory
--include-data Include processed data in backup
--compress Compress backup files
Examples:
# Basic configuration backup
python -m src.cli backup
# Full backup with data compression
python -m src.cli backup \
--include-data \
--compress \
--output-dir /backups/healthcare-platform/Chain multiple commands for complex workflows:
# Create pipeline, run it, and monitor quality
python -m src.cli pipeline create test_pipeline \
--source-path s3://test-data/ \
--target-table test.bronze.data && \
python -m src.cli pipeline run test_pipeline --wait && \
python -m src.cli quality check test.bronze.dataUse different configurations for different environments:
# Development environment
python -m src.cli --config config/dev.yaml pipeline list
# Production environment
python -m src.cli --config config/prod.yaml status --detailedUse JSON output for scripting and automation:
# Get pipeline status as JSON for processing
python -m src.cli pipeline list --format json | jq '.pipelines[] | select(.status=="failed")'
# Monitor quality scores
python -m src.cli quality trends --format json --days 7 | \
jq '.overall_quality_score' | \
awk '$1 < 85 { print "Quality alert: " $1 }'Perform batch operations on multiple resources:
# Stop all failed pipelines
for pipeline in $(python -m src.cli pipeline list --status failed --format json | jq -r '.pipelines[].name'); do
python -m src.cli pipeline stop "$pipeline" --force
done
# Run quality checks on all silver tables
for table in $(python -m src.cli catalog list --type tables --pattern "*.silver.*" --format json | jq -r '.tables[].name'); do
python -m src.cli quality check "$table" --assessment-type quick
doneCommand not found:
# Ensure you're in the correct environment
source venv/bin/activate
python -m src.cli --helpConfiguration errors:
# Validate configuration
python -m src.cli config validate
# Check specific configuration sections
python -m src.cli config show --section databricksConnection issues:
# Test Databricks connectivity
python -c "
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
print(w.current_user.me())
"Enable verbose logging for troubleshooting:
# Verbose output
python -m src.cli --verbose pipeline run test_pipeline
# Debug with dry-run
python -m src.cli --dry-run --verbose pipeline create test \
--source-path s3://test/ \
--target-table test.tableEach command and subcommand has built-in help:
# General help
python -m src.cli --help
# Command-specific help
python -m src.cli pipeline --help
python -m src.cli pipeline create --help
# Show all available options
python -m src.cli pipeline create --help | grep -E "^\s*--"For more examples and use cases, see the CLI Tutorial and Best Practices guides.