feat: MLflow tracing with async Stop hook (opt-in) #139
Closed
dgokeeffe wants to merge 1 commit into datasciencemonkey:main from
Enables opt-in MLflow tracing for Claude Code sessions. Key design:
- setup_mlflow.py registers a Stop hook when MLFLOW_CLAUDE_TRACING_ENABLED=true
- Hook delegates to mlflow-trace-stop.sh which backgrounds the handler via
`nohup timeout 30 ... & disown`, returning in <1s so the Stop chain
(brain-push, /til, etc.) is not blocked
- Handler receives hook-event JSON via a temp file captured synchronously
before backgrounding (naive nohup would redirect stdin to /dev/null)
- Hard 30s ceiling on the backgrounded flush to prevent stuck handlers
leaking memory/CPU across sessions
- Pins mlflow-skinny and mlflow-tracing to 3.11.1 to match the Apps
runtime image (version mismatch caused silent import failures)
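
The capture-then-background pattern described above can be sketched in Python. The PR itself does this in a shell script (`mlflow-trace-stop.sh`) with `nohup timeout 30 ... & disown`; the sketch below is only an illustration of the same idea, not the PR's code. The function names and the `handle_stop.py` handler are hypothetical, and the coreutils `timeout` binary is assumed to be on `PATH`.

```python
import os
import subprocess
import sys
import tempfile


def capture_event(payload: bytes) -> str:
    """Write the hook-event JSON to a temp file and return its path."""
    fd, path = tempfile.mkstemp(prefix="hook-event-", suffix=".json")
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
    return path


def build_command(handler_cmd, event_path, ceiling_s=30):
    """Wrap the handler in coreutils `timeout` so it cannot outlive the ceiling."""
    return ["timeout", str(ceiling_s), *handler_cmd, event_path]


def spawn_detached(cmd):
    """Start cmd in its own session with no inherited stdio and return at once."""
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # the child cannot read the hook's stdin
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # detach from the hook's process group
    )


def main():
    # Read stdin synchronously, before detaching, so the payload is not lost.
    path = capture_event(sys.stdin.buffer.read())
    # `handle_stop.py` is a hypothetical handler name, not taken from the PR.
    spawn_detached(build_command([sys.executable, "handle_stop.py"], path))
```

The ordering is the point: the payload is read in the foreground while stdin is still attached, and only the path to the captured file is handed to the detached process.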
Tracing is disabled by default — set MLFLOW_CLAUDE_TRACING_ENABLED=true
in app.yaml to opt in.
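
As a rough illustration of the opt-in gate, the sketch below merges a Stop hook into a settings dictionary only when the flag is set. It is not the code from `setup_mlflow.py`: the function names are hypothetical, the `hooks` / `Stop` / `command` layout is an assumption about the shape of `~/.claude/settings.json`, and accepting only the literal string `true` is likewise an assumption.

```python
import copy
import os


def stop_hook_enabled(env=None) -> bool:
    """Return True only when the opt-in flag is the literal string 'true'."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_CLAUDE_TRACING_ENABLED", "") == "true"


def merge_stop_hook(settings: dict, command: str, env=None) -> dict:
    """Return a copy of settings with the Stop hook added when tracing is enabled."""
    merged = copy.deepcopy(settings)
    if not stop_hook_enabled(env):
        return merged  # disabled by default: settings are left untouched

    # Assumed layout: hooks -> Stop -> list of groups, each holding command entries.
    groups = merged.setdefault("hooks", {}).setdefault("Stop", [])
    already = any(
        entry.get("command") == command
        for group in groups
        for entry in group.get("hooks", [])
    )
    if not already:  # keep the merge idempotent across repeated setup runs
        groups.append({"hooks": [{"type": "command", "command": command}]})
    return merged
```

Appending rather than replacing keeps any Stop hooks that are already registered, and the duplicate check means running setup twice does not register the hook twice.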
Tests: TestStopHook and TestSettingsMerge updated to match shell-script
delegation model; TestAppOwnerExport mocks app_state.set_app_owner
to avoid ~/.coda writes in unit test context.
Co-authored-by: Isaac
Migrating to the new repo home. This work continues at databrickslabs/coding-agents-databricks-apps#15 (also resolves the MLflow tracing bug filed at databrickslabs/coding-agents-databricks-apps#9). Closing this stale duplicate.
Summary
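- Adds opt-in MLflow tracing for Claude Code sessions, enabled by setting `MLFLOW_CLAUDE_TRACING_ENABLED=true` in `app.yaml`
- The Stop hook delegates to `mlflow-trace-stop.sh`, which backgrounds the handler via `nohup timeout 30 … & disown` and returns in <1s, so the Stop chain (brain-push, /til, etc.) is not blocked
- The handler receives the hook-event JSON via a temp file captured synchronously before backgrounding (naive `nohup` would redirect stdin to /dev/null)
- Pins `mlflow-skinny` and `mlflow-tracing` to 3.11.1 to match the Apps runtime image (version mismatch caused silent import failures)

Tracing is disabled by default, so there is no behaviour change for existing deployments.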
Test plan
- `uv run pytest tests/test_mlflow_tracing.py`: all pass locally
- With `MLFLOW_CLAUDE_TRACING_ENABLED=true`, run a session and confirm a trace appears in the MLflow experiment
- […] `~/.claude/settings.json`

This pull request and its description were written by Isaac.