Addressed the Daytona usage-proxy review follow-up in 741eb4f:
- Replaced the nohup ... & startup with a Node launcher that calls spawn() on the proxy with detached: true, ignored stdin, stdout/stderr redirected to a log file, and unref(), so the Daytona exec command can return immediately.
- Restored the remote Bedrock-direct guard: Daytona/OpenHands native Bedrock now skips the usage proxy in auto mode and fails fast in required mode until native Bedrock metering exists.
- Removed agent-name CLI args from the long-lived proxy argv, tightened the disconnect pkill -f pattern, and added a sandbox proxy pid liveness check before runtime reuse.
- Fixed the telemetry smoke test (trial_name -> rollout_name), set usage_tracking="required", and added an explicit Daytona smoke variant that asserts provider usage and a non-empty llm_trajectory.jsonl.
- Split sandbox-local coverage into tests/test_sandbox_usage_proxy.py; tests/test_usage_proxy.py is back under 1k lines.
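The auto/required behavior of the Bedrock-direct guard described above can be sketched as a small decision function. This is an illustrative sketch only, not the code in the PR: the function name `should_start_usage_proxy`, the `native_bedrock` flag, and the `UsageTrackingError` class are hypothetical names chosen for the example. Only the `auto` and `required` mode values come from the comment.

```python
class UsageTrackingError(RuntimeError):
    """Hypothetical error for when usage_tracking="required" cannot be honored."""


def should_start_usage_proxy(mode: str, *, native_bedrock: bool) -> bool:
    """Decide whether to start the usage proxy for a remote sandbox run.

    Assumption: `mode` is the usage_tracking setting and `native_bedrock`
    is True when the agent talks to Bedrock directly (no HTTP endpoint
    the proxy could sit in front of).
    """
    if mode not in ("auto", "required"):
        raise ValueError(f"unsupported usage_tracking mode: {mode!r}")

    if native_bedrock:
        if mode == "required":
            # Fail fast: metering was demanded but cannot be provided.
            raise UsageTrackingError(
                "usage_tracking=required is not supported for native Bedrock"
            )
        # auto: run without the proxy instead of failing the rollout.
        return False

    return True
```

With this shape, callers in `auto` mode simply skip proxy setup when the function returns `False`, while `required` surfaces the unsupported configuration before any sandbox work starts.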
Local checks:
- uv run --extra dev ruff format --check src tests
- uv run --extra dev ruff check .
- uv run --extra dev ty check src/
- uv run --extra dev python -m pytest tests/ -> 2482 passed, 12 skipped, 1 deselected
I could not run the live Daytona smoke in this workspace because DAYTONA_API_KEY and provider credentials are not present in the environment.
Follow-up addressed in 6b2656a:

- extract_usage() now only reports provider_response when captured exchanges contain real provider usage/token fields; 200/400 captures without usage stay unavailable.
- Daytona usage_tracking=auto now degrades to unchanged provider env if proxy startup fails; required still fails fast and cleans partial runtime.
- Raw capture parsing now prefers response content-type, preserving JSON error bodies even when the request had stream=true.
- Sandbox proxy shutdown imports best-effort captures but always terminates the proxy and removes its runtime dir.
- Node proxy capture now redacts sensitive request/response headers plus sensitive query parameters, and a local Node integration test covers forwarding, redaction, SSE, JSON errors, and gzip.

Local verification on latest diff:
- uv sync --extra dev --extra sandbox-daytona --locked
- uv run ruff check .
- uv run ruff format --check src tests
- uv run ty check src/
- uv run python -m pytest tests/ -> 2490 passed, 12 skipped, 1 deselected

Live Daytona SkillsBench evidence:
- lake-warming-attribution with Daytona + OpenHands + DeepSeek-compatible OpenAI endpoint: rollout reached verifier with error=null, usage_tracking.status=enabled, usage_source=provider_response, 27 LLM exchanges, token totals/timing populated.
- Repeated with Daytona + OpenHands + GLM-compatible OpenAI endpoint: error=null, usage_tracking.status=enabled, usage_source=provider_response, 20 LLM exchanges, token totals/timing populated.

Both live runs had reward 0.0 because the model output did not satisfy the task verifier, but the Daytona sandbox-local usage proxy contract, trajectory capture, token aggregation, redaction, and timing metadata all completed end-to-end.
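The extract_usage() rule in the first bullet above (report provider_response only when a capture carries real token fields) can be illustrated roughly as follows. This is a sketch under assumptions, not the repository's implementation: the function name `summarize_usage`, the exchange shape (a dict with a parsed `response_body`), and the `"unavailable"` source label are hypothetical. The `prompt_tokens`/`completion_tokens` names follow the OpenAI-style `usage` object; `input_tokens`/`output_tokens` are accepted as an alternative spelling.

```python
from typing import Any

# Output bucket -> provider field names that may carry it (first match wins).
TOKEN_FIELDS = {
    "input_tokens": ("prompt_tokens", "input_tokens"),
    "output_tokens": ("completion_tokens", "output_tokens"),
}


def summarize_usage(exchanges: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate token usage, counting only exchanges with real usage fields."""
    totals = {bucket: 0 for bucket in TOKEN_FIELDS}
    metered = 0

    for exchange in exchanges:
        body = exchange.get("response_body")
        usage = body.get("usage") if isinstance(body, dict) else None
        if not isinstance(usage, dict):
            continue  # e.g. a 400 error body or a 200 without usage

        found = False
        for bucket, names in TOKEN_FIELDS.items():
            for name in names:
                value = usage.get(name)
                # bool is a subclass of int, so exclude it explicitly.
                if isinstance(value, int) and not isinstance(value, bool):
                    totals[bucket] += value
                    found = True
                    break
        if found:
            metered += 1

    if metered == 0:
        # Captures exist but none carried token counts: do not claim
        # provider-reported usage.
        return {"usage_source": "unavailable", "exchanges": len(exchanges)}

    return {
        "usage_source": "provider_response",
        "exchanges": len(exchanges),
        **totals,
    }
```

The point of the design is that a successful HTTP status alone is not treated as evidence of metering; only token counts found in the response body switch the source to provider_response.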
Summary
Testing
- uv run --extra dev ruff check .
- uv run --extra dev ty check src/
- uv run --extra dev python -m pytest tests/