Enable Daytona sandbox-local usage proxy by bingran-you · Pull Request #587 · benchflow-ai/benchflow

bingran-you · 2026-05-30T06:38:07Z

Summary

Start a per-Daytona-sandbox usage proxy by default so provider token/cost telemetry works without external tunnel parameters.
Remove the external tunnel/fixed-port usage proxy CLI/config path left over from the PR #568-era implementation (Add Daytona usage tracking proxy support).
Reuse host-side raw capture parsing for sandbox-local proxy captures and update docs/tests.
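The 6b2656a follow-up says raw capture parsing prefers the response content-type, so a JSON error body is decoded as JSON even when the request had stream=true. A minimal sketch of that dispatch, assuming each capture stores the response content-type header and the decoded body text. `parse_captured_body` is a hypothetical name, not benchflow's API.

```python
import json


def parse_captured_body(content_type: str, body: str) -> list[dict]:
    """Hypothetical helper: decode one captured response body into JSON objects.

    The branch is chosen from the *response* content-type, not from the
    request's stream flag.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/event-stream":
        events: list[dict] = []
        for line in body.splitlines():
            # Only "data:" lines carry payload; event names, ids, comments and
            # blank separators are skipped. Multi-line data fields are not
            # joined in this simplified sketch.
            if not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            if not payload or payload == "[DONE]":  # OpenAI-style end-of-stream marker
                continue
            events.append(json.loads(payload))
        return events
    if media_type == "application/json" or media_type.endswith("+json"):
        return [json.loads(body)]
    return []  # unknown type: leave the capture unparsed
```

With this shape, a 400 returned as application/json for a stream=true request still yields a single JSON error object instead of an empty SSE parse.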

Testing

uv run --extra dev ruff check .
uv run --extra dev ty check src/
uv run --extra dev python -m pytest tests/

devin-ai-integration

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

bingran-you · 2026-05-30T07:23:19Z

Addressed the Daytona usage-proxy review follow-up in 741eb4f:

Replaced the nohup ... & startup with a Node launcher that spawn()s the proxy with detached: true, ignored stdin, log-file stdout/stderr, and unref(), so the Daytona exec command can return immediately.
Restored the remote Bedrock-direct guard: Daytona/OpenHands native Bedrock now skips usage proxy in auto and fails fast in required until native Bedrock metering exists.
Removed agent-name CLI args from the long-lived proxy argv, tightened the disconnect pkill -f pattern, and added a sandbox proxy pid liveness check before runtime reuse.
Fixed the telemetry smoke test trial_name -> rollout_name, set usage_tracking="required", and added an explicit Daytona smoke variant that asserts provider usage and non-empty llm_trajectory.jsonl.
Split sandbox-local coverage into tests/test_sandbox_usage_proxy.py; tests/test_usage_proxy.py is back under 1k lines.
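The launcher described above is Node (spawn() with detached: true, ignored stdin, log-file stdout/stderr, unref()), paired with a pid liveness check before a runtime is reused. As a rough Python analogue of the same pattern on a POSIX host, a sketch follows; `launch_detached` and `pid_is_alive` are hypothetical helpers, not part of this PR.

```python
import os
import subprocess
from pathlib import Path


def launch_detached(argv: list[str], log_path: Path) -> int:
    """Start argv in its own session, stdin ignored, output appended to log_path."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # like stdio[0] = "ignore" in Node
            stdout=log,                 # stdout and stderr both go to the log file
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid(): roughly Node's detached: true on POSIX
        )
    # We never wait() on proc, which plays the role of Node's unref():
    # the caller returns immediately while the child keeps running.
    return proc.pid


def pid_is_alive(pid: int) -> bool:
    """Signal-0 probe: checks that pid exists without delivering a signal."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one pid
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True  # note: an unreaped zombie still reports as alive
```

Recording the returned pid at launch and probing it before reuse lets a stale runtime be rebuilt instead of pointing the agent at a dead proxy.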

Local checks:

uv run --extra dev ruff format --check src tests
uv run --extra dev ruff check .
uv run --extra dev ty check src/
uv run --extra dev python -m pytest tests/ -> 2482 passed, 12 skipped, 1 deselected

I could not run the live Daytona smoke in this workspace because DAYTONA_API_KEY and provider credentials are not present in the environment.
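The Daytona smoke variant from 741eb4f asserts provider usage and a non-empty llm_trajectory.jsonl, and 6b2656a restricts provider_response to exchanges that carry real usage/token fields. A sketch of such a check, assuming one JSON object per line with usage under response.usage in OpenAI-style or Anthropic-style naming; the field names and helpers are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical token field names covering OpenAI-style and Anthropic-style usage objects.
TOKEN_FIELDS = ("prompt_tokens", "completion_tokens", "input_tokens", "output_tokens", "total_tokens")


def has_provider_usage(exchange: dict) -> bool:
    """True only if the captured response carries at least one integer token count."""
    usage = (exchange.get("response") or {}).get("usage")
    return isinstance(usage, dict) and any(isinstance(usage.get(f), int) for f in TOKEN_FIELDS)


def check_trajectory(path: Path) -> int:
    """Smoke-style check: the file is non-empty and some exchange has provider usage."""
    lines = [line for line in path.read_text().splitlines() if line.strip()]
    assert lines, f"{path} has no exchanges"
    exchanges = [json.loads(line) for line in lines]
    assert any(has_provider_usage(ex) for ex in exchanges), "no exchange carried provider usage"
    return len(exchanges)
```

A 200 or 400 capture without a usage object counts as an exchange but does not satisfy the provider-usage assertion on its own.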

bingran-you · 2026-05-30T22:29:05Z

Follow-up addressed in 6b2656a:

- extract_usage() now only reports provider_response when captured exchanges contain real provider usage/token fields; 200/400 captures without usage stay unavailable.
- Daytona usage_tracking=auto now degrades to the unchanged provider env if proxy startup fails; required still fails fast and cleans up the partial runtime.
- Raw capture parsing now prefers the response content-type, preserving JSON error bodies even when the request had stream=true.
- Sandbox proxy shutdown imports captures on a best-effort basis but always terminates the proxy and removes its runtime dir.
- Node proxy capture now redacts sensitive request/response headers plus sensitive query parameters, and a local Node integration test covers forwarding, redaction, SSE, JSON errors, and gzip.

Local verification on latest diff:

- uv sync --extra dev --extra sandbox-daytona --locked
- uv run ruff check .
- uv run ruff format --check src tests
- uv run ty check src/
- uv run python -m pytest tests/ -> 2490 passed, 12 skipped, 1 deselected

Live Daytona SkillsBench evidence:

- lake-warming-attribution with Daytona + OpenHands + DeepSeek-compatible OpenAI endpoint: rollout reached verifier with error=null, usage_tracking.status=enabled, usage_source=provider_response, 27 LLM exchanges, token totals/timing populated.
- Same task repeated with Daytona + OpenHands + GLM-compatible OpenAI endpoint: error=null, usage_tracking.status=enabled, usage_source=provider_response, 20 LLM exchanges, token totals/timing populated.

Both live runs had reward 0.0 because the model output did not satisfy the task verifier, but the Daytona sandbox-local usage proxy contract, trajectory capture, token aggregation, redaction, and timing metadata all completed end-to-end.
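Taken together with the Bedrock-direct guard restored in 741eb4f, the usage_tracking modes behave as follows: auto skips or degrades to the unchanged provider env, while required fails fast. A sketch of that decision logic; `start_usage_proxy`, the `start`/`cleanup` callbacks, and the env-dict return value are hypothetical, not benchflow's API.

```python
from typing import Callable, Literal, Optional

UsageTrackingMode = Literal["auto", "required"]


class UsageProxyUnavailable(RuntimeError):
    """Raised when usage_tracking='required' cannot be honored."""


def start_usage_proxy(
    mode: UsageTrackingMode,
    native_bedrock: bool,
    start: Callable[[], dict[str, str]],
    cleanup: Callable[[], None],
) -> Optional[dict[str, str]]:
    """Return env overrides routing the agent through the proxy, or None to keep the provider env."""
    if native_bedrock:
        # No native Bedrock metering exists yet, so the proxy cannot observe usage.
        if mode == "required":
            raise UsageProxyUnavailable("native Bedrock usage metering is not available")
        return None  # auto: skip the proxy
    try:
        return start()
    except Exception as exc:
        # Assumption: partial state is cleaned up in both modes; the PR text
        # states the cleanup explicitly only for required.
        cleanup()
        if mode == "required":
            raise UsageProxyUnavailable("usage proxy failed to start") from exc
        return None  # auto: degrade to the unchanged provider env
```

Returning None rather than raising in auto keeps a telemetry failure from failing the rollout itself.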

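The 6b2656a follow-up also has the Node proxy redact sensitive headers and query parameters before captures are written. The PR text does not list which names count as sensitive, so the deny-lists below are assumed examples, and this Python sketch only illustrates the idea; the real redaction lives in the Node proxy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example deny-lists only (assumed); the proxy's real lists are not given in the PR text.
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "x-api-key", "api-key", "cookie", "set-cookie"}
SENSITIVE_QUERY_PARAMS = {"key", "api_key", "apikey", "access_token", "token"}
REDACTED = "[REDACTED]"


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace values of sensitive headers; names are matched case-insensitively."""
    return {
        name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_url(url: str) -> str:
    """Replace values of sensitive query parameters, keeping parameter order."""
    parts = urlsplit(url)
    query = [
        (k, REDACTED if k.lower() in SENSITIVE_QUERY_PARAMS else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Re-encoding with urlencode may normalize the escaping of untouched parameters (spaces become +, for instance), which is fine for a capture log but is a reason to redact only the stored copy, never the forwarded request.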
bingran-you added 2 commits May 29, 2026 23:37

Enable Daytona sandbox-local usage proxy

83ed2dd

Format usage proxy tests

48f2a9d

devin-ai-integration Bot reviewed May 30, 2026


Comment thread src/benchflow/providers/sandbox_usage_proxy.py

bingran-you added 2 commits May 29, 2026 23:49

Harden sandbox proxy state polling

b3c4f7f

Address Daytona usage proxy review

741eb4f

Fix Daytona usage proxy edge cases

6b2656a

bingran-you added 4 commits May 30, 2026 15:30

Stabilize sandbox proxy integration test

c58bfd7

Capture provider usage in Daytona runs

72ce374

Support direct OpenAI-compatible provider keys

279bb8a

Refactor Daytona usage proxy runtime wiring

4760039

bingran-you wants to merge 9 commits into benchflow-ai:main from bingran-you:bry/daytona-sandbox-usage-proxy
