Co-authored-by: openhands <openhands@all-hands.dev>
|
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
API breakage checks (Griffe)
Result: Failed
Log excerpt (first 1000 characters) |
Agent server REST API breakage checks (OpenAPI)
Result: Passed |
all-hands-bot
left a comment
🟢 Good Taste - Clean Release Bump
Taste Rating: 🟢 Good taste - Mechanical version bump, exactly what it should be.
Review Summary
The version changes are clean and correct:
- All 4 packages consistently bumped from 1.11.5 → 1.12.0
- Workflow default updated to v1.12.0
- Lock file properly synced
- ✅ Deprecation check passes (0 deadline violations)
Process Notes
The release checklist has incomplete items:
- Integration tests
- Behavior tests
- Example tests
- Draft release creation
- Evaluation on OpenHands Index
These should be completed before merge per the standard release workflow.
Verdict
✅ Version changes are correct - No technical issues with the version bumps themselves.
⏸️ Hold for checklist completion - Follow the release process checklist before merging.
Key Insight: This is a textbook mechanical release bump with zero technical issues. Just complete the process checklist and ship it.
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 24.2s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 18.9s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.6s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 40.7s | $0.03 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 16.3s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 31.3s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 33.2s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 10.1s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 20.1s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 23s | $0.18 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.7s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 17.5s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.1s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 16.0s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 14.6s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 19s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 5s | $0.22 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | $0.29 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 19.3s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 28.8s | $0.04 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 37.0s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.6s | $0.00 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 19s | $0.30 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 19.2s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 3m 11s | $0.27 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 12.2s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.2s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | $0.05 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 9.9s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ❌ FAIL Exit code 1 | 11.1s | -- |
| 01_standalone_sdk/41_task_tool_set.py | ❌ FAIL Exit code 1 | 4.4s | -- |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 59.4s | $0.06 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 43.2s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 21s | $0.03 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 48s | $0.00 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 26s | $0.02 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 24.0s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL Exit code 1 | 4.5s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 22.8s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 43s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 13.5s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 20.5s | $0.03 |
❌ Some tests failed
Total: 43 | Passed: 38 | Failed: 5 | Total Cost: $2.02
Failed examples:
- examples/01_standalone_sdk/25_agent_delegation.py: Timed out after 600 seconds
- examples/01_standalone_sdk/38_browser_session_recording.py: Timed out after 600 seconds
- examples/01_standalone_sdk/40_acp_agent_example.py: Exit code 1
- examples/01_standalone_sdk/41_task_tool_set.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
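The two failure statuses in the table ("Timed out after 600 seconds" and "Exit code 1") are what a runner produces when it executes each example as a subprocess under a wall-clock limit. The sketch below shows one way such a runner could be written; it is an assumption about the general shape, not the workflow's actual code, and `run_example` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def run_example(cmd: list[str], timeout_s: float) -> tuple[str, float]:
    """Run one example command and classify the outcome like the table above."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        elapsed = time.monotonic() - start
        return f"FAIL Timed out after {timeout_s:g} seconds", elapsed
    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        return f"FAIL Exit code {proc.returncode}", elapsed
    return "PASS", elapsed


# Three tiny stand-in "examples": one succeeds, one exits non-zero, one hangs.
ok_status, _ = run_example([sys.executable, "-c", "print('ok')"], timeout_s=30)
bad_status, _ = run_example([sys.executable, "-c", "raise SystemExit(1)"], timeout_s=30)
slow_status, _ = run_example(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
print(ok_status, bad_status, slow_status, sep="\n")
```

A timeout says nothing about correctness, only that the script did not finish within the limit, which is why timed-out examples can pass on a later run.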
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 24.8s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 26.1s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.9s | $0.00 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 36.3s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 16.8s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 28.1s | $0.01 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 30.4s | $0.02 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 10.5s | $0.00 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 19.3s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 3m 28s | $0.20 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.6s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 26.4s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 16.6s | $0.01 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.9s | $0.01 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 11.9s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 13.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 57.8s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 12s | $0.33 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | $0.27 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 15.7s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 33.3s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 38.9s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.7s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 5m 44s | $0.40 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 15.3s | $0.01 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 4m 2s | $0.37 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 9.5s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.3s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.6s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ❌ FAIL Exit code 1 | 10.9s | -- |
| 01_standalone_sdk/41_task_tool_set.py | ❌ FAIL Exit code 1 | 4.5s | -- |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 50.5s | $0.04 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 14s | $0.08 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 27s | $0.03 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 57.2s | $0.11 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 54.6s | $0.02 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 30.3s | $0.01 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 7s | $0.02 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 27.8s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 17s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 12.1s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 22.5s | $0.04 |
❌ Some tests failed
Total: 43 | Passed: 39 | Failed: 4 | Total Cost: $2.37
Failed examples:
- examples/01_standalone_sdk/25_agent_delegation.py: Timed out after 600 seconds
- examples/01_standalone_sdk/38_browser_session_recording.py: Timed out after 600 seconds
- examples/01_standalone_sdk/40_acp_agent_example.py: Exit code 1
- examples/01_standalone_sdk/41_task_tool_set.py: Exit code 1
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🧪 Integration Tests Results
Overall Success Rate: 96.7%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_gemini_3_pro_preview
litellm_proxy_anthropic_claude_sonnet_4_6
Failed Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
|
|
@OpenHands check the log #2302 (comment) and tell me why claude sonnet 4.6 is failing |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
|
@OpenHands make a new issue to decide the wanted behavior for the public API checks (griffe and oasdiff) workflows. Look at this PR's comments / CI to start with:
Make an issue describing the current execution and compare it with documented expectations. Give links to source code on github. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Summary
You asked me to check the CI log from PR #2302 and explain why Claude Sonnet 4.6 is failing. Here's the answer:
Claude Sonnet 4.6 failed the
Specifically, the agent:
The test's
This is a non-deterministic agent behavior issue, not a code bug. Claude Sonnet 4.6 simply didn't think to set the execute permission. It may pass on a re-run. All other models (DeepSeek, Gemini, Kimi) passed at 100%.
No code changes were made; this was purely an investigative/diagnostic task. |
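For context, the missing step is small. The sketch below is not the integration test itself (its name and assertions are not shown in this thread); it only illustrates, with a hypothetical script name and assuming a POSIX filesystem, how a file can be written correctly and still fail an "is it executable" check until the execute bit is set.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.sh")  # hypothetical file name
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hello\n")

    # Start from a typical non-executable mode (rw-r--r--).
    os.chmod(script, 0o644)
    executable_before = bool(os.stat(script).st_mode & stat.S_IXUSR)

    # Equivalent of `chmod u+x hello.sh`: add the owner execute bit.
    mode = os.stat(script).st_mode
    os.chmod(script, mode | stat.S_IXUSR)
    executable_after = bool(os.stat(script).st_mode & stat.S_IXUSR)

print(executable_before, executable_after)
```

A verification step that checks the mode bits would fail in the first state even though the script's contents are correct, which matches the behavior described above.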
|
Summary of work completed:
No repository code changes were made, so no git diff or push is applicable. |
Failed Examples
Root Causes
1. Browser launch failure (
|
|
How about this one, not sure if it's still your @simonrosenberg version basically or maybe you rewrote it @VascoSch92 , just wondering if you guys have an idea what's up here? This used to work I think, it's number 25
Xingyao's agent excluded all 4, it just looks like this shouldn't maybe happen |
|
@OpenHands please revert 8a21c3f - most of the issues should've been fixed in the main branch that's just merged in. Can you re-tag test-examples and monitor the result |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
…r external infra" This reverts commit 8a21c3f.
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 21.6s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 19.5s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.6s | $0.00 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 37.9s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 15.9s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 30.2s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 31.3s | $0.02 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 11.2s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 20.3s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL Exit code 1 | 1m 6s | -- |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.7s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 29.1s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.5s | $0.00 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 14.8s | $0.01 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 9.6s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 13.8s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 19s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 54s | $0.38 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 17s | $0.09 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 19.0s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 26.8s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 32.0s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 10.3s | $0.00 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 9m 39s | $0.72 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 19.4s | $0.01 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 12.1s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.8s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | $0.04 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.3s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ❌ FAIL Exit code 1 | 9.4s | -- |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 28.4s | $0.02 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 19s | $0.06 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 9s | $0.06 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 36s | $0.02 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 17s | $0.00 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 18s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 32.0s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 23s | $0.02 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 20.0s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 49s | $0.08 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 12.9s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 23.8s | $0.04 |
❌ Some tests failed
Total: 43 | Passed: 39 | Failed: 4 | Total Cost: $1.91
Failed examples:
- examples/01_standalone_sdk/14_context_condenser.py: Exit code 1
- examples/01_standalone_sdk/34_critic_example.py: Timed out after 600 seconds
- examples/01_standalone_sdk/38_browser_session_recording.py: Timed out after 600 seconds
- examples/01_standalone_sdk/40_acp_agent_example.py: Exit code 1
|
@xingyaoww About the API breakage job failing: I think it's not blocking release, but it's there to warn us and make us look (though we could make it mandatory anytime if it behaves as wanted). These are the current failures:
IMHO, I think maybe we could "sin" this time and let these pass without deprecation? 🤔 |
|
@OpenHands please check https://github.com/OpenHands/software-agent-sdk/actions/runs/22713720951/job/65857938260?pr=2302 and #2302 (comment) Tell me which commit introduced these and which commit removed these? Tell me the release version before and after these commits. I want to assess the impact of ignoring these errors (eg not marking them deprecated since they are very short lived) |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
|
No code changes were made; this was a pure investigation/analysis task. Here's the summary:
API Breakage Analysis for PR #2302 (Release v1.12.0)
The CI check
Timeline
Key Findings
Removal Reasons
Impact Assessment
Very low. These were essentially an accidental over-export, added as part of setting up the API breakage CI for |
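To make the breakage concrete: a removed export counts as a break because any name present in the previous release's public surface but absent from the new one can no longer be imported. Here is a minimal sketch of that comparison using plain sets rather than Griffe itself. The symbol names are hypothetical placeholders, not the symbols flagged in this PR.

```python
def diff_exports(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Compare two public-name sets; 'removed' entries are potential breaks."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# Hypothetical export lists for two consecutive releases.
previous_release = {"Agent", "Conversation", "InternalHelper"}
current_release = {"Agent", "Conversation", "Tool"}

report = diff_exports(previous_release, current_release)
print(report)
```

Whether a "removed" entry matters in practice depends on how long the name was public, which is the question the timeline above is meant to answer.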
|
@enyst agree that it is not release blocking :) Just need to get example tests fixed and we'd be good to go |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🧪 Integration Tests Results
Overall Success Rate: 93.3%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_gemini_3_pro_preview
litellm_proxy_anthropic_claude_sonnet_4_6
Failed Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
Failed Tests:
|
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 26.6s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 20.6s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 13.2s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 34.3s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 17.8s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 29.1s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 36.4s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 10.1s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 22.4s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 3m 2s | $0.21 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 17.9s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 23.7s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 14.5s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 16.9s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 12.3s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 18.1s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 2m 40s | $0.03 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 2m 49s | $0.18 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 55.7s | $0.06 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 18.8s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 31.6s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 50.0s | $0.04 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 10.4s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 5m 24s | $0.36 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 21.1s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 1m 39s | $0.13 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 10.3s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 5.1s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 56.1s | $0.02 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 11.8s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 34.7s | $0.14 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 32.0s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 27s | $0.09 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 53.0s | $0.04 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 37s | $0.04 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 54.9s | $0.00 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 29s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 36.3s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 14s | $0.01 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 26.0s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 58.7s | $0.05 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 14.1s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 23.6s | $0.03 |
✅ All tests passed!
Total: 43 | Passed: 43 | Failed: 0 | Total Cost: $1.91
|
Evaluation Triggered
|
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>

Release v1.12.0
This PR prepares the release for version 1.12.0.
Release Checklist
integration-test)
behavior-test)
test-examples)
v1.12.0
rel-1.12.0
Next Steps
Once the release is published on GitHub, the PyPI packages will be automatically published via the pypi-release.yml workflow.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdk
nikolaik/python-nodejs:python3.12-nodejs22
golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f03e068-python
Run
All tags pushed for this build
About Multi-Architecture Support
f03e068-python) is a multi-arch manifest supporting both amd64 and arm64
f03e068-python-amd64) are also available if needed