From 0dfbc0cdab3b9c898e792688db511bd1a1bec5ec Mon Sep 17 00:00:00 2001 From: openhands Date: Tue, 24 Feb 2026 08:52:17 +0000 Subject: [PATCH 1/6] docs: override llms files to exclude legacy V0 pages Co-authored-by: openhands --- llms-full.txt | 33560 +++++++++++++++++++++++++++++++ llms.txt | 173 + scripts/generate-llms-files.py | 184 + 3 files changed, 33917 insertions(+) create mode 100644 llms-full.txt create mode 100644 llms.txt create mode 100755 scripts/generate-llms-files.py diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 00000000..27215470 --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,33560 @@ +# OpenHands Docs + +> Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded. + +# About OpenHands +Source: https://docs.openhands.dev/openhands/usage/about + +## Research Strategy + +Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves: + +- **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling. +- **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization. +- **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our agents. + +## Default Agent + +Our default Agent is currently the [CodeActAgent](./agents), which is capable of generating code and handling files. + +## Built With + +OpenHands is built using a combination of powerful frameworks and libraries, providing a robust foundation for its +development. 
Here are the key technologies used in the project: + +![FastAPI](https://img.shields.io/badge/FastAPI-black?style=for-the-badge) ![uvicorn](https://img.shields.io/badge/uvicorn-black?style=for-the-badge) ![LiteLLM](https://img.shields.io/badge/LiteLLM-black?style=for-the-badge) ![Docker](https://img.shields.io/badge/Docker-black?style=for-the-badge) ![Ruff](https://img.shields.io/badge/Ruff-black?style=for-the-badge) ![MyPy](https://img.shields.io/badge/MyPy-black?style=for-the-badge) ![LlamaIndex](https://img.shields.io/badge/LlamaIndex-black?style=for-the-badge) ![React](https://img.shields.io/badge/React-black?style=for-the-badge) + +Please note that the selection of these technologies is in progress, and additional technologies may be added or +existing ones may be removed as the project evolves. We strive to adopt the most suitable and efficient tools to +enhance the capabilities of OpenHands. + +## License + +Distributed under MIT [License](https://github.com/OpenHands/OpenHands/blob/main/LICENSE). + + +# Configuration Options +Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options + + + This page documents the current V1 configuration model. + + Legacy config.toml / “runtime” configuration docs have been moved + to the Legacy (V0) section of the Web tab. + + +## Where configuration lives in V1 + +Most user-facing configuration is done via the **Settings** UI in the Web app +(LLM provider/model, integrations, MCP, secrets, etc.). + +For self-hosted deployments and advanced workflows, OpenHands also supports +environment-variable configuration. + +## Common V1 environment variables + +These are some commonly used variables in V1 deployments: + +- **LLM credentials** + - LLM_API_KEY + - LLM_MODEL + +- **Persistence** + - OH_PERSISTENCE_DIR: where OpenHands stores local state (defaults to + ~/.openhands). 

- **Public URL (optional)**
  - OH_WEB_URL: the externally reachable URL of your OpenHands instance
    (used for callbacks in some deployments).

- **Sandbox workspace mounting**
  - SANDBOX_VOLUMES: mount host directories into the sandbox (see
    [Docker Sandbox](/openhands/usage/sandboxes/docker)).

- **Sandbox image selection**
  - AGENT_SERVER_IMAGE_REPOSITORY
  - AGENT_SERVER_IMAGE_TAG


## Sandbox provider selection

Some deployments still use the legacy RUNTIME environment variable to
choose which sandbox provider to use:

- RUNTIME=docker (default)
- RUNTIME=process (aka legacy RUNTIME=local)
- RUNTIME=remote

See [Sandboxes overview](/openhands/usage/sandboxes/overview) for details.

## Need legacy options?

If you are looking for the old config.toml reference or V0 “runtime”
providers, see:

- Web → Legacy (V0) → V0 Configuration Options
- Web → Legacy (V0) → V0 Runtime Configuration


# Custom Sandbox
Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide


  These settings are only available in [Local GUI](/openhands/usage/run-openhands/local-setup). OpenHands Cloud uses managed sandbox environments.


The sandbox is where the agent performs its tasks. Instead of running commands directly on your computer
(which could be risky), the agent runs them inside a Docker container.

The default OpenHands sandbox (`python-nodejs:python3.12-nodejs22`
from [nikolaik/python-nodejs](https://hub.docker.com/r/nikolaik/python-nodejs)) comes with common packages such as
Python and Node.js preinstalled, but other software you need may not be installed by default.

You have two options for customization:

- Use an existing image with the required software.
- Create your own custom Docker image.

If you choose the first option, you can skip the `Create Your Docker Image` section.

## Create Your Docker Image

Custom Docker images must be Debian-based.

For example, if you want OpenHands to have `ruby` installed, you could create a `Dockerfile` with the following content:

```dockerfile
FROM nikolaik/python-nodejs:python3.12-nodejs22

# Install required packages
RUN apt-get update && apt-get install -y ruby
```

Or you could use a Ruby-specific base image:

```dockerfile
FROM ruby:latest
```

Save this file in a folder. Then, build your Docker image (e.g., named custom-image) by navigating to the folder in
the terminal and running:
```bash
docker build -t custom-image .
```

This will produce a new image called `custom-image`, which will be available in Docker.

## Using the Docker Command

When running OpenHands using [the docker command](/openhands/usage/run-openhands/local-setup#start-the-app), replace
the `AGENT_SERVER_IMAGE_REPOSITORY` and `AGENT_SERVER_IMAGE_TAG` environment variables with `-e SANDBOX_BASE_CONTAINER_IMAGE=<custom-image-name>`:

```commandline
docker run -it --rm --pull=always \
    -e SANDBOX_BASE_CONTAINER_IMAGE=custom-image \
    ...
```

## Using the Development Workflow

### Setup

First, ensure you can run OpenHands by following the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md).

### Specify the Base Sandbox Image

In the `config.toml` file within the OpenHands directory, set the `base_container_image` to the image you want to use.
This can be an image you’ve already pulled or one you’ve built:

```toml
[core]
...
[sandbox]
base_container_image="custom-image"
```

### Additional Configuration Options

The `config.toml` file supports several other options for customizing your sandbox:

```toml
[core]
# Install additional dependencies when the runtime is built
# Can contain any valid shell commands
# If you need the path to the Python interpreter in any of these commands, you can use the $OH_INTERPRETER_PATH variable
runtime_extra_deps = """
pip install numpy pandas
apt-get update && apt-get install -y ffmpeg
"""

# Set environment variables for the runtime
# Useful for configuration that needs to be available at runtime
runtime_startup_env_vars = { DATABASE_URL = "postgresql://user:pass@localhost/db" }

# Specify platform for multi-architecture builds (e.g., "linux/amd64" or "linux/arm64")
platform = "linux/amd64"
```

### Run

Run OpenHands by running `make run` in the top-level directory.


# Search Engine Setup
Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup

## Setting Up Search Engine in OpenHands

OpenHands can be configured to use [Tavily](https://tavily.com/) as a search engine, which allows the agent to
search the web for information when needed. This capability enhances the agent's ability to provide up-to-date
information and solve problems that require external knowledge.


  Tavily is configured as a search engine by default in OpenHands Cloud!


### Getting a Tavily API Key

To use the search functionality in OpenHands, you'll need to obtain a Tavily API key:

1. Visit [Tavily's website](https://tavily.com/) and sign up for an account.
2. Navigate to the API section in your dashboard.
3. Generate a new API key.
4. Copy the API key (it should start with `tvly-`).

### Configuring Search in OpenHands

Once you have your Tavily API key, you can configure OpenHands to use it:

#### In the OpenHands UI

1. Open OpenHands and navigate to the `Settings > LLM` page.
2. 
Enter your Tavily API key (starting with `tvly-`) in the `Search API Key (Tavily)` field. +3. Click `Save` to apply the changes. + + + The search API key field is optional. If you don't provide a key, the search functionality will not be available to + the agent. + + +#### Using Configuration Files + +If you're running OpenHands in headless mode or via CLI, you can configure the search API key in your configuration file: + +```toml +# In your OpenHands config file +[core] +search_api_key = "tvly-your-api-key-here" +``` + +### How Search Works in OpenHands + +When the search engine is configured: + +- The agent can decide to search the web when it needs external information. +- Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which + includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map). +- Results are returned and incorporated into the agent's context. +- The agent can use this information to provide more accurate and up-to-date responses. + +### Limitations + +- Search results depend on Tavily's coverage and freshness. +- Usage may be subject to Tavily's rate limits and pricing tiers. +- The agent will only search when it determines that external information is needed. + +### Troubleshooting + +If you encounter issues with the search functionality: + +- Verify that your API key is correct and active. +- Check that your API key starts with `tvly-`. +- Ensure you have an active internet connection. +- Check Tavily's status page for any service disruptions. 
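If you want to sanity-check a key before saving it, the expected `tvly-` prefix noted above is easy to verify. This is a minimal illustration (the function name is ours, and a passing check only confirms the key's format, not that it is active):

```python
def looks_like_tavily_key(key: str) -> bool:
    # Tavily API keys are expected to start with "tvly-" (see above).
    return key.startswith("tvly-") and len(key) > len("tvly-")

print(looks_like_tavily_key("tvly-abc123"))  # True
print(looks_like_tavily_key("sk-abc123"))    # False
```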
+ + +# Main Agent and Capabilities +Source: https://docs.openhands.dev/openhands/usage/agents + +## CodeActAgent + +### Description + +This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a +unified **code** action space for both _simplicity_ and _performance_. + +The conceptual idea is illustrated below. At each turn, the agent can: + +1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc. +2. **CodeAct**: Choose to perform the task by executing code + +- Execute any valid Linux `bash` command +- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details. + +![image](https://github.com/OpenHands/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3) + +### Demo + +https://github.com/OpenHands/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac + +_Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_. + + +# REST API (V1) +Source: https://docs.openhands.dev/openhands/usage/api/v1 + + + OpenHands is in a transition period: legacy (V0) endpoints still exist alongside + the new /api/v1 endpoints. + + If you need the legacy OpenAPI reference, see the Legacy (V0) section in the Web tab. + + +## Overview + +OpenHands V1 REST endpoints are mounted under: + +- /api/v1 + +These endpoints back the current Web UI and are intended for newer integrations. + +## Key resources + +The V1 API is organized around a few core concepts: + +- **App conversations**: create/list conversations and access conversation metadata. + - POST /api/v1/app-conversations + - GET /api/v1/app-conversations + +- **Sandboxes**: list/start/pause/resume the execution environments that power conversations. 
+ - GET /api/v1/sandboxes/search + - POST /api/v1/sandboxes + - POST /api/v1/sandboxes/{id}/pause + - POST /api/v1/sandboxes/{id}/resume + +- **Sandbox specs**: list the available sandbox “templates” (e.g., Docker image presets). + - GET /api/v1/sandbox-specs/search + + +# Backend Architecture +Source: https://docs.openhands.dev/openhands/usage/architecture/backend + +This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents. + +# System overview + +```mermaid +flowchart LR + U["User"] --> FE["Frontend (SPA)"] + FE -- "HTTP/WS" --> BE["OpenHands Backend"] + BE --> ES["EventStream"] + BE --> ST["Storage"] + BE --> RT["Runtime Interface"] + BE --> LLM["LLM Providers"] + + subgraph Runtime + direction TB + RT --> DRT["Docker Runtime"] + RT --> LRT["Local Runtime"] + RT --> RRT["Remote Runtime"] + DRT --> AES["Action Execution Server"] + LRT --> AES + RRT --> AES + AES --> Bash["Bash Session"] + AES --> Jupyter["Jupyter Plugin"] + AES --> Browser["BrowserEnv"] + end +``` + +This Overview is simplified to show the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below. 

# Backend Architecture


```mermaid
classDiagram
    class Agent {
        <<interface>>
        +sandbox_plugins: list[PluginRequirement]
    }
    class CodeActAgent {
        +tools
    }
    Agent <|-- CodeActAgent

    class EventStream
    class Observation
    class Action
    Action --> Observation
    Agent --> EventStream

    class Runtime {
        +connect()
        +send_action_for_execution()
    }
    class ActionExecutionClient {
        +_send_action_server_request()
    }
    class DockerRuntime
    class LocalRuntime
    class RemoteRuntime
    Runtime <|-- ActionExecutionClient
    ActionExecutionClient <|-- DockerRuntime
    ActionExecutionClient <|-- LocalRuntime
    ActionExecutionClient <|-- RemoteRuntime

    class ActionExecutionServer {
        +/execute_action
        +/alive
    }
    class BashSession
    class JupyterPlugin
    class BrowserEnv
    ActionExecutionServer --> BashSession
    ActionExecutionServer --> JupyterPlugin
    ActionExecutionServer --> BrowserEnv

    Agent --> Runtime
    Runtime ..> ActionExecutionServer : REST
```

### Updating this Diagram
+ We maintain architecture diagrams inline with Mermaid in this MDX. + + Guidance: + - Edit the Mermaid blocks directly (flowchart/classDiagram). + - Quote labels and edge text for GitHub preview compatibility. + - Keep relationships concise and reflect stable abstractions (agents, runtime client/server, plugins). + - Verify accuracy against code: + - openhands/runtime/impl/action_execution/action_execution_client.py + - openhands/runtime/impl/docker/docker_runtime.py + - openhands/runtime/impl/local/local_runtime.py + - openhands/runtime/action_execution_server.py + - openhands/runtime/plugins/* + - Build docs locally or view on GitHub to confirm diagrams render. + +
+

# Runtime Architecture
Source: https://docs.openhands.dev/openhands/usage/architecture/runtime

The OpenHands Docker Runtime is the core component that enables secure and flexible execution of an AI agent's actions.
It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system.

## Why do we need a sandboxed runtime?

OpenHands needs to execute arbitrary code in a secure, isolated environment for several reasons:

1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources
2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues
3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system
4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system
5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable

## How does the Runtime work?

The OpenHands Runtime system uses a client-server architecture implemented with Docker containers. Here's an overview of how it works:

```mermaid
graph TD
    A[User-provided Custom Docker Image] --> B[OpenHands Backend]
    B -->|Builds| C[OH Runtime Image]
    C -->|Launches| D[Action Executor]
    D -->|Initializes| E[Browser]
    D -->|Initializes| F[Bash Shell]
    D -->|Initializes| G[Plugins]
    G -->|Initializes| L[Jupyter Server]

    B -->|Spawn| H[Agent]
    B -->|Spawn| I[EventStream]
    I <--->|Execute Action to
    Get Observation
    via REST API
    | D

    H -->|Generate Action| I
    I -->|Obtain Observation| H

    subgraph "Docker Container"
    D
    E
    F
    G
    L
    end
```

1. 
User Input: The user provides a custom base Docker image +2. Image Building: OpenHands builds a new Docker image (the "OH runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client" +3. Container Launch: When OpenHands starts, it launches a Docker container using the OH runtime image +4. Action Execution Server Initialization: The action execution server initializes an `ActionExecutor` inside the container, setting up necessary components like a bash shell and loading any specified plugins +5. Communication: The OpenHands backend (client: `openhands/runtime/impl/action_execution/action_execution_client.py`; runtimes: `openhands/runtime/impl/docker/docker_runtime.py`, `openhands/runtime/impl/local/local_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations +6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations +7. Observation Return: The action execution server sends execution results back to the OpenHands backend as observations + +The role of the client: + +- It acts as an intermediary between the OpenHands backend and the sandboxed environment +- It executes various types of actions (shell commands, file operations, Python code, etc.) safely within the container +- It manages the state of the sandboxed environment, including the current working directory and loaded plugins +- It formats and returns observations to the backend, ensuring a consistent interface for processing results + +## How OpenHands builds and maintains OH Runtime images + +OpenHands' approach to building and managing runtime images ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments. 

Check out the [relevant code](https://github.com/OpenHands/OpenHands/blob/main/openhands/runtime/utils/runtime_build.py) if you are interested in more details.

### Image Tagging System

OpenHands uses a three-tag system for its runtime images to balance reproducibility with flexibility.
The tags are:

- **Versioned Tag**: `oh_v{openhands_version}_{base_image}` (e.g.: `oh_v0.9.9_nikolaik_s_python-nodejs_t_python3.12-nodejs22`)
- **Lock Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}` (e.g.: `oh_v0.9.9_1234567890abcdef`)
- **Source Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}_{16_digit_source_hash}`
  (e.g.: `oh_v0.9.9_1234567890abcdef_1234567890abcdef`)

#### Source Tag - Most Specific

This is the first 16 digits of the MD5 of the directory hash for the source directory, giving a hash that depends
only on the OpenHands source code.

#### Lock Tag

This hash is built from the first 16 digits of the MD5 of:

- The name of the base image upon which the image was built (e.g.: `nikolaik/python-nodejs:python3.12-nodejs22`)
- The content of the `pyproject.toml` included in the image.
- The content of the `poetry.lock` included in the image.

This effectively gives a hash for the dependencies of OpenHands independent of the source code.

#### Versioned Tag - Most Generic

This tag is a concatenation of the OpenHands version and the base image name (transformed to fit tag naming standards).

#### Build Process

When generating an image...

- **No re-build**: OpenHands first checks whether an image with the same **most specific source tag** exists. If there is such an image,
  no build is performed - the existing image is used.
- **Fastest re-build**: OpenHands next checks whether an image with the **generic lock tag** exists. If there is such an image,
  OpenHands builds a new image based upon it, bypassing all installation steps (like `poetry install` and
  `apt-get`) except a final operation to copy the current source code. 
The new image is tagged with a **source** tag only.
- **Ok-ish re-build**: If neither a **source** nor **lock** tag exists, an image will be built based upon the **versioned** tag image.
  In the versioned-tag image, most dependencies should already be installed, saving time.
- **Slowest re-build**: If none of the three tags exists, a brand-new image is built based upon the base
  image (the slowest operation). This new image is tagged with all the **source**, **lock**, and **versioned** tags.

This tagging approach allows OpenHands to efficiently manage both development and production environments:

1. Identical source code and Dockerfile always produce the same image (via hash-based tags)
2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images)
3. The **lock** tag (e.g., `runtime:oh_v0.9.3_1234567890abcdef`) always points to the latest build for a particular base image, dependency, and OpenHands version combination

## Volume mounts: named volumes and overlay

OpenHands supports both bind mounts and Docker named volumes in SandboxConfig.volumes:

- Bind mount: "/abs/host/path:/container/path[:mode]"
- Named volume: "volume:`<name>`:/container/path[:mode]"; any non-absolute host spec is treated as a named volume

Overlay mode (copy-on-write layer) is supported for bind mounts by appending ":overlay" to the mode (e.g., ":ro,overlay").
To enable overlay COW, set SANDBOX_VOLUME_OVERLAYS to a writable host directory; per-container upper/work dirs are created under it. If SANDBOX_VOLUME_OVERLAYS is unset, overlay mounts are skipped.
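The mount-spec rules above can be sketched as a small parser. This is an illustrative helper only (the function name and return shape are ours, not the actual OpenHands implementation):

```python
def classify_volume_spec(spec: str):
    """Classify a volumes entry per the rules above (illustrative sketch only)."""
    parts = spec.split(":")
    # An explicit "volume:" prefix marks a named volume.
    if parts[0] == "volume":
        parts = parts[1:]
    source, container_path = parts[0], parts[1]
    mode = parts[2] if len(parts) > 2 else "rw"
    # Absolute host paths are bind mounts; anything else is a named volume.
    kind = "bind" if source.startswith("/") else "named"
    # An ":overlay" suffix on the mode requests a copy-on-write layer.
    overlay = "overlay" in mode.split(",")
    return kind, source, container_path, mode, overlay

print(classify_volume_spec("/home/me/project:/workspace:ro,overlay"))
# ('bind', '/home/me/project', '/workspace', 'ro,overlay', True)
print(classify_volume_spec("volume:cache:/workspace"))
# ('named', 'cache', '/workspace', 'rw', False)
```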

Implementation references:
- openhands/runtime/impl/docker/docker_runtime.py (named volumes in _build_docker_run_args; overlay mounts in _process_overlay_mounts)
- openhands/core/config/sandbox_config.py (volumes field)


## Ports and URLs

- Host port allocation uses file-locked ranges for stability and concurrency:
  - Main runtime port: find_available_port_with_lock on the configured range
  - VSCode port: SandboxConfig.sandbox.vscode_port if provided, else find_available_port_with_lock in VSCODE_PORT_RANGE
  - App ports: two additional ranges for plugin/web apps
- DOCKER_HOST_ADDR (if set) adjusts how URLs are formed for LocalRuntime/Docker environments.
- The VSCode URL is exposed with a connection token from the action execution server endpoint /vscode/connection_token and rendered as:
  - Docker/Local: `http://localhost:{port}/?tkn={token}&folder={workspace_mount_path_in_sandbox}`
  - RemoteRuntime: `scheme://vscode-{host}/?tkn={token}&folder={workspace_mount_path_in_sandbox}`

References:
- openhands/runtime/impl/docker/docker_runtime.py (port ranges, locking, DOCKER_HOST_ADDR, vscode_url)
- openhands/runtime/impl/local/local_runtime.py (vscode_url factory)
- openhands/runtime/impl/remote/remote_runtime.py (vscode_url mapping)
- openhands/runtime/action_execution_server.py (/vscode/connection_token)


## Runtime Plugin System

The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the action execution server starts up inside the runtime.

Examples:
- Jupyter: `openhands/runtime/plugins/jupyter/__init__.py` (JupyterPlugin, Kernel Gateway)
- VS Code: openhands/runtime/plugins/vscode/* (VSCodePlugin, exposes tokenized URL)
- Agent Skills: openhands/runtime/plugins/agent_skills/*

Key aspects of the plugin system:

1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class
2. 
Plugin Registration: Available plugins are registered in `openhands/runtime/plugins/__init__.py` via `ALL_PLUGINS` +3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime +4. Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions +5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping + + +# OpenHands Cloud +Source: https://docs.openhands.dev/openhands/usage/cli/cloud + +## Overview + +The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: + +- Authenticate with your OpenHands Cloud account +- Create new cloud conversations +- Use cloud resources without the web interface + +## Authentication + +### Login + +Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: + +```bash +openhands login +``` + +This opens a browser window for authentication. After successful login, your credentials are stored locally. 
+ +#### Custom Server URL + +For self-hosted or enterprise deployments: + +```bash +openhands login --server-url https://your-openhands-server.com +``` + +You can also set the server URL via environment variable: + +```bash +export OPENHANDS_CLOUD_URL=https://your-openhands-server.com +openhands login +``` + +### Logout + +Log out from OpenHands Cloud: + +```bash +# Log out from all servers +openhands logout + +# Log out from a specific server +openhands logout --server-url https://app.all-hands.dev +``` + +## Creating Cloud Conversations + +Create a new conversation in OpenHands Cloud: + +```bash +# With a task +openhands cloud -t "Review the codebase and suggest improvements" + +# From a file +openhands cloud -f task.txt +``` + +### Options + +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | + +### Examples + +```bash +# Create a cloud conversation with a task +openhands cloud -t "Fix the authentication bug in login.py" + +# Create from a task file +openhands cloud -f requirements.txt + +# Use a custom server +openhands cloud --server-url https://custom.server.com -t "Add unit tests" + +# Combine with environment variable +export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev +openhands cloud -t "Refactor the database module" +``` + +## Workflow + +A typical workflow with OpenHands Cloud: + +1. **Login once**: + ```bash + openhands login + ``` + +2. **Create conversations as needed**: + ```bash + openhands cloud -t "Your task here" + ``` + +3. 
**Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `OPENHANDS_CLOUD_URL` | Default server URL for cloud operations | + +## Cloud vs Local + +| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | +|---------|---------------------------|---------------------| +| Compute | Cloud-hosted | Your machine | +| Persistence | Cloud storage | Local files | +| Collaboration | Share via link | Local only | +| Setup | Just login | Configure LLM & runtime | +| Cost | Subscription/usage-based | Your LLM API costs | + + +Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. Use the local CLI for privacy, offline work, or custom configurations. + + +## See Also + +- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation +- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide +- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access + + +# Command Reference +Source: https://docs.openhands.dev/openhands/usage/cli/command-reference + +## Basic Usage + +```bash +openhands [OPTIONS] [COMMAND] +``` + +## Global Options + +| Option | Description | +|--------|-------------| +| `-v, --version` | Show version number and exit | +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--resume [ID]` | Resume a conversation. 
If no ID provided, lists recent conversations | +| `--last` | Resume the most recent conversation (use with `--resume`) | +| `--exp` | Use textual-based UI (now default, kept for compatibility) | +| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | +| `--json` | Enable JSONL output (requires `--headless`) | +| `--always-approve` | Auto-approve all actions without confirmation | +| `--llm-approve` | Use LLM-based security analyzer for action approval | +| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | +| `--exit-without-confirmation` | Exit without showing confirmation dialog | + +## Subcommands + +### serve + +Launch the OpenHands GUI server using Docker. + +```bash +openhands serve [OPTIONS] +``` + +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | + +**Examples:** +```bash +openhands serve +openhands serve --mount-cwd +openhands serve --gpu +openhands serve --mount-cwd --gpu +``` + +### web + +Launch the CLI as a web application accessible via browser. + +```bash +openhands web [OPTIONS] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host to bind the web server to | +| `--port` | `12000` | Port to bind the web server to | +| `--debug` | `false` | Enable debug mode | + +**Examples:** +```bash +openhands web +openhands web --port 8080 +openhands web --host 127.0.0.1 --port 3000 +openhands web --debug +``` + +### cloud + +Create a new conversation in OpenHands Cloud. 

```bash
openhands cloud [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `-t, --task TEXT` | Initial task to seed the conversation |
| `-f, --file PATH` | Path to a file whose contents seed the conversation |
| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) |

**Examples:**
```bash
openhands cloud -t "Fix the bug"
openhands cloud -f task.txt
openhands cloud --server-url https://custom.server.com -t "Task"
```

### acp

Start the Agent Client Protocol server for IDE integrations.

```bash
openhands acp [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--resume [ID]` | Resume a conversation by ID |
| `--last` | Resume the most recent conversation |
| `--always-approve` | Auto-approve all actions |
| `--llm-approve` | Use LLM-based security analyzer |
| `--streaming` | Enable token-by-token streaming |

**Examples:**
```bash
openhands acp
openhands acp --llm-approve
openhands acp --resume abc123def456
openhands acp --resume --last
```

### mcp

Manage Model Context Protocol server configurations.

```bash
openhands mcp [OPTIONS]
```

#### mcp add

Add a new MCP server.

```bash
openhands mcp add <name> --transport <transport> [OPTIONS] <target> [-- args...]
+``` + +| Option | Description | +|--------|-------------| +| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | +| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | +| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | +| `--auth` | Authentication method (e.g., `oauth`) | +| `--enabled` | Enable immediately (default) | +| `--disabled` | Add in disabled state | + +**Examples:** +```bash +openhands mcp add my-api --transport http https://api.example.com/mcp +openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com +openhands mcp add local --transport stdio python -- -m my_server +openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server +``` + +#### mcp list + +List all configured MCP servers. + +```bash +openhands mcp list +``` + +#### mcp get + +Get details for a specific MCP server. + +```bash +openhands mcp get +``` + +#### mcp remove + +Remove an MCP server configuration. + +```bash +openhands mcp remove +``` + +#### mcp enable + +Enable an MCP server. + +```bash +openhands mcp enable +``` + +#### mcp disable + +Disable an MCP server. + +```bash +openhands mcp disable +``` + +### login + +Authenticate with OpenHands Cloud. + +```bash +openhands login [OPTIONS] +``` + +| Option | Description | +|--------|-------------| +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | + +**Examples:** +```bash +openhands login +openhands login --server-url https://enterprise.openhands.dev +``` + +### logout + +Log out from OpenHands Cloud. 
+ +```bash +openhands logout [OPTIONS] +``` + +| Option | Description | +|--------|-------------| +| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | + +**Examples:** +```bash +openhands logout +openhands logout --server-url https://app.all-hands.dev +``` + +## Interactive Commands + +Commands available inside the CLI (prefix with `/`): + +| Command | Description | +|---------|-------------| +| `/help` | Display available commands | +| `/new` | Start a new conversation | +| `/history` | Toggle conversation history | +| `/confirm` | Configure confirmation settings | +| `/condense` | Condense conversation history | +| `/skills` | View loaded skills, hooks, and MCPs | +| `/feedback` | Send anonymous feedback about CLI | +| `/exit` | Exit the application | + +## Command Palette + +Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: + +| Option | Description | +|--------|-------------| +| **History** | Toggle conversation history panel | +| **Keys** | Show keyboard shortcuts | +| **MCP** | View MCP server configurations | +| **Maximize** | Maximize/restore window | +| **Plan** | View agent plan | +| **Quit** | Quit the application | +| **Screenshot** | Take a screenshot | +| **Settings** | Configure LLM model, API keys, and other settings | +| **Theme** | Toggle color theme | + +## Changing Your Model + +### Via Settings UI + +1. Press `Ctrl+P` to open the command palette +2. Select **Settings** +3. Choose your LLM provider and model +4. Save changes (no restart required) + +### Via Configuration File + +Edit `~/.openhands/agent_settings.json` and change the `model` field: + +```json +{ + "llm": { + "model": "claude-sonnet-4-5-20250929", + "api_key": "...", + "base_url": "..." 
+ } +} +``` + +### Via Environment Variables + +Temporarily override your model without changing saved configuration: + +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-api-key" +openhands --override-with-envs +``` + +Changes made with `--override-with-envs` are not persisted. + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `LLM_API_KEY` | API key for your LLM provider | +| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | +| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | +| `OPENHANDS_CLOUD_URL` | Default cloud server URL | +| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | + +## Exit Codes + +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | Error or task failed | +| `2` | Invalid arguments | + +## Configuration Files + +| File | Purpose | +|------|---------| +| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | +| `~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | +| `~/.openhands/mcp.json` | MCP server configurations | +| `~/.openhands/conversations/` | Conversation history | + +## See Also + +- [Installation](/openhands/usage/cli/installation) - Install the CLI +- [Quick Start](/openhands/usage/cli/quick-start) - Get started +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers + + +# Critic (Experimental) +Source: https://docs.openhands.dev/openhands/usage/cli/critic + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + + +## Overview + +If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. 
+ +For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). + + +## What is the Critic? + +The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides: + +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion + + + +![Critic output in CLI](./screenshots/critic-cli-output.png) + +## Pricing + +The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. + +## Disabling the Critic + +If you prefer not to use the critic feature, you can disable it in your settings: + +1. Open the command palette with `Ctrl+P` +2. Select **Settings** +3. Navigate to the **CLI Settings** tab +4. Toggle off **Enable Critic (Experimental)** + +![Critic settings in CLI](./screenshots/critic-cli-settings.png) + + +# GUI Server +Source: https://docs.openhands.dev/openhands/usage/cli/gui-server + +## Overview + +The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. + +```bash +openhands serve +``` + + +This requires Docker to be installed and running on your system. + + +## Prerequisites + +- [Docker](https://docs.docker.com/get-docker/) installed and running +- Sufficient disk space for Docker images (~2GB) + +## Basic Usage + +```bash +# Launch the GUI server +openhands serve + +# The server will be available at http://localhost:3000 +``` + +The command will: +1. Check Docker requirements +2. Pull the required Docker images +3. Start the OpenHands GUI server +4. 
Display the URL to access the interface + +## Options + +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | + +## Mounting Your Workspace + +To give OpenHands access to your local files: + +```bash +# Mount current directory +openhands serve --mount-cwd +``` + +This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. + + +Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. + + +## GPU Support + +For tasks that benefit from GPU acceleration: + +```bash +openhands serve --gpu +``` + +This requires: +- NVIDIA GPU +- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed +- Docker configured for GPU support + +## Examples + +```bash +# Basic GUI server +openhands serve + +# Mount current project and enable GPU +cd /path/to/your/project +openhands serve --mount-cwd --gpu +``` + +## How It Works + +The `openhands serve` command: + +1. **Pulls Docker images**: Downloads the OpenHands runtime and application images +2. **Starts containers**: Runs the OpenHands server in a Docker container +3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` +4. **Shares settings**: Uses your `~/.openhands` directory for configuration + +## Stopping the Server + +Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. 
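While the server is up you can confirm from a second terminal that the interface is actually reachable. This is a plain HTTP probe rather than an OpenHands command; it assumes the default port 3000 noted above and that `curl` is installed:

```bash
# Prints an HTTP status code when something is listening on port 3000,
# or a fallback message when nothing is reachable there.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000 \
  || echo "server not reachable on port 3000"
```

The same probe works in startup scripts that need to wait for the GUI before opening a browser.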
+ +## Comparison: GUI Server vs Web Interface + +| Feature | `openhands serve` | `openhands web` | +|---------|-------------------|-----------------| +| Interface | Full web GUI | Terminal UI in browser | +| Dependencies | Docker required | None | +| Resources | Full container (~2GB) | Lightweight | +| Features | All GUI features | CLI features only | +| Best for | Rich GUI experience | Quick terminal access | + +## Troubleshooting + +### Docker Not Running + +``` +❌ Docker daemon is not running. +Please start Docker and try again. +``` + +**Solution**: Start Docker Desktop or the Docker daemon. + +### Permission Denied + +``` +Got permission denied while trying to connect to the Docker daemon socket +``` + +**Solution**: Add your user to the docker group: +```bash +sudo usermod -aG docker $USER +# Then log out and back in +``` + +### Port Already in Use + +If port 3000 is already in use, stop the conflicting service or use a different setup. Currently, the port is not configurable via CLI. + +## See Also + +- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide +- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access +- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details + + +# Headless Mode +Source: https://docs.openhands.dev/openhands/usage/cli/headless + +## Overview + +Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: +- CI/CD pipelines +- Automated scripting +- Integration with other tools +- Batch processing + +```bash +openhands --headless -t "Your task here" +``` + +## Requirements + +- Must specify a task with `--task` or `--file` + + +**Headless mode always runs in `always-approve` mode.** The agent will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. 
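Because there is no UI, the process exit code (`0` success, `1` error or task failed, per the Command Reference) is the primary signal for automation, so pipelines usually wrap the invocation rather than call it bare. A minimal sketch; the wrapper name is illustrative and assumes `openhands` is on your PATH:

```bash
# Minimal CI-style wrapper: run a task headlessly and
# propagate the agent's exit code to the pipeline.
run_agent_task() {
  openhands --headless -t "$1"
}

# In a real pipeline you would call, for example:
# run_agent_task "Run the test suite and fix any failures" || exit 1
```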
+ + +## Basic Usage + +```bash +# Run a task in headless mode +openhands --headless -t "Write a Python script that prints hello world" + +# Load task from a file +openhands --headless -f task.txt +``` + +## JSON Output Mode + +The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur: + +```bash +openhands --headless --json -t "Create a simple Flask app" +``` + +Each line is a JSON object representing an agent event: + +```json +{"type": "action", "action": "write", "path": "app.py", ...} +{"type": "observation", "content": "File created successfully", ...} +{"type": "action", "action": "run", "command": "python app.py", ...} +``` + +### Use Cases for JSON Output + +- **CI/CD pipelines**: Parse events to determine success/failure +- **Automated processing**: Feed output to other tools +- **Logging**: Capture structured logs for analysis +- **Integration**: Connect OpenHands with other systems + +### Example: Capture Output to File + +```bash +openhands --headless --json -t "Add unit tests" > output.jsonl +``` + +## See Also + +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options + + +# JetBrains IDEs +Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains + +[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. + +## Supported IDEs + +This guide applies to all JetBrains IDEs: + +- IntelliJ IDEA +- PyCharm +- WebStorm +- GoLand +- Rider +- CLion +- PhpStorm +- RubyMine +- DataGrip +- And other JetBrains IDEs + +## Prerequisites + +Before configuring JetBrains IDEs: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **JetBrains IDE version 25.3 or later** +4. 
**JetBrains AI Assistant enabled** in your IDE + + +JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE. + + +## Configuration + +### Step 1: Create the ACP Configuration File + +Create or edit the file `$HOME/.jetbrains/acp.json`: + +**macOS/Linux:** + +```bash +mkdir -p ~/.jetbrains +nano ~/.jetbrains/acp.json +``` + +**Windows:** create the file at `C:\Users\<username>\.jetbrains\acp.json` + +### Step 2: Add the Configuration + +Add the following JSON: + +```json +{ +  "agent_servers": { +    "OpenHands": { +      "command": "openhands", +      "args": ["acp"], +      "env": {} +    } +  } +} +``` + +### Step 3: Use OpenHands in Your IDE + +Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE. + +## Advanced Configuration + +### LLM-Approve Mode + +For automatic LLM-based approval: + +```json +{ +  "agent_servers": { +    "OpenHands": { +      "command": "openhands", +      "args": ["acp", "--llm-approve"], +      "env": {} +    } +  } +} +``` + +### Auto-Approve Mode + +For automatic approval of all actions (use with caution): + +```json +{ +  "agent_servers": { +    "OpenHands": { +      "command": "openhands", +      "args": ["acp", "--always-approve"], +      "env": {} +    } +  } +} +``` + +### Resume a Conversation + +Resume a specific conversation: + +```json +{ +  "agent_servers": { +    "OpenHands (Resume)": { +      "command": "openhands", +      "args": ["acp", "--resume", "abc123def456"], +      "env": {} +    } +  } +} +``` + +Resume the latest conversation: + +```json +{ +  "agent_servers": { +    "OpenHands (Latest)": { +      "command": "openhands", +      "args": ["acp", "--resume", "--last"], +      "env": {} +    } +  } +} +``` + +### Multiple Configurations + +Add multiple configurations for different use cases: + +```json +{ +  "agent_servers": { +    "OpenHands": { +      "command": "openhands", +      "args": ["acp"], +      "env": {} +    }, +    "OpenHands (Auto-Approve)": { +      "command": "openhands", +      "args": ["acp", "--always-approve"], +      "env": {} +    }, +    "OpenHands (Resume 
Latest)": { + "command": "openhands", + "args": ["acp", "--resume", "--last"], + "env": {} + } + } +} +``` + +### Environment Variables + +Pass environment variables to the agent: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": { + "LLM_API_KEY": "your-api-key" + } + } + } +} +``` + +## Troubleshooting + +### "Agent not found" or "Command failed" + +1. Verify OpenHands CLI is installed: + ```bash + openhands --version + ``` + +2. If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) + +### "AI Assistant not available" + +1. Ensure you have JetBrains IDE version 25.3 or later +2. Enable AI Assistant: `Settings > Plugins > AI Assistant` +3. Restart the IDE after enabling + +### Agent doesn't respond + +1. Check your LLM settings: + ```bash + openhands + # Use /settings to configure + ``` + +2. Test ACP mode in terminal: + ```bash + openhands acp + # Should start without errors + ``` + +### Configuration not applied + +1. Verify the config file location: `~/.jetbrains/acp.json` +2. Validate JSON syntax (no trailing commas, proper quotes) +3. Restart your JetBrains IDE + +### Finding Your Conversation ID + +To resume conversations, first find the ID: + +```bash +openhands --resume +``` + +This displays recent conversations with their IDs: + +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. 
abc123def456 (2h ago) + Fix the login bug in auth.py +-------------------------------------------------------------------------------- +``` + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + + +# IDE Integration Overview +Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview + + +IDE integration via ACP is experimental and may have limitations. Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). + + + +**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. + + +## What is the Agent Client Protocol (ACP)? + +The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. + +## Supported IDEs + +| IDE | Support Level | Setup Guide | +|-----|---------------|-------------| +| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | +| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | +| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | +| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. | + +## Prerequisites + +Before using OpenHands with any IDE, you must: + +1. **Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) + +2. 
**Configure your LLM settings** using the `/settings` command: + ```bash + openhands + # Then use /settings to configure + ``` + +The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. + +## How It Works + +```mermaid +graph LR + IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] + CLI -->|API Calls| LLM[LLM Provider] + CLI -->|Commands| Runtime[Sandbox Runtime] +``` + +1. Your IDE launches `openhands acp` as a subprocess +2. Communication happens via JSON-RPC 2.0 over stdio +3. OpenHands uses your configured LLM and runtime settings +4. Results are displayed in your IDE's interface + +## The ACP Command + +The `openhands acp` command starts OpenHands as an ACP server: + +```bash +# Basic ACP server +openhands acp + +# With LLM-based approval +openhands acp --llm-approve + +# Resume a conversation +openhands acp --resume + +# Resume the latest conversation +openhands acp --resume --last +``` + +### ACP Options + +| Option | Description | +|--------|-------------| +| `--resume [ID]` | Resume a conversation by ID | +| `--last` | Resume the most recent conversation | +| `--always-approve` | Auto-approve all actions | +| `--llm-approve` | Use LLM-based security analyzer | +| `--streaming` | Enable token-by-token streaming | + +## Confirmation Modes + +OpenHands ACP supports three confirmation modes to control how agent actions are approved: + +### Always Ask (Default) + +The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. + +```bash +openhands acp # defaults to always-ask mode +``` + +### Always Approve + +The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. + +```bash +openhands acp --always-approve +``` + +### LLM-Based Approval + +The agent uses an LLM-based security analyzer to evaluate each action. 
Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved. + +```bash +openhands acp --llm-approve +``` + +### Changing Modes During a Session + +You can change the confirmation mode during an active session using slash commands: + +| Command | Description | +|---------|-------------| +| `/confirm always-ask` | Switch to always-ask mode | +| `/confirm always-approve` | Switch to always-approve mode | +| `/confirm llm-approve` | Switch to LLM-based approval mode | +| `/help` | Show all available slash commands | + + +The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session. + + +## Choosing an IDE + +- **[Zed](/openhands/usage/cli/ide/zed)**: High-performance editor with native ACP support. Best for speed and simplicity. +- **[Toad](/openhands/usage/cli/ide/toad)**: Universal terminal interface. Works with any terminal, consistent experience. +- **[VS Code](/openhands/usage/cli/ide/vscode)**: Popular editor with community extension. Great for VS Code users. +- **[JetBrains](/openhands/usage/cli/ide/jetbrains)**: IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users. + +## Resuming Conversations in IDEs + +You can resume previous conversations in ACP mode. Since ACP mode doesn't display an interactive list, first find your conversation ID: + +```bash +openhands --resume +``` + +This shows your recent conversations: + +``` +Recent Conversations: +-------------------------------------------------------------------------------- +  1. abc123def456 (2h ago) +     Fix the login bug in auth.py + +  2. xyz789ghi012 (yesterday) +     Add unit tests for the user service +-------------------------------------------------------------------------------- +``` + +Then configure your IDE to use `--resume <conversation-id>` or `--resume --last`. See each IDE's documentation for specific configuration. 
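If you script your IDE setup, you can also peek at the on-disk conversation store directly. This assumes the `~/.openhands/conversations/` location noted in the CLI configuration-files table, with one entry per conversation; the entry naming is an implementation detail, so prefer `openhands --resume` as the supported way to list IDs:

```bash
# Most recently used conversations first; prints nothing if the
# store does not exist yet.
ls -t ~/.openhands/conversations 2>/dev/null | head -5
```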
+ +## See Also + +- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide + + +# Toad Terminal +Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad + +[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). + +The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. + +![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) + +## Why Toad? + +Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: + +- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything +- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs +- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP + +OpenHands is included as a recommended agent in Toad's agent store. + +## Prerequisites + +Before using Toad with OpenHands: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` + +## Installation + +Install Toad using [uv](https://docs.astral.sh/uv/): + +```bash +uvx batrachian-toad +``` + +For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). 
+ +## Setup + +### Using the Agent Store + +The easiest way to set up OpenHands with Toad: + +1. Launch Toad: `uvx batrachian-toad` +2. Open Toad's agent store +3. Find **OpenHands** in the list of recommended agents +4. Click **Install** to set up OpenHands +5. Select OpenHands and start a conversation + +The install process runs: +```bash +uv tool install openhands --python 3.12 && openhands login +``` + +### Manual Configuration + +You can also launch Toad directly with OpenHands: + +```bash +toad acp "openhands acp" +``` + +## Usage + +### Basic Usage + +```bash +# Launch Toad with OpenHands +toad acp "openhands acp" +``` + +### With Command Line Arguments + +Pass OpenHands CLI flags through Toad: + +```bash +# Use LLM-based approval mode +toad acp "openhands acp --llm-approve" + +# Auto-approve all actions +toad acp "openhands acp --always-approve" +``` + +### Resume a Conversation + +Resume a specific conversation by ID: + +```bash +toad acp "openhands acp --resume abc123def456" +``` + +Resume the most recent conversation: + +```bash +toad acp "openhands acp --resume --last" +``` + + +Find your conversation IDs by running `openhands --resume` in a regular terminal. + + +## Advanced Configuration + +### Combined Options + +```bash +# Resume with LLM approval +toad acp "openhands acp --resume --last --llm-approve" +``` + +### Environment Variables + +Pass environment variables to OpenHands: + +```bash +LLM_API_KEY=your-key toad acp "openhands acp" +``` + +## Troubleshooting + +### "openhands" command not found + +Ensure OpenHands is installed: +```bash +uv tool install openhands --python 3.12 +``` + +Verify it's in your PATH: +```bash +which openhands +``` + +### Agent doesn't respond + +1. Check your LLM settings: `openhands` then `/settings` +2. Verify your API key is valid +3. Check network connectivity to your LLM provider + +### Conversation not persisting + +Conversations are stored in `~/.openhands/conversations`. 
Ensure this directory exists and is writable. + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + + +# VS Code +Source: https://docs.openhands.dev/openhands/usage/cli/ide/vscode + +[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. + + +VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. + + +## Prerequisites + +Before configuring VS Code: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) + +## Installation + +### Step 1: Install the Extension + +1. Open VS Code +2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) +3. Search for **"VSCode ACP"** +4. Click **Install** + +Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). + +### Step 2: Connect to OpenHands + +1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) +2. Click **Connect** to start a session +3. Select **OpenHands** from the agent dropdown +4. Start chatting with OpenHands! + +## How It Works + +The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. 
The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol. + +## Verification + +Ensure OpenHands is discoverable: + +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands +``` + +If the command is not found, install OpenHands CLI: +```bash +uv tool install openhands --python 3.12 +``` + +## Advanced Usage + +### Custom Arguments + +The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`. + +### Resume Conversations + +To resume a conversation, you may need to: + +1. Find your conversation ID: `openhands --resume` +2. Configure the extension to use custom arguments (if supported) +3. Or use the terminal directly: `openhands acp --resume <conversation-id>` + + +The VSCode ACP extension's feature set depends on the extension maintainer. Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities. + + +## Troubleshooting + +### OpenHands Not Appearing in Dropdown + +1. Verify OpenHands is installed and in PATH: +   ```bash +   which openhands +   openhands --version +   ``` + +2. Restart VS Code after installing OpenHands + +3. Check if the extension recognizes agents: +   - Look for any error messages in the extension panel +   - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`) + +### Connection Failed + +1. Ensure your LLM settings are configured: +   ```bash +   openhands +   # Use /settings to configure +   ``` + +2. Check that `openhands acp` works in terminal: +   ```bash +   openhands acp +   # Should start without errors (Ctrl+C to exit) +   ``` + +### Extension Not Working + +1. Update to the latest version of the extension +2. Check for VS Code updates +3. 
Report issues on the [extension's GitHub](https://github.com/omercnet) + +## Limitations + +Since this is a community extension: + +- Feature availability may vary +- Support depends on the extension maintainer +- Not all OpenHands CLI flags may be accessible through the UI + +For the most control over OpenHands, consider using: +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage +- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [VSCode ACP Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal + + +# Zed IDE +Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed + +[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. + + + +## Prerequisites + +Before configuring Zed, ensure you have: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **Zed editor** - Download from [zed.dev](https://zed.dev/) + +## Configuration + +### Step 1: Open Agent Settings + +1. Open Zed +2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette +3. Search for `agent: open settings` + +![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) + +### Step 2: Add OpenHands as an Agent + +1. On the right side, click `+ Add Agent` +2. Select `Add Custom Agent` + +![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) + +### Step 3: Configure the Agent + +Add the following configuration to the `agent_servers` field: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": [ + "openhands", + "acp" + ], + "env": {} + } + } +} +``` + +### Step 4: Save and Use + +1. 
Save the settings file +2. You can now use OpenHands within Zed! + +![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) + +## Advanced Configuration + +### LLM-Approve Mode + +For automatic LLM-based approval of actions: + +```json +{ + "agent_servers": { + "OpenHands (LLM Approve)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--llm-approve" + ], + "env": {} + } + } +} +``` + +### Resume a Specific Conversation + +To resume a previous conversation: + +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "abc123def456" + ], + "env": {} + } + } +} +``` + +Replace `abc123def456` with your actual conversation ID. Find conversation IDs by running `openhands --resume` in your terminal. + +### Resume Latest Conversation + +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "--last" + ], + "env": {} + } + } +} +``` + +### Multiple Configurations + +You can add multiple OpenHands configurations for different use cases: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": ["openhands", "acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "uvx", + "args": ["openhands", "acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume Latest)": { + "command": "uvx", + "args": ["openhands", "acp", "--resume", "--last"], + "env": {} + } + } +} +``` + +## Troubleshooting + +### Accessing Debug Logs + +If you encounter issues: + +1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) +2. Type and select `acp debug log` +3. Review the logs for errors or warnings +4. 
Restart the conversation to reload connections after configuration changes + +### Common Issues + +**"openhands" command not found** + +Ensure OpenHands is installed and in your PATH: +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands +``` + +If using `uvx`, ensure uv is installed: +```bash +uv --version +``` + +**Agent doesn't start** + +1. Check that your LLM settings are configured: run `openhands` and verify `/settings` +2. Verify the configuration JSON syntax is valid +3. Check the ACP debug logs for detailed errors + +**Conversation doesn't persist** + +Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. + + +After making configuration changes, restart the conversation in Zed to apply them. + + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + + +# Installation +Source: https://docs.openhands.dev/openhands/usage/cli/installation + + +**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. + + +## Installation Methods + + + + Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. 
+ + **Install OpenHands:** + ```bash + uv tool install openhands --python 3.12 + ``` + + **Run OpenHands:** + ```bash + openhands + ``` + + **Upgrade OpenHands:** + ```bash + uv tool upgrade openhands --python 3.12 + ``` + + + Install the OpenHands CLI binary with the install script: + + ```bash + curl -fsSL https://install.openhands.dev/install.sh | sh + ``` + + Then run: + ```bash + openhands + ``` + + + Your system may require you to allow permissions to run the executable. + + + When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple + cannot check it for malicious software." + + 1. Open `System Settings`. + 2. Go to `Privacy & Security`. + 3. Scroll down to `Security` and click `Allow Anyway`. + 4. Rerun the OpenHands CLI. + + ![mac-security](/openhands/static/img/cli-security-mac.png) + + + + + + 1. Set the following environment variable in your terminal: + - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) + + 2. Ensure you have configured your settings before starting: + - Set up `~/.openhands/settings.json` with your LLM configuration + + 3. Run the following command: + + ```bash + docker run -it \ + --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e SANDBOX_USER_ID=$(id -u) \ + -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/root/.openhands \ + --add-host host.docker.internal:host-gateway \ + --name openhands-cli-$(date +%Y%m%d%H%M%S) \ + python:3.12-slim \ + bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" + ``` + + The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's + permissions. 
This prevents the agent from creating root-owned files in the mounted workspace. + + + +## First Run + +The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved +for future sessions in `~/.openhands/settings.json`. + +The conversation history will be saved in `~/.openhands/conversations`. + + +If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the +configuration format has changed. + + +## Next Steps + +- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers + + +# MCP Servers +Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers + +## Overview + +[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. + +The CLI provides two ways to manage MCP servers: +1. **CLI commands** (`openhands mcp`) - Manage servers from the command line +2. **Interactive command** (`/mcp`) - View server status within a conversation + + +If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON. 
+
+## MCP Commands
+
+### List Servers
+
+View all configured MCP servers:
+
+```bash
+openhands mcp list
+```
+
+### Get Server Details
+
+View details for a specific server:
+
+```bash
+openhands mcp get <server-name>
+```
+
+### Remove a Server
+
+Remove a server configuration:
+
+```bash
+openhands mcp remove <server-name>
+```
+
+### Enable/Disable Servers
+
+Control which servers are active:
+
+```bash
+# Enable a server
+openhands mcp enable <server-name>
+
+# Disable a server
+openhands mcp disable <server-name>
+```
+
+## Adding Servers
+
+### HTTP/SSE Servers
+
+Add remote servers with HTTP or SSE transport:
+
+```bash
+openhands mcp add <server-name> --transport http <url>
+```
+
+#### With Bearer Token Authentication
+
+```bash
+openhands mcp add my-api --transport http \
+  --header "Authorization: Bearer your-token" \
+  https://api.example.com/mcp
+```
+
+#### With API Key Authentication
+
+```bash
+openhands mcp add weather-api --transport http \
+  --header "X-API-Key: your-api-key" \
+  https://weather.api.com
+```
+
+#### With Multiple Headers
+
+```bash
+openhands mcp add secure-api --transport http \
+  --header "Authorization: Bearer token123" \
+  --header "X-Client-ID: client456" \
+  https://api.example.com
+```
+
+#### With OAuth Authentication
+
+```bash
+openhands mcp add notion-server --transport http \
+  --auth oauth \
+  https://mcp.notion.com/mcp
+```
+
+### Stdio Servers
+
+Add local servers that communicate via stdio:
+
+```bash
+openhands mcp add <server-name> --transport stdio <command> -- [args...]
+```
+
+#### Basic Example
+
+```bash
+openhands mcp add local-server --transport stdio \
+  python -- -m my_mcp_server
+```
+
+#### With Environment Variables
+
+```bash
+openhands mcp add local-server --transport stdio \
+  --env "API_KEY=secret123" \
+  --env "DATABASE_URL=postgresql://localhost/mydb" \
+  python -- -m my_mcp_server --config config.json
+```
+
+#### Add in Disabled State
+
+```bash
+openhands mcp add my-server --transport stdio --disabled \
+  node -- my-server.js
+```
+
+### Command Reference
+
+```bash
+openhands mcp add <server-name> --transport <transport> [options] [-- args...]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) |
+| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) |
+| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) |
+| `--auth` | Authentication method (e.g., `oauth`) |
+| `--enabled` | Enable immediately (default) |
+| `--disabled` | Add in disabled state |
+
+## Example: Web Search with Tavily
+
+Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp):
+
+```bash
+openhands mcp add tavily --transport stdio \
+  npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>"
+```
+
+## Manual Configuration
+
+You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`.
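A JSON syntax error in a hand-edited file can silently prevent servers from loading, so it is worth sanity-checking the file after editing. Here is a minimal check using only the Python standard library; the helper name is ours, and it assumes local entries use `command` and remote entries use `url`, per the FastMCP configuration format this page links to:

```python
import json
from pathlib import Path

def check_mcp_config(path: Path) -> list[str]:
    """Return a list of problems found in an mcp.json file (empty list = OK)."""
    try:
        config = json.loads(path.read_text())
    except FileNotFoundError:
        return [f"{path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['missing top-level "mcpServers" object']
    problems = []
    for name, server in servers.items():
        # Each server needs something launchable: a command (stdio) or a url (http/sse).
        if not isinstance(server, dict) or ("command" not in server and "url" not in server):
            problems.append(f'server "{name}" needs a "command" (stdio) or "url" (http/sse)')
    return problems
```

An empty result means the file parses and every server entry has something to launch or connect to; it does not guarantee the server itself starts.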
+
+### Configuration Format
+
+The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format):
+
+```json
+{
+  "mcpServers": {
+    "server-name": {
+      "command": "command-to-run",
+      "args": ["arg1", "arg2"],
+      "env": {
+        "ENV_VAR": "value"
+      }
+    }
+  }
+}
+```
+
+### Example Configuration
+
+```json
+{
+  "mcpServers": {
+    "tavily-remote": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "mcp-remote",
+        "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key"
+      ]
+    },
+    "local-tools": {
+      "command": "python",
+      "args": ["-m", "my_mcp_tools"],
+      "env": {
+        "DEBUG": "true"
+      }
+    }
+  }
+}
+```
+
+## Interactive `/mcp` Command
+
+Within an OpenHands conversation, use `/mcp` to view server status:
+
+- **View active servers**: Shows which MCP servers are currently active in the conversation
+- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts
+
+The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations.
+
+## Workflow
+
+1. **Add servers** using `openhands mcp add`
+2. **Start a conversation** with `openhands`
+3. **Check status** with `/mcp` inside the conversation
+4. **Use the tools** provided by your MCP servers
+
+The agent will automatically have access to tools provided by enabled MCP servers.
+
+## Troubleshooting
+
+### Server Not Appearing
+
+1. Verify the server is enabled:
+   ```bash
+   openhands mcp list
+   ```
+
+2. Check the configuration:
+   ```bash
+   openhands mcp get <server-name>
+   ```
+
+3. Restart the conversation to load new configurations
+
+### Server Fails to Start
+
+1. Test the command manually:
+   ```bash
+   # For stdio servers
+   python -m my_mcp_server
+
+   # For HTTP servers, check the URL is reachable
+   curl https://api.example.com/mcp
+   ```
+
+2. Check environment variables and credentials
+
+3. Review error messages in the CLI output
+
+### Configuration File Location
+
+The MCP configuration is stored at:
+- **Config file**: `~/.openhands/mcp.json`
+
+## See Also
+
+- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation
+- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration
+- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference
+
+
+# Quick Start
+Source: https://docs.openhands.dev/openhands/usage/cli/quick-start
+
+**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details.
+
+## Overview
+
+The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent:
+
+| Mode | Command | Best For |
+|------|---------|----------|
+| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development |
+| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation |
+| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI |
+| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI |
+| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains |
+
+## Your First Conversation
+
+**Set up your account** (first time only):
+
+- **OpenHands Cloud**: run `openhands login` to authenticate with OpenHands Cloud and fetch your settings.
+- **Your own LLM provider**: the CLI will prompt you to configure your LLM provider and API key on first run.
+
+1. **Start the CLI:**
+   ```bash
+   openhands
+   ```
+
+2. **Enter a task:**
+   ```
+   Create a Python script that prints "Hello, World!"
+   ```
+
+3. **Watch OpenHands work:**
+   The agent will create the file and show you the results.
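For the example task above, the finished result is a one-line script. The agent's version may differ slightly, but running it prints the expected greeting:

```python
# hello.py -- roughly what the agent produces for the example task above
message = "Hello, World!"
print(message)  # prints: Hello, World!
```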
+
+## Controls
+
+Once inside the CLI, use these controls:
+
+| Control | Description |
+|---------|-------------|
+| `Ctrl+P` | Open command palette (access Settings, MCP status) |
+| `Esc` | Pause the running agent |
+| `Ctrl+Q` or `/exit` | Exit the CLI |
+
+## Starting with a Task
+
+You can start the CLI with an initial task:
+
+```bash
+# Start with a task
+openhands -t "Fix the bug in auth.py"
+
+# Start with a task from a file
+openhands -f task.txt
+```
+
+## Resuming Conversations
+
+Resume a previous conversation:
+
+```bash
+# List recent conversations and select one
+openhands --resume
+
+# Resume the most recent conversation
+openhands --resume --last
+
+# Resume a specific conversation
+openhands --resume abc123def456
+```
+
+For more details, see [Resume Conversations](/openhands/usage/cli/resume).
+
+## Next Steps
+
+- [Terminal (CLI)](/openhands/usage/cli/terminal) - Learn about the interactive terminal interface
+- [IDE Integration](/openhands/usage/cli/ide/overview) - Use OpenHands in Zed, VS Code, or JetBrains
+- [Headless Mode](/openhands/usage/cli/headless) - Automate tasks with scripting
+- [MCP Servers](/openhands/usage/cli/mcp-servers) - Add tools via Model Context Protocol
+
+
+# Resume Conversations
+Source: https://docs.openhands.dev/openhands/usage/cli/resume
+
+## Overview
+
+OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off.
+
+## Listing Previous Conversations
+
+To see a list of your recent conversations, run:
+
+```bash
+openhands --resume
+```
+
+This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message:
+
+```
+Recent Conversations:
+--------------------------------------------------------------------------------
+  1. abc123def456 (2h ago)
+     Fix the login bug in auth.py
+
+  2. xyz789ghi012 (yesterday)
+     Add unit tests for the user service
+
+  3. mno345pqr678 (3 days ago)
+     Refactor the database connection module
+--------------------------------------------------------------------------------
+To resume a conversation, use: openhands --resume <conversation-id>
+```
+
+## Resuming a Specific Conversation
+
+To resume a specific conversation, use the `--resume` flag with the conversation ID:
+
+```bash
+openhands --resume <conversation-id>
+```
+
+For example:
+
+```bash
+openhands --resume abc123def456
+```
+
+## Resuming the Latest Conversation
+
+To quickly resume your most recent conversation without looking up the ID, use the `--last` flag:
+
+```bash
+openhands --resume --last
+```
+
+This automatically finds and resumes the most recent conversation.
+
+## How It Works
+
+When you resume a conversation:
+
+1. OpenHands loads the full conversation history from disk
+2. The agent has access to all previous context, including:
+   - Your previous messages and requests
+   - The agent's responses and actions
+   - Any files that were created or modified
+3. You can continue the conversation as if you never left
+
+The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost.
+ + +## Resuming in Different Modes + +### Terminal Mode + +```bash +openhands --resume abc123def456 +openhands --resume --last +``` + +### ACP Mode (IDEs) + +```bash +openhands acp --resume abc123def456 +openhands acp --resume --last +``` + +For IDE-specific configurations, see: +- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) +- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) +- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) + +### With Confirmation Modes + +Combine `--resume` with confirmation mode flags: + +```bash +# Resume with LLM-based approval +openhands --resume abc123def456 --llm-approve + +# Resume with auto-approve +openhands --resume --last --always-approve +``` + +## Tips + + +**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. + + + +**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. + + +## Storage Location + +Conversations are stored in: + +``` +~/.openhands/conversations/ +├── abc123def456/ +│ └── conversation.json +├── xyz789ghi012/ +│ └── conversation.json +└── ... +``` + +## See Also + +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference + + +# Terminal (CLI) +Source: https://docs.openhands.dev/openhands/usage/cli/terminal + +## Overview + +The Command Line Interface (CLI) is the default mode when you run `openhands`. It provides a rich, interactive experience directly in your terminal. 
+ +```bash +openhands +``` + +## Features + +- **Real-time interaction**: Type natural language tasks and receive instant feedback +- **Live status monitoring**: Watch the agent's progress as it works +- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more + +## Command Palette + +Press `Ctrl+P` to open the command palette, then select from the dropdown options: + +| Option | Description | +|--------|-------------| +| **Settings** | Open the settings configuration menu | +| **MCP** | View MCP server status | + +## Controls + +| Control | Action | +|---------|--------| +| `Ctrl+P` | Open command palette | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | + +## Starting with a Task + +Start a conversation with an initial task: + +```bash +# Provide a task directly +openhands -t "Create a REST API for user management" + +# Load task from a file +openhands -f requirements.txt +``` + +## Confirmation Modes + +Control how the agent requests approval for actions: + +```bash +# Default: Always ask for confirmation +openhands + +# Auto-approve all actions (use with caution) +openhands --always-approve + +# Use LLM-based security analyzer +openhands --llm-approve +``` + +## Resuming Conversations + +Resume previous conversations: + +```bash +# List recent conversations +openhands --resume + +# Resume the most recent +openhands --resume --last + +# Resume a specific conversation +openhands --resume abc123def456 +``` + +For more details, see [Resume Conversations](/openhands/usage/cli/resume). + +## Tips + + +Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. + + + +Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. 
+
+
+## See Also
+
+- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI
+- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers
+- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation
+
+
+# Web Interface
+Source: https://docs.openhands.dev/openhands/usage/cli/web-interface
+
+## Overview
+
+The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to:
+- Access the CLI remotely
+- Share your terminal session
+- Use the CLI on devices without a full terminal
+
+```bash
+openhands web
+```
+
+This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser.
+
+## Basic Usage
+
+```bash
+# Start on default port (12000)
+openhands web
+
+# Access at http://localhost:12000
+```
+
+## Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--host` | `0.0.0.0` | Host address to bind to |
+| `--port` | `12000` | Port number to use |
+| `--debug` | `false` | Enable debug mode |
+
+## Examples
+
+```bash
+# Custom port
+openhands web --port 8080
+
+# Bind to localhost only (more secure)
+openhands web --host 127.0.0.1
+
+# Enable debug mode
+openhands web --debug
+
+# Full example with custom host and port
+openhands web --host 0.0.0.0 --port 3000
+```
+
+## Remote Access
+
+To access the web interface from another machine:
+
+1. Start with `--host 0.0.0.0` to bind to all interfaces:
+   ```bash
+   openhands web --host 0.0.0.0 --port 12000
+   ```
+
+2. Access from another machine using the host's IP:
+   ```
+   http://<host-ip>:12000
+   ```
+
+When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities.
+
+
+## Use Cases
+
+### Development on Remote Servers
+
+Access OpenHands on a remote development server through your local browser:
+
+```bash
+# On remote server
+openhands web --host 0.0.0.0 --port 12000
+
+# On local machine, use SSH tunnel
+ssh -L 12000:localhost:12000 user@remote-server
+
+# Access at http://localhost:12000
+```
+
+### Sharing Sessions
+
+Run the web interface on a shared server for team access:
+
+```bash
+openhands web --host 0.0.0.0 --port 8080
+```
+
+## Comparison: Web Interface vs GUI Server
+
+| Feature | `openhands web` | `openhands serve` |
+|---------|-----------------|-------------------|
+| Interface | Terminal UI in browser | Full web GUI |
+| Dependencies | None | Docker required |
+| Resources | Lightweight | Full container |
+| Best for | Quick access | Rich GUI experience |
+
+## See Also
+
+- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage
+- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker
+- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options
+
+
+# Bitbucket Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation
+
+## Prerequisites
+
+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud).
+
+## Adding Bitbucket Repository Access
+
+Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories.
+
+## Working With Bitbucket Repos in OpenHands Cloud
+
+After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!
+ +![Connect Repo](/openhands/static/img/connect-repo.png) + +## IP Whitelisting + +If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow +OpenHands to access your repositories: + +### Core App IP +``` +34.68.58.200 +``` + +### Runtime IPs +``` +34.10.175.217 +34.136.162.246 +34.45.0.142 +34.28.69.126 +35.224.240.213 +34.70.174.52 +34.42.4.87 +35.222.133.153 +34.29.175.97 +34.60.55.59 +``` + +## Next Steps + +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. + + +# Cloud API +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api + +For the available API endpoints, refer to the +[OpenHands API Reference](https://docs.openhands.dev/api-reference). + +## Obtaining an API Key + +To use the OpenHands Cloud API, you'll need to generate an API key: + +1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. +2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. +3. Click `Create API Key`. +4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. +5. Copy the generated API key and store it securely. It will only be shown once. + +## API Usage Example (V1) + +### Starting a New Conversation + +To start a new conversation with OpenHands to perform a task, +make a POST request to the V1 app-conversations endpoint. 
+ + + + ```bash + curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests + + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/v1/app-conversations" + + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + data = { + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + } + + response = requests.post(url, headers=headers, json=data) + result = response.json() + + # The response contains a start task with the conversation ID + conversation_id = result.get("app_conversation_id") or result.get("id") + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") + print(f"Status: {result['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/v1/app-conversations"; + + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const data = { + initial_message: { + content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." 
}] + }, + selected_repository: "yourusername/your-repo" + }; + + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); + + const result = await response.json(); + + // The response contains a start task with the conversation ID + const conversationId = result.app_conversation_id || result.id; + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); + console.log(`Status: ${result.status}`); + + return result; + } catch (error) { + console.error("Error starting conversation:", error); + } + } + + startConversation(); + ``` + + + +#### Response + +The API will return a JSON object with details about the conversation start task: + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "status": "WORKING", + "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", + "sandbox_id": "sandbox-abc123", + "created_at": "2025-01-15T10:30:00Z" +} +``` + +The `status` field indicates the current state of the conversation startup process: +- `WORKING` - Initial processing +- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready +- `PREPARING_REPOSITORY` - Cloning and setting up the repository +- `READY` - Conversation is ready to use +- `ERROR` - An error occurred during startup + +You may receive an authentication error if: + +- You provided an invalid API key. +- You provided the wrong repository name. +- You don't have access to the repository. 
+ +### Streaming Conversation Start (Optional) + +For real-time updates during conversation startup, you can use the streaming endpoint: + +```bash +curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Your task description here"}] + }, + "selected_repository": "yourusername/your-repo" + }' +``` + +#### Streaming Response + +The endpoint streams a JSON array incrementally. Each element represents a status update: + +```json +[ + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} +] +``` + +Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. + +## Rate Limits + +If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. +If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). + +--- + +## Migrating from V0 to V1 API + + + The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. + Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. 
+ + +### Key Differences + +| Feature | V0 API | V1 API | +|---------|--------|--------| +| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | +| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | +| Repository field | `repository` | `selected_repository` | +| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | + +### Migration Steps + +1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` + +2. **Update the request body**: + - Change `repository` to `selected_repository` + - Change `initial_user_msg` (string) to `initial_message` (object with content array): + ```json + // V0 format + { "initial_user_msg": "Your message here" } + + // V1 format + { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } + ``` + +3. **Update response handling**: The V1 API returns a start task object. The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. + +--- + +## Legacy API (V0) - Deprecated + + + The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. + New integrations should use the V1 API documented above. 
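The request-body changes in the migration steps above are mechanical, so existing V0 payloads can be upgraded with a thin shim before switching endpoints. A sketch (the function name is illustrative):

```python
def v0_to_v1_body(v0: dict) -> dict:
    """Convert a V0 request body to the V1 shape per the migration steps above."""
    v1: dict = {}
    if "initial_user_msg" in v0:
        # String message becomes a content array of text objects.
        v1["initial_message"] = {
            "content": [{"type": "text", "text": v0["initial_user_msg"]}]
        }
    if "repository" in v0:
        # Field is renamed in V1.
        v1["selected_repository"] = v0["repository"]
    return v1
```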
+ + +### Starting a New Conversation (V0) + + + + ```bash + curl -X POST "https://app.all-hands.dev/api/conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests + + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/conversations" + + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + data = { + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + } + + response = requests.post(url, headers=headers, json=data) + conversation = response.json() + + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") + print(f"Status: {conversation['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/conversations"; + + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const data = { + initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + repository: "yourusername/your-repo" + }; + + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); + + const conversation = await response.json(); + + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); + console.log(`Status: ${conversation.status}`); + + return conversation; + } catch (error) { + console.error("Error starting conversation:", error); + } + } + + startConversation(); + ``` + + + +#### Response (V0) + +```json +{ + 
"status": "ok", + "conversation_id": "abc1234" +} +``` + + +# Cloud UI +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-ui + +## Landing Page + +The landing page is where you can: + +- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), + [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or + [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. +- Launch an empty conversation using `New Conversation`. +- See `Suggested Tasks` for repositories that OpenHands has access to. +- See your `Recent Conversations`. + +## Settings + +Settings are divided across tabs, with each tab focusing on a specific area of configuration. + +- `User` + - Change your email address. +- `Integrations` + - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). +- `Application` + - Set your preferred language, notifications and other preferences. + - Toggle task suggestions on GitHub. + - Toggle Solvability Analysis. + - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation). + - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings). +- `LLM` + - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings). +- `Billing` + - Add credits for using the OpenHands provider. +- `Secrets` + - [Manage secrets](/openhands/usage/settings/secrets-settings). +- `API Keys` + - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api). 
+- `MCP` + - [Setup an MCP server](/openhands/usage/settings/mcp-settings) + +## Key Features + +For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features) +section of the documentation. + +## Next Steps + +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. + + +# GitHub Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation + +## Prerequisites + +- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud). + +## Adding GitHub Repository Access + +You can grant OpenHands access to specific GitHub repositories: + +1. Click on `+ Add GitHub Repos` in the repository selection dropdown. +2. Select your organization and choose the specific repositories to grant OpenHands access to. + + - OpenHands requests short-lived tokens (8-hour expiration) with these permissions: + - Actions: Read and write + - Commit statuses: Read and write + - Contents: Read and write + - Issues: Read and write + - Metadata: Read-only + - Pull requests: Read and write + - Webhooks: Read and write + - Workflows: Read and write + - Repository access for a user is granted based on: + - Permission granted for the repository + - User's GitHub permissions (owner/collaborator) + + +3. Click `Install & Authorize`. 
+
+## Modifying Repository Access
+
+You can modify GitHub repository access at any time by:
+- Selecting `+ Add GitHub Repos` in the repository selection dropdown or
+- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories`
+
+## Working With GitHub Repos in OpenHands Cloud
+
+Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the
+`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click
+on `Launch` to start the conversation!
+
+![Connect Repo](/openhands/static/img/connect-repo.png)
+
+## Working on GitHub Issues and Pull Requests Using OpenHands
+
+To allow OpenHands to work directly from GitHub, you must
+[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is
+given, you can use OpenHands by labeling the issue or by tagging `@openhands`.
+
+### Working with Issues
+
+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:
+1. Comment on the issue to let you know it is working on it.
+   - You can click on the link to track the progress on OpenHands Cloud.
+2. Open a pull request if it determines that the issue has been successfully resolved.
+3. Comment on the issue with a summary of the performed tasks and a link to the PR.
+
+### Working with Pull Requests
+
+To get OpenHands to work on pull requests, mention `@openhands` in the comments to:
+- Ask questions
+- Request updates
+- Get code explanations
+
+
+The `@openhands` mention functionality in pull requests only works if the pull request is both
+*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate
+permissions to access both repositories.
+
+
+
+## Next Steps
+
+- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui).
+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands.
+
+
+# GitLab Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation
+
+## Prerequisites
+
+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud).
+
+## Adding GitLab Repository Access
+
+Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories.
+
+## Working With GitLab Repos in OpenHands Cloud
+
+After signing in with a GitLab account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!
+
+![Connect Repo](/openhands/static/img/connect-repo.png)
+
+## Using Tokens with Reduced Scopes
+
+OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent.
+To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`,
+which will override the default token assigned to the agent. While the high-permission API token is still requested
+and used for other components of the application (e.g. opening merge requests), the agent will not have access to it.
+
+## Working on GitLab Issues and Merge Requests Using OpenHands
+
+
+This feature works for personal projects and is available for group projects with a
+[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks).
+
+A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into
+OpenHands Cloud.
+
+
+
+Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly.
+
+### Working with Issues
+
+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:
+
+1. Comment on the issue to let you know it is working on it.
+   - You can click on the link to track the progress on OpenHands Cloud.
+2. Open a merge request if it determines that the issue has been successfully resolved.
+3. Comment on the issue with a summary of the performed tasks and a link to the merge request.
+
+### Working with Merge Requests
+
+To get OpenHands to work on merge requests, mention `@openhands` in the comments to:
+
+- Ask questions
+- Request updates
+- Get code explanations
+
+## Managing GitLab Webhooks
+
+The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page.
+
+### Accessing Webhook Management
+
+The webhook management table is available on the Integrations page when:
+
+- You are signed in to OpenHands Cloud with a GitLab account
+- Your GitLab token is connected
+
+To access it:
+
+1. Navigate to the `Settings > Integrations` page
+2. Find the GitLab section
+3. If your GitLab token is connected, you'll see the webhook management table below the connection status
+
+### Viewing Webhook Status
+
+The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands.
+
+- **Resource**: The name and full path of the project or group
+- **Type**: Whether it's a "project" or "group"
+- **Status**: The current webhook installation status:
+  - **Installed**: The webhook is active and working
+  - **Not Installed**: No webhook is currently installed
+  - **Failed**: A previous installation attempt failed (error details are shown below the status)
+
+### Reinstalling Webhooks
+
+If a webhook is not installed or has failed, you can reinstall it:
+
+1. Find the resource in the webhook management table
+2. Click the `Reinstall` button in the Action column
+3. The button will show `Reinstalling...` while the operation is in progress
+4. Once complete, the status will update to reflect the result
+
+
+  To reinstall an existing webhook, you must first delete the current webhook
+  from the GitLab UI before using the Reinstall button in OpenHands Cloud.
+
+
+**Important behaviors:**
+
+- The Reinstall button is disabled if the webhook is already installed
+- Only one reinstall operation can run at a time
+- After a successful reinstall, the button remains disabled to prevent duplicate installations
+- If a reinstall fails, the error message is displayed below the status badge
+- The resources list automatically refreshes after a reinstall completes
+
+### Constraints and Limitations
+
+- The webhook management table only displays resources that are accessible with your connected GitLab token
+- Webhook installation requires Admin or Owner permissions on the GitLab project or group
+
+## Next Steps
+
+- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui).
+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands.
+
+
+# Getting Started
+Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud
+
+## Accessing OpenHands Cloud
+
+OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud,
+visit [app.all-hands.dev](https://app.all-hands.dev).
+
+You'll be prompted to connect with your GitHub, GitLab or Bitbucket account:
+
+1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`.
+2. Review the permissions requested by OpenHands and authorize the application.
+   - OpenHands will require certain permissions from your account. To read more about these permissions,
+     you can click the `Learn more` link on the authorization page.
+3. Review and accept the `terms of service` and select `Continue`.
+ +## Next Steps + +Once you've connected your account, you can: + +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). +- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). + + +# Jira Data Center Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration + +# Jira Data Center Integration + +## Platform Configuration + +### Step 1: Create Service Account + +1. **Access User Management** + - Log in to Jira Data Center as administrator + - Go to **Administration** > **User Management** + +2. **Create User** + - Click **Create User** + - Username: `openhands-agent` + - Full Name: `OpenHands Agent` + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Password: Set a secure password + - Click **Create** + +3. **Assign Permissions** + - Add user to appropriate groups + - Ensure access to relevant projects + - Grant necessary project permissions + +### Step 2: Generate API Token + +1. **Personal Access Tokens** + - Log in as the service account + - Go to **Profile** > **Personal Access Tokens** + - Click **Create token** + - Name: `OpenHands Cloud Integration` + - Expiry: Set appropriate expiration (recommend 1 year) + - Click **Create** + - **Important**: Copy and store the token securely + +### Step 3: Configure Webhook + +1. 
**Create Webhook** + - Go to **Administration** > **System** > **WebHooks** + - Click **Create a WebHook** + - **Name**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` + - Set a suitable webhook secret + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + +--- + +## Workspace Integration + +### Step 1: Log in to OpenHands Cloud + +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + +### Step 2: Configure Jira Data Center Integration + +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Jira Data Center** section + +2. **Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The personal access token from Step 2 above + - Ensure **Active** toggle is enabled + + +Workspace name is the host name of your Jira Data Center instance. + +Eg: http://jira.all-hands.dev/projects/OH/issues/OH-77 + +Here the workspace name is **jira.all-hands.dev**. + + +3. **Complete OAuth Flow** + - You'll be redirected to Jira Data Center to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI + +### Managing Your Integration + +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view + +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. + +### Screenshots + + + +![workspace-link.png](/openhands/static/img/jira-dc-user-link.png) + + + +![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png) + + + +![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png) + + + +![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png) + + + + +# Jira Cloud Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration + +# Jira Cloud Integration + +## Platform Configuration + +### Step 1: Create Service Account + +1. **Navigate to User Management** + - Go to [Atlassian Admin](https://admin.atlassian.com/) + - Select your organization + - Go to **Directory** > **Users** + +2. **Create OpenHands Service Account** + - Click **Service accounts** + - Click **Create a service account** + - Name: `OpenHands Agent` + - Click **Next** + - Select **User** role for Jira app + - Click **Create** + +### Step 2: Generate API Token + +1. 
**Access Service Account Configuration** + - Locate the created service account from above step and click on it + - Click **Create API token** + - Set the expiry to 365 days (maximum allowed value) + - Click **Next** + - In **Select token scopes** screen, filter by following values + - App: Jira + - Scope type: Classic + - Scope actions: Write, Read + - Select `read:me`, `read:jira-work`, and `write:jira-work` scopes + - Click **Next** + - Review and create API token + - **Important**: Copy and securely store the token immediately + +### Step 3: Configure Webhook + +1. **Navigate to Webhook Settings** + - Go to **Jira Settings** > **System** > **WebHooks** + - Click **Create a WebHook** + +2. **Configure Webhook** + - **Name**: `OpenHands Cloud Integration` + - **Status**: Enabled + - **URL**: `https://app.all-hands.dev/integration/jira/events` + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + +--- + +## Workspace Integration + +### Step 1: Log in to OpenHands Cloud + +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + +### Step 2: Configure Jira Integration + +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Jira Cloud** section + +2. 
**Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API token from Step 2 above + - Ensure **Active** toggle is enabled + + +Workspace name is the host name when accessing a resource in Jira Cloud. + +Eg: https://all-hands.atlassian.net/browse/OH-55 + +Here the workspace name is **all-hands**. + + +3. **Complete OAuth Flow** + - You'll be redirected to Jira Cloud to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI + +### Managing Your Integration + +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view + +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. 
+ +### Screenshots + + + +![workspace-link.png](/openhands/static/img/jira-user-link.png) + + + +![workspace-link.png](/openhands/static/img/jira-admin-configure.png) + + + +![workspace-link.png](/openhands/static/img/jira-user-unlink.png) + + + +![workspace-link.png](/openhands/static/img/jira-admin-edit.png) + + + + +# Linear Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration + +# Linear Integration + +## Platform Configuration + +### Step 1: Create Service Account + +1. **Access Team Settings** + - Log in to Linear as a team admin + - Go to **Settings** > **Members** + +2. **Invite Service Account** + - Click **Invite members** + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Role: **Member** (with appropriate team access) + - Send invitation + +3. **Complete Setup** + - Accept invitation from the service account email + - Complete profile setup + - Ensure access to relevant teams/workspaces + +### Step 2: Generate API Key + +1. **Access API Settings** + - Log in as the service account + - Go to **Settings** > **Security & access** + +2. **Create Personal API Key** + - Click **Create new key** + - Name: `OpenHands Cloud Integration` + - Scopes: Select the following: + - `Read` - Read access to issues and comments + - `Create comments` - Ability to create or update comments + - Select the teams you want to provide access to, or allow access for all teams you have permissions for + - Click **Create** + - **Important**: Copy and store the API key securely + +### Step 3: Configure Webhook + +1. **Access Webhook Settings** + - Go to **Settings** > **API** > **Webhooks** + - Click **New webhook** + +2. 
**Configure Webhook** + - **Label**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/linear/events` + - **Resource types**: Select: + - `Comment` - For comment events + - `Issue` - For issue updates (label changes) + - Select the teams you want to provide access to, or allow access for all public teams + - Click **Create webhook** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + +--- + +## Workspace Integration + +### Step 1: Log in to OpenHands Cloud + +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + +### Step 2: Configure Linear Integration + +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Linear** section + +2. **Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API key from Step 2 above + - Ensure **Active** toggle is enabled + + +Workspace name is the identifier after the host name when accessing a resource in Linear. + +Eg: https://linear.app/allhands/issue/OH-37 + +Here the workspace name is **allhands**. + + +3. **Complete OAuth Flow** + - You'll be redirected to Linear to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI + +### Managing Your Integration + +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view + +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. + +### Screenshots + + + +![workspace-link.png](/openhands/static/img/linear-user-link.png) + + + +![workspace-link.png](/openhands/static/img/linear-admin-configure.png) + + + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + + + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + + + + +# Project Management Tool Integrations (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview + +# Project Management Tool Integrations + +## Overview + +OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by: +- Adding `@openhands` in ticket comments +- Adding the `openhands` label to tickets + +## Prerequisites + +Integration requires two levels of setup: +1. 
**Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) +2. **Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace + +### Platform-Specific Setup Guides: +- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) +- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) +- [Linear Integration (Coming soon...)](./linear-integration.md) + +## Usage + +Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: + +### Method 1: Comment Mention +Add a comment to any issue with `@openhands` followed by your task description: +``` +@openhands Please implement the user authentication feature described in this ticket +``` + +### Method 2: Label-based Delegation +Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. + +### Git Repository Detection + +The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: + +#### Specifying the Target Repository + +**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. + +**Supported Repository Formats:** +- Full HTTPS URL: `https://github.com/owner/repository.git` +- GitHub URL without .git: `https://github.com/owner/repository` +- Owner/repository format: `owner/repository` + +#### Platform-Specific Behavior + +**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. 
Manual specification is not required in this configuration. + +**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection. + +## Troubleshooting + +### Platform Configuration Issues +- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated) +- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. If your current API token is expired, make sure to update it in the respective integration settings +- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions + +### Workspace Integration Issues +- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. Contact your platform administrator that you want to integrate with (eg: Jira, Linear) +- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first +- **OAuth flow fails**: Make sure that you're authorizing with the correct account with proper workspace access + +### General Issues +- **Agent not responding**: Check webhook logs in your platform settings and verify service account status +- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access +- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on +- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed + +### Getting Help +For additional support, contact OpenHands Cloud support with: +- Your integration platform (Linear, Jira Cloud, or Jira Data Center) +- Workspace name +- Error logs from webhook/integration attempts +- Screenshots of configuration 
 settings (without sensitive credentials)
+
+
+# Slack Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation
+
+
+
+OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete.
+While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to
+validate critical information independently.
+
+
+## Prerequisites
+
+- Access to OpenHands Cloud.
+
+## Installation Steps
+
+
+
+  **This step is for Slack admins/owners**
+
+  1. Make sure you have permissions to install Apps to your workspace.
+  2. Click the `Add to Slack` button below to install the OpenHands Slack App.
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+
+
+  **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.**
+
+  Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
+  1. Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud.
+  2. Click `Install OpenHands Slack App`.
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+  Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
+
+
+
+## Working With the Slack App
+
+To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel.
+
+Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands.
+
+To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message.
+You must be the user who started the conversation.
+
+## Example conversation
+
+### Start a new conversation, and select repo
+
+Conversation is started by mentioning `@openhands`.
+
+![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png)
+
+### See agent response and send follow up messages
+
+Initial request is followed up by mentioning `@openhands` in a thread reply.
+
+![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png)
+
+## Pro tip
+
+You can mention a repo name when starting a new conversation in the following formats:
+
+1. "My-Repo" repo (e.g. `@openhands in the openhands repo ...`)
+2. "OpenHands/OpenHands" (e.g. `@openhands in OpenHands/OpenHands ...`)
+
+The repo match is case insensitive. If a repo name match is made, it will kick off the conversation.
+If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list.
+
+![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png)
+
+
+# Repository Customization
+Source: https://docs.openhands.dev/openhands/usage/customization/repository
+
+## Skills (formerly Microagents)
+
+Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands
+should function. See [Skills Overview](/overview/skills) for more information.
+
+
+## Setup Script
+You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository.
+This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks.
+
+For example:
+```bash
+#!/bin/bash
+export MY_ENV_VAR="my value"
+sudo apt-get update
+sudo apt-get install -y lsof
+# Install frontend dependencies in a subshell so we stay at the repository root
+(cd frontend && npm install)
+```
+
+## Pre-commit Script
+You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit.
+This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits.
+
+For example:
+```bash
+#!/bin/bash
+# Run linting checks (in a subshell so we stay at the repository root)
+(cd frontend && npm run lint)
+if [ $? -ne 0 ]; then
+  echo "Frontend linting failed. Please fix the issues before committing."
+  exit 1
+fi
+
+# Run tests (also in a subshell, relative to the repository root)
+(cd backend && pytest tests/unit)
+if [ $? -ne 0 ]; then
+  echo "Backend tests failed. Please fix the issues before committing."
+  exit 1
+fi
+
+exit 0
+```
+
+
+# Debugging
+Source: https://docs.openhands.dev/openhands/usage/developers/debugging
+
+The following is intended as a primer on debugging OpenHands for development purposes.
+
+## Server / VSCode
+
+The following `launch.json` will allow debugging the agent, controller and server elements, but not the sandbox (which runs inside Docker). It will ignore any changes inside the `workspace/` directory:
+
+```json
+{
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "OpenHands CLI",
+            "type": "debugpy",
+            "request": "launch",
+            "module": "openhands.cli.main",
+            "justMyCode": false
+        },
+        {
+            "name": "OpenHands WebApp",
+            "type": "debugpy",
+            "request": "launch",
+            "module": "uvicorn",
+            "args": [
+                "openhands.server.listen:app",
+                "--reload",
+                "--reload-exclude",
+                "${workspaceFolder}/workspace",
+                "--port",
+                "3000"
+            ],
+            "justMyCode": false
+        }
+    ]
+}
+```
+
+More specific debugging configurations which include more parameters may be specified:
+
+```json
+    ...
+    {
+        "name": "Debug CodeAct",
+        "type": "debugpy",
+        "request": "launch",
+        "module": "openhands.core.main",
+        "args": [
+            "-t",
+            "Ask me what your task is.",
+            "-d",
+            "${workspaceFolder}/workspace",
+            "-c",
+            "CodeActAgent",
+            "-l",
+            "llm.o1",
+            "-n",
+            "prompts"
+        ],
+        "justMyCode": false
+    }
+    ...
+```
+
+Values in the snippet above can be updated such that:
+
+  * *t*: the task
+  * *d*: the OpenHands workspace directory
+  * *c*: the agent
+  * *l*: the LLM config (pre-defined in `config.toml`)
+  * *n*: session name (e.g. eventstream name)
+
+
+# Development Overview
+Source: https://docs.openhands.dev/openhands/usage/developers/development-overview
+
+## Core Documentation
+
+### Project Fundamentals
+- **Main Project Overview** (`/README.md`)
+  The primary entry point for understanding OpenHands, including features and basic setup instructions.
+
+- **Development Guide** (`/Development.md`)
+  Guide for developers working on OpenHands, including setup, requirements, and development workflows.
+
+- **Contributing Guidelines** (`/CONTRIBUTING.md`)
+  Essential information for contributors, covering code style, PR process, and contribution workflows.
+
+### Component Documentation
+
+#### Frontend
+- **Frontend Application** (`/frontend/README.md`)
+  Complete guide for setting up and developing the React-based frontend application.
+
+#### Backend
+- **Backend Implementation** (`/openhands/README.md`)
+  Detailed documentation of the Python backend implementation and architecture.
+
+- **Server Documentation** (`/openhands/server/README.md`)
+  Server implementation details, API documentation, and service architecture.
+
+- **Runtime Environment** (`/openhands/runtime/README.md`)
+  Documentation covering the runtime environment, execution model, and runtime configurations.
+
+#### Infrastructure
+- **Container Documentation** (`/containers/README.md`)
+  Information about Docker containers, deployment strategies, and container management.
+
+### Testing and Evaluation
+- **Unit Testing Guide** (`/tests/unit/README.md`)
+  Instructions for writing, running, and maintaining unit tests.
+
+- **Evaluation Framework** (`/evaluation/README.md`)
+  Documentation for the evaluation framework, benchmarks, and performance testing.
+
+### Advanced Features
+- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`)
+  Detailed information about the skills architecture, implementation, and usage.
+ +### Documentation Standards +- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) + Standards and guidelines for writing and maintaining project documentation. + +## Getting Started with Development + +If you're new to developing with OpenHands, we recommend following this sequence: + +1. Start with the main `README.md` to understand the project's purpose and features +2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute +3. Follow the setup instructions in `Development.md` +4. Dive into specific component documentation based on your area of interest: + - Frontend developers should focus on `/frontend/README.md` + - Backend developers should start with `/openhands/README.md` + - Infrastructure work should begin with `/containers/README.md` + +## Documentation Updates + +When making changes to the codebase, please ensure that: +1. Relevant documentation is updated to reflect your changes +2. New features are documented in the appropriate README files +3. Any API changes are reflected in the server documentation +4. Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` + + +# Evaluation Harness +Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness + +This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework. + +## Setup Environment and LLM Configuration + +Please follow instructions [here](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to setup your local development environment. +OpenHands in development mode uses `config.toml` to keep track of most configurations. 
+ +Here's an example configuration file you can use to define and use multiple LLMs: + +```toml +[llm] +# IMPORTANT: add your API key here, and set the model to the one you want to evaluate +model = "claude-3-5-sonnet-20241022" +api_key = "sk-XXX" + +[llm.eval_gpt4_1106_preview_llm] +model = "gpt-4-1106-preview" +api_key = "XXX" +temperature = 0.0 + +[llm.eval_some_openai_compatible_model_llm] +model = "openai/MODEL_NAME" +base_url = "https://OPENAI_COMPATIBLE_URL/v1" +api_key = "XXX" +temperature = 0.0 +``` + + +## How to use OpenHands in the command line + +OpenHands can be run from the command line using the following format: + +```bash +poetry run python ./openhands/core/main.py \ + -i \ + -t "" \ + -c \ + -l +``` + +For example: + +```bash +poetry run python ./openhands/core/main.py \ + -i 10 \ + -t "Write me a bash script that prints hello world." \ + -c CodeActAgent \ + -l llm +``` + +This command runs OpenHands with: +- A maximum of 10 iterations +- The specified task description +- Using the CodeActAgent +- With the LLM configuration defined in the `llm` section of your `config.toml` file + +## How does OpenHands work + +The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works: + +1. Parse command-line arguments and load the configuration +2. Create a runtime environment using `create_runtime()` +3. Initialize the specified agent +4. Run the controller using `run_controller()`, which: + - Attaches the runtime to the agent + - Executes the agent's task + - Returns a final state when complete + +The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing. 
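To make that loop concrete, here is a deliberately simplified, dependency-free sketch of the kind of control loop `run_controller()` implements. All names in this snippet are invented for illustration; they are not the actual OpenHands APIs:

```python
# Toy control loop in the spirit of run_controller(): the agent emits
# actions; executable ones go to the runtime, non-executable ones receive a
# simulated user reply. Names are illustrative, not real OpenHands APIs.
from dataclasses import dataclass


@dataclass
class ToyAction:
    kind: str      # "run" (executable), "message" (needs a user reply), or "finish"
    content: str


def toy_run_controller(agent_step, execute, user_response_fn, max_iterations=100):
    feedback = None
    history = []
    for _ in range(max_iterations):
        action = agent_step(feedback)
        history.append(action)
        if action.kind == "finish":
            break
        elif action.kind == "run":
            feedback = execute(action)       # the runtime returns an observation
        else:
            feedback = user_response_fn()    # simulated user response
    return history
```

With a scripted "agent" that emits a command, then a message, then finishes, the loop routes each action to the runtime stub or the user-response stub and stops at the finish action.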
+ + +## Easiest way to get started: Exploring Existing Benchmarks + +We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository. + +To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements. + +## How to create an evaluation workflow + + +To create an evaluation workflow for your benchmark, follow these steps: + +1. Import relevant OpenHands utilities: + ```python + import openhands.agenthub + from evaluation.utils.shared import ( + EvalMetadata, + EvalOutput, + make_metadata, + prepare_dataset, + reset_logger_for_multiprocessing, + run_evaluation, + ) + from openhands.controller.state.state import State + from openhands.core.config import ( + AppConfig, + SandboxConfig, + get_llm_config_arg, + parse_arguments, + ) + from openhands.core.logger import openhands_logger as logger + from openhands.core.main import create_runtime, run_controller + from openhands.events.action import CmdRunAction + from openhands.events.observation import CmdOutputObservation, ErrorObservation + from openhands.runtime.runtime import Runtime + ``` + +2. Create a configuration: + ```python + def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig: + config = AppConfig( + default_agent=metadata.agent_class, + runtime='docker', + max_iterations=metadata.max_iterations, + sandbox=SandboxConfig( + base_container_image='your_container_image', + enable_auto_lint=True, + timeout=300, + ), + ) + config.set_llm_config(metadata.llm_config) + return config + ``` + +3. 
Initialize the runtime and set up the evaluation environment:
   ```python
   def initialize_runtime(runtime: Runtime, instance: pd.Series):
       # Set up your evaluation environment here
       # For example, setting environment variables, preparing files, etc.
       pass
   ```

4. Create a function to process each instance:
   ```python
   from openhands.utils.async_utils import call_async_from_sync
   async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
       config = get_config(instance, metadata)
       runtime = create_runtime(config)
       call_async_from_sync(runtime.connect)
       initialize_runtime(runtime, instance)

       instruction = get_instruction(instance, metadata)

       state = run_controller(
           config=config,
           task_str=instruction,
           runtime=runtime,
           fake_user_response_fn=your_user_response_function,
       )

       # Evaluate the agent's actions (awaited, so process_instance must be async)
       evaluation_result = await evaluate_agent_actions(runtime, instance)

       return EvalOutput(
           instance_id=instance.instance_id,
           instruction=instruction,
           test_result=evaluation_result,
           metadata=metadata,
           history=compatibility_for_eval_history_pairs(state.history),
           metrics=state.metrics.get() if state.metrics else None,
           error=state.last_error if state and state.last_error else None,
       )
   ```

5. Run the evaluation:
   ```python
   metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir)
   output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
   instances = prepare_dataset(your_dataset, output_file, eval_n_limit)

   await run_evaluation(
       instances,
       metadata,
       output_file,
       num_workers,
       process_instance
   )
   ```

This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. 
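As a rough mental model of what a `run_evaluation`-style driver does (this is not the real implementation, which also handles retries and progress tracking), here is a dependency-free sketch that fans instances out to workers and streams one JSON result per line:

```python
# Simplified stand-in for a run_evaluation-style driver: process instances
# in parallel and write one JSON object per line (JSONL) to an output file.
# Function and parameter names here are illustrative only.
import json
from concurrent.futures import ThreadPoolExecutor


def run_evaluation_sketch(instances, process_fn, output_path, num_workers=4):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves the input order of instances in its results
        results = pool.map(process_fn, instances)
        with open(output_path, "w") as out:
            for result in results:
                out.write(json.dumps(result) + "\n")
```

Streaming results line by line means partial output survives a crash mid-run, which is why the real harness also writes `output.jsonl` incrementally.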
+ +Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. + +By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. + + +## Understanding the `user_response_fn` + +The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. + + +### Workflow and Interaction + +The correct workflow for handling actions and the `user_response_fn` is as follows: + +1. Agent receives a task and starts processing +2. Agent emits an Action +3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): + - The Runtime processes the Action + - Runtime returns an Observation +4. If the Action is not executable (typically a MessageAction): + - The `user_response_fn` is called + - It returns a simulated user response +5. The agent receives either the Observation or the simulated response +6. Steps 2-5 repeat until the task is completed or max iterations are reached + +Here's a more accurate visual representation: + +``` + [Agent] + | + v + [Emit Action] + | + v + [Is Action Executable?] 
+ / \ + Yes No + | | + v v + [Runtime] [user_response_fn] + | | + v v + [Return Observation] [Simulated Response] + \ / + \ / + v v + [Agent receives feedback] + | + v + [Continue or Complete Task] +``` + +In this workflow: + +- Executable actions (like running commands or executing code) are handled directly by the Runtime +- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` +- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` + +This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. + +### Example Implementation + +Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: + +```python +def codeact_user_response(state: State | None) -> str: + msg = ( + 'Please continue working on the task on whatever approach you think is suitable.\n' + 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' + 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' + ) + + if state and state.history: + # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up + user_msgs = [ + event + for event in state.history + if isinstance(event, MessageAction) and event.source == 'user' + ] + if len(user_msgs) >= 2: + # let the agent know that it can give up when it has tried 3 times + return ( + msg + + 'If you want to give up, run: exit .\n' + ) + return msg +``` + +This function does the following: + +1. Provides a standard message encouraging the agent to continue working +2. Checks how many times the agent has attempted to communicate with the user +3. 
If the agent has made multiple attempts, it provides an option to give up + +By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. + + +# WebSocket Connection +Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection + +This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. + +## Overview + +OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: + +1. Receive real-time events from the agent +2. Send user actions to the agent +3. Maintain a persistent connection for ongoing conversations + +## Connecting to the WebSocket + +### Connection Parameters + +When connecting to the WebSocket, you need to provide the following query parameters: + +- `conversation_id`: The ID of the conversation you want to join +- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) +- `providers_set`: (Optional) A comma-separated list of provider types + +### Connection Example + +Here's a basic example of connecting to the WebSocket using JavaScript: + +```javascript +import { io } from "socket.io-client"; + +const socket = io("http://localhost:3000", { + transports: ["websocket"], + query: { + conversation_id: "your-conversation-id", + latest_event_id: -1, + providers_set: "github,gitlab" // Optional + } +}); + +socket.on("connect", () => { + console.log("Connected to OpenHands WebSocket"); +}); + +socket.on("oh_event", (event) => { + console.log("Received event:", event); +}); + +socket.on("connect_error", (error) => { + console.error("Connection error:", error); +}); + +socket.on("disconnect", (reason) => { + console.log("Disconnected:", reason); +}); +``` + +## Sending Actions to the Agent + +To send an action to the agent, use the `oh_user_action` event: + 
+```javascript +// Send a user message to the agent +socket.emit("oh_user_action", { + type: "message", + source: "user", + message: "Hello, can you help me with my project?" +}); +``` + +## Receiving Events from the Agent + +The server emits events using the `oh_event` event type. Here are some common event types you might receive: + +- User messages (`source: "user", type: "message"`) +- Agent messages (`source: "agent", type: "message"`) +- File edits (`action: "edit"`) +- File writes (`action: "write"`) +- Command executions (`action: "run"`) + +Example event handler: + +```javascript +socket.on("oh_event", (event) => { + if (event.source === "agent" && event.type === "message") { + console.log("Agent says:", event.message); + } else if (event.action === "run") { + console.log("Command executed:", event.args.command); + console.log("Result:", event.result); + } +}); +``` + +## Using Websocat for Testing + +[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application. 
+ +### Installation + +```bash +# On macOS +brew install websocat + +# On Linux +curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat +chmod +x websocat +sudo mv websocat /usr/local/bin/ +``` + +### Connecting to the WebSocket + +```bash +# Connect to the WebSocket and print all received messages +echo "40{}" | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` + +### Sending a Message + +```bash +# Send a message to the agent +echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` + +### Complete Example with Websocat + +Here's a complete example of connecting to the WebSocket, sending a message, and receiving events: + +```bash +# Start a persistent connection +websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" + +# In another terminal, send a message +echo '42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` + +## Event Structure + +Events sent and received through the WebSocket follow a specific structure: + +```typescript +interface OpenHandsEvent { + id: string; // Unique event ID + source: string; // "user" or "agent" + timestamp: string; // ISO timestamp + message?: string; // For message events + type?: string; // Event type (e.g., "message") + action?: string; // Action type (e.g., "run", "edit", "write") + args?: any; // Action arguments + result?: any; // Action result +} +``` + +## Best Practices + +1. 
**Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions. +2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events. +3. **Error Handling**: Implement proper error handling for connection errors and failed actions. +4. **Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server. + +## Troubleshooting + +### Connection Issues + +- Verify that the OpenHands server is running and accessible +- Check that you're providing the correct conversation ID +- Ensure your WebSocket URL is correctly formatted + +### Authentication Issues + +- Make sure you have the necessary authentication cookies if required +- Verify that you have permission to access the specified conversation + +### Event Handling Issues + +- Check that you're correctly parsing the event data +- Verify that your event handlers are properly registered + + +# Environment Variables Reference +Source: https://docs.openhands.dev/openhands/usage/environment-variables + +This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments. 
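Because environment variables take precedence over TOML values, a self-hosted wrapper can overlay them onto loaded defaults. The sketch below is illustrative only (it is not OpenHands' actual configuration loader) and assumes the section-prefixed `<SECTION>_<OPTION>` naming convention documented on this page:

```python
# Illustrative overlay of environment variables onto TOML-style defaults,
# using section-prefixed names such as LLM_MODEL -> [llm].model.
# This is a sketch, not OpenHands' real configuration loader.
def apply_env_overrides(config, environ):
    """Return a copy of `config` with values replaced from `environ`.

    `config` maps section name -> {option: value}; an environment variable
    named <SECTION>_<OPTION> (upper-case) wins over the TOML default.
    """
    merged = {section: dict(options) for section, options in config.items()}
    for section, options in merged.items():
        for option in options:
            env_name = f"{section.upper()}_{option.upper()}"
            if env_name in environ:
                options[option] = environ[env_name]
    return merged
```

For example, setting `LLM_MODEL` in the environment replaces the `[llm].model` default while untouched options keep their TOML values.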
+ +## Environment Variable Naming Convention + +OpenHands follows a consistent naming pattern for environment variables: + +- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`) +- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`) +- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`) +- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`) +- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`) + +## Core Configuration Variables + +These variables correspond to the `[core]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | +| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | +| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | +| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | +| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | +| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | +| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) | +| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | +| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | +| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | +| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | +| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | +| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) 
| +| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | +| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | +| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | +| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | + +## LLM Configuration Variables + +These variables correspond to the `[llm]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | +| `LLM_API_KEY` | string | `""` | API key for the LLM provider | +| `LLM_BASE_URL` | string | `""` | Custom API base URL | +| `LLM_API_VERSION` | string | `""` | API version to use | +| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | +| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | +| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | +| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | +| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | +| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | +| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | +| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | +| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | +| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | +| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | +| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | +| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | +| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | +| `LLM_OLLAMA_BASE_URL` | string | `""` | Base 
URL for Ollama API | +| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | +| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | +| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | + +### AWS Configuration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID | +| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key | +| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name | + +## Agent Configuration Variables + +These variables correspond to the `[agent]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use | +| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling | +| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate | +| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor | +| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration | +| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation | +| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable skills (formerly known as microagents) (prompt extensions) | +| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable | + +## Sandbox Configuration Variables + +These variables correspond to the `[sandbox]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | +| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | +| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | +| 
`SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | +| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | +| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | +| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | +| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | +| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | +| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | +| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | +| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | +| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | +| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | +| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | +| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | +| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | +| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | +| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | +| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | + +### Sandbox Environment Variables +Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment: + +| Environment Variable | Description | +|---------------------|-------------| +| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | + +## Security Configuration Variables + +These variables correspond to the `[security]` section in `config.toml`: + +| Environment Variable | Type | Default | 
Description | +|---------------------|------|---------|-------------| +| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | +| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | +| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | + +## Debug and Logging Variables + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable general debug logging | +| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | +| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | +| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when DEBUG=true) | + +## Runtime-Specific Variables + +### Docker Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | + +### Remote Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | +| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | + +### Local Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | +| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | +| `RUNTIME_ID` | string | `""` | Runtime identifier | +| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | + +## Integration Variables + +### GitHub Integration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | + +### Third-Party API Keys +| 
Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `OPENAI_API_KEY` | string | `""` | OpenAI API key | +| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | +| `GOOGLE_API_KEY` | string | `""` | Google API key | +| `AZURE_API_KEY` | string | `""` | Azure API key | +| `TAVILY_API_KEY` | string | `""` | Tavily search API key | + +## Server Configuration Variables + +These are primarily used when running OpenHands as a server: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `FRONTEND_PORT` | integer | `3000` | Frontend server port | +| `BACKEND_PORT` | integer | `8000` | Backend server port | +| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | +| `BACKEND_HOST` | string | `"localhost"` | Backend host address | +| `WEB_HOST` | string | `"localhost"` | Web server host | +| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | + +## Deprecated Variables + +These variables are deprecated and should be replaced: + +| Environment Variable | Replacement | Description | +|---------------------|-------------|-------------| +| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | + +## Usage Examples + +### Basic Setup with OpenAI +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-openai-api-key" +export DEBUG=true +``` + +### Docker Deployment with Custom Volumes +```bash +export RUNTIME="docker" +export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" +export SANDBOX_TIMEOUT=300 +``` + +### Remote Runtime Configuration +```bash +export RUNTIME="remote" +export SANDBOX_API_KEY="your-remote-api-key" +export 
SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" +``` + +### Security-Enhanced Setup +```bash +export SECURITY_CONFIRMATION_MODE=true +export SECURITY_SECURITY_ANALYZER="llm" +export DEBUG_RUNTIME=true +``` + +## Notes + +1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). + +2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`. + +3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`. + +4. **Precedence**: Environment variables take precedence over TOML configuration files. + +5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag: + ```bash + docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands + ``` + +6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults. + + +# Good vs. Bad Instructions +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions + +The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. + +## Concrete Examples of Good/Bad Prompts + +### Bug Fixing Examples + +#### Bad Example + +``` +Fix the bug in my code. +``` + +**Why it's bad:** +- No information about what the bug is +- No indication of where to look +- No description of expected vs. actual behavior +- OpenHands would have to guess what's wrong + +#### Good Example + +``` +Fix the TypeError in src/api/users.py line 45. 
+ +Error message: +TypeError: 'NoneType' object has no attribute 'get' + +Expected behavior: The get_user_preferences() function should return +default preferences when the user has no saved preferences. + +Actual behavior: It crashes with the error above when user.preferences is None. + +The fix should handle the None case gracefully and return DEFAULT_PREFERENCES. +``` + +**Why it works:** +- Specific file and line number +- Exact error message +- Clear expected vs. actual behavior +- Suggested approach for the fix + +### Feature Development Examples + +#### Bad Example + +``` +Add user authentication to my app. +``` + +**Why it's bad:** +- Scope is too large and undefined +- No details about authentication requirements +- No mention of existing code or patterns +- Could mean many different things + +#### Good Example + +``` +Add email/password login to our Express.js API. + +Requirements: +1. POST /api/auth/login endpoint +2. Accept email and password in request body +3. Validate against users in PostgreSQL database +4. Return JWT token on success, 401 on failure +5. Use bcrypt for password comparison (already in dependencies) + +Follow the existing patterns in src/api/routes.js for route structure. +Use the existing db.query() helper in src/db/index.js for database access. + +Success criteria: I can call the endpoint with valid credentials +and receive a JWT token that works with our existing auth middleware. +``` + +**Why it works:** +- Specific, scoped feature +- Clear technical requirements +- Points to existing patterns to follow +- Defines what "done" looks like + +### Code Review Examples + +#### Bad Example + +``` +Review my code. +``` + +**Why it's bad:** +- No code provided or referenced +- No indication of what to look for +- No context about the code's purpose +- No criteria for the review + +#### Good Example + +``` +Review this pull request for our payment processing module: + +Focus areas: +1. Security - we're handling credit card data +2. 
Error handling - payments must never silently fail +3. Idempotency - duplicate requests should be safe + +Context: +- This integrates with Stripe API +- It's called from our checkout flow +- We have ~10,000 transactions/day + +Please flag any issues as Critical/Major/Minor with explanations. +``` + +**Why it works:** +- Clear scope and focus areas +- Important context provided +- Business implications explained +- Requested output format specified + +### Refactoring Examples + +#### Bad Example + +``` +Make the code better. +``` + +**Why it's bad:** +- "Better" is subjective and undefined +- No specific problems identified +- No goals for the refactoring +- No constraints or requirements + +#### Good Example + +``` +Refactor the UserService class in src/services/user.js: + +Problems to address: +1. The class is 500+ lines - split into smaller, focused services +2. Database queries are mixed with business logic - separate them +3. There's code duplication in the validation methods + +Constraints: +- Keep the public API unchanged (other code depends on it) +- Maintain test coverage (run npm test after changes) +- Follow our existing service patterns in src/services/ + +Goal: Improve maintainability while keeping the same functionality. +``` + +**Why it works:** +- Specific problems identified +- Clear constraints and requirements +- Points to patterns to follow +- Measurable success criteria + +## Key Principles for Effective Instructions + +### Be Specific + +Vague instructions produce vague results. Be concrete about: + +| Instead of... | Say... 
| +|---------------|--------| +| "Fix the error" | "Fix the TypeError on line 45 of api.py" | +| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | +| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | +| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | + +### Provide Context + +Help OpenHands understand the bigger picture: + +``` +Context to include: +- What does this code do? (purpose) +- Who uses it? (users/systems) +- Why does this matter? (business impact) +- What constraints exist? (performance, compatibility) +- What patterns should be followed? (existing conventions) +``` + +**Example with context:** + +``` +Add rate limiting to our public API endpoints. + +Context: +- This is a REST API serving mobile apps and third-party integrations +- We've been seeing abuse from web scrapers hitting us 1000+ times/minute +- Our infrastructure can handle 100 req/sec per client sustainably +- We use Redis (already available in the project) +- Our API follows the controller pattern in src/controllers/ + +Requirement: Limit each API key to 100 requests per minute with +appropriate 429 responses and Retry-After headers. +``` + +### Set Clear Goals + +Define what success looks like: + +``` +Success criteria checklist: +✓ What specific outcome do you want? +✓ How will you verify it worked? +✓ What tests should pass? +✓ What should the user experience be? +``` + +**Example with clear goals:** + +``` +Implement password reset functionality. + +Success criteria: +1. User can request reset via POST /api/auth/forgot-password +2. System sends email with secure reset link +3. Link expires after 1 hour +4. User can set new password via POST /api/auth/reset-password +5. Old sessions are invalidated after password change +6. All edge cases return appropriate error messages +7. 
Existing tests still pass, new tests cover the feature
+```
+
+### Include Constraints
+
+Specify what you can't or won't change:
+
+```
+Constraints to specify:
+- API compatibility (can't break existing clients)
+- Technology restrictions (must use existing stack)
+- Performance requirements (must respond in <100ms)
+- Security requirements (must not log PII)
+- Time/scope limits (just this one file)
+```
+
+## Common Pitfalls to Avoid
+
+### Vague Requirements
+
+**Bad:**
+
+```
+Make the dashboard faster.
+```
+
+**Good:**
+
+```
+The dashboard takes 5 seconds to load.
+
+Profile it and optimize to load in under 1 second.
+
+Likely issues:
+- N+1 queries in getWidgetData()
+- Uncompressed images
+- Missing database indexes
+
+Focus on the biggest wins first.
+```
+
+### Missing Context
+
+**Bad:**
+
+```
+Add caching to the API.
+```
+
+**Good:**
+
+```
+Add caching to the product catalog API.
+
+Context:
+- 95% of requests are for the same 1000 products
+- Product data changes only via admin panel (rare)
+- We already have Redis running for sessions
+- Current response time is 200ms, target is <50ms
+
+Cache strategy: Cache product data in Redis with 5-minute TTL,
+invalidate on product update.
+```
+
+### Unrealistic Expectations
+
+**Bad:**
+
+```
+Rewrite our entire backend from PHP to Go.
+```
+
+**Good:**
+
+```
+Create a Go microservice for the image processing currently in
+src/php/ImageProcessor.php.
+
+This is the first step in our gradual migration.
+The Go service should:
+1. Expose the same API endpoints
+2. Be deployable alongside the existing PHP app
+3. Include a feature flag to route traffic
+
+Start with just the resize and crop functions.
+```
+
+### Incomplete Information
+
+**Bad:**
+
+```
+The login is broken, fix it.
+```
+
+**Good:**
+
+```
+Users can't log in since yesterday's deployment.
+ + Symptoms: + - Login form submits but returns 500 error + - Server logs show: "Redis connection refused" + - Redis was moved to a new host yesterday + + The issue is likely in src/config/redis.js which may + have the old host hardcoded. + + Expected: Login should work with the new Redis at redis.internal:6380 + ``` + + + +## Best Practices + +### Structure Your Instructions + +Use clear structure for complex requests: + +``` +## Task +[One sentence describing what you want] + +## Background +[Context and why this matters] + +## Requirements +1. [Specific requirement] +2. [Specific requirement] +3. [Specific requirement] + +## Constraints +- [What you can't change] +- [What must be preserved] + +## Success Criteria +- [How to verify it works] +``` + +### Provide Examples + +Show what you want through examples: + +``` +Add input validation to the user registration endpoint. + +Example of what validation errors should look like: + +{ + "error": "validation_failed", + "details": [ + {"field": "email", "message": "Invalid email format"}, + {"field": "password", "message": "Must be at least 8 characters"} + ] +} + +Validate: +- email: valid format, not already registered +- password: min 8 chars, at least 1 number +- username: 3-20 chars, alphanumeric only +``` + +### Define Success Criteria + +Be explicit about what "done" means: + +``` +This task is complete when: +1. All existing tests pass (npm test) +2. New tests cover the added functionality +3. The feature works as described in the acceptance criteria +4. Code follows our style guide (npm run lint passes) +5. Documentation is updated if needed +``` + +### Iterate and Refine + +Build on previous work: + +``` +In our last session, you added the login endpoint. + +Now add the logout functionality: +1. POST /api/auth/logout endpoint +2. Invalidate the current session token +3. Clear any server-side session data +4. 
Follow the same patterns used in login + +The login implementation is in src/api/auth/login.js for reference. +``` + +## Quick Reference + +| Element | Bad | Good | +|---------|-----|------| +| Location | "in the code" | "in src/api/users.py line 45" | +| Problem | "it's broken" | "TypeError when user.preferences is None" | +| Scope | "add authentication" | "add JWT-based login endpoint" | +| Behavior | "make it work" | "return 200 with user data on success" | +| Patterns | (none) | "follow patterns in src/services/" | +| Success | (none) | "all tests pass, endpoint returns correct data" | + + +The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. + + + +# OpenHands in Your SDLC +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration + +OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. + +## Integration with Development Workflows + +### Planning Phase + +Use OpenHands during planning to accelerate technical decisions: + +**Technical specification assistance:** +``` +Create a technical specification for adding search functionality: + +Requirements from product: +- Full-text search across products and articles +- Filter by category, price range, and date +- Sub-200ms response time at 1000 QPS + +Provide: +1. Architecture options (Elasticsearch vs. PostgreSQL full-text) +2. Data model changes needed +3. API endpoint designs +4. Estimated implementation effort +5. 
Risks and mitigations +``` + +**Sprint planning support:** +``` +Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: + +Story 1: As a user, I can reset my password via email +Story 2: As an admin, I can view user activity logs + +For each story, create: +- Technical subtasks +- Estimated effort (hours) +- Dependencies on other work +- Testing requirements +``` + +### Development Phase + +OpenHands excels during active development: + +**Feature implementation:** +- Write new features with clear specifications +- Follow existing code patterns automatically +- Generate tests alongside code +- Create documentation as you go + +**Bug fixing:** +- Analyze error logs and stack traces +- Identify root causes +- Implement fixes with regression tests +- Document the issue and solution + +**Code improvement:** +- Refactor for clarity and maintainability +- Optimize performance bottlenecks +- Update deprecated APIs +- Improve error handling + +### Testing Phase + +Automate test creation and improvement: + +``` +Add comprehensive tests for the UserService module: + +Current coverage: 45% +Target coverage: 85% + +1. Analyze uncovered code paths using the codecov module +2. Write unit tests for edge cases +3. Add integration tests for API endpoints +4. Create test data factories +5. Document test scenarios + +Each time you add new tests, re-run codecov to check the increased coverage. Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). +``` + +### Review Phase + +Accelerate code reviews: + +``` +Review this PR for our coding standards: + +Check for: +1. Security issues (SQL injection, XSS, etc.) +2. Performance concerns +3. Test coverage adequacy +4. Documentation completeness +5. Adherence to our style guide + +Provide actionable feedback with severity ratings. 
+``` + +### Deployment Phase + +Assist with deployment preparation: + +``` +Prepare for production deployment: + +1. Review all changes since last release +2. Check for breaking API changes +3. Verify database migrations are reversible +4. Update the changelog +5. Create release notes +6. Identify rollback steps if needed +``` + +## CI/CD Integration + +OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. + +### GitHub Actions Integration + +The Software Agent SDK provides composite GitHub Actions for common workflows: + +- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments +- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK + +For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. + +### What You Can Automate + +Using the SDK, you can create GitHub Actions workflows to: + +1. **Automatic code review** when a PR is opened +2. **Automatically update docs** weekly when new functionality is added +3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements +4. **Manage TODO comments** and track technical debt +5. **Assign reviewers** based on code ownership patterns + +### Getting Started + +To integrate OpenHands into your CI/CD: + +1. Review the [SDK Getting Started guide](/sdk/getting-started) +2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) +3. Set up your `LLM_API_KEY` as a repository secret +4. 
Use the provided composite actions or build custom workflows + +See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. + +## Team Workflows + +### Solo Developer Workflows + +For individual developers: + +**Daily workflow:** +1. **Morning review**: Have OpenHands analyze overnight CI results +2. **Feature development**: Use OpenHands for implementation +3. **Pre-commit**: Request review before pushing +4. **Documentation**: Generate/update docs for changes + +**Best practices:** +- Set up automated reviews on all PRs +- Use OpenHands for boilerplate and repetitive tasks +- Keep AGENTS.md updated with project patterns + +### Small Team Workflows + +For teams of 2-10 developers: + +**Collaborative workflow:** +``` +Team Member A: Creates feature branch, writes initial implementation +OpenHands: Reviews code, suggests improvements +Team Member B: Reviews OpenHands suggestions, approves or modifies +OpenHands: Updates documentation, adds missing tests +Team: Merges after final human review +``` + +**Communication integration:** +- Slack notifications for OpenHands findings +- Automatic issue creation for bugs found +- Weekly summary reports + +### Enterprise Team Workflows + +For larger organizations: + +**Governance and oversight:** +- Configure approval requirements for OpenHands changes +- Set up audit logging for all AI-assisted changes +- Define scope limits for automated actions +- Establish human review requirements + +**Scale patterns:** +``` +Central Platform Team: +├── Defines OpenHands policies +├── Manages integrations +└── Monitors usage and quality + +Feature Teams: +├── Use OpenHands within policies +├── Customize for team needs +└── Report issues to platform team +``` + +## Best Practices + +### Code Review Integration + +Set up effective automated reviews: + +```yaml +# .openhands/review-config.yml +review: + focus_areas: + - security + - performance + - test_coverage + - 
documentation
+
+  severity_levels:
+    block_merge:
+      - critical
+      - security
+    require_response:
+      - major
+    informational:
+      - minor
+      - suggestion
+
+  ignore_patterns:
+    - "*.generated.*"
+    - "vendor/*"
+```
+
+### Pull Request Automation
+
+Automate common PR tasks:
+
+| Trigger | Action |
+|---------|--------|
+| PR opened | Auto-review, label by type |
+| Tests fail | Analyze failures, suggest fixes |
+| Coverage drops | Identify missing tests |
+| PR approved | Update changelog, check docs |
+
+### Quality Gates
+
+Define automated quality gates:
+
+```yaml
+quality_gates:
+  - name: test_coverage
+    threshold: 80%
+    action: block_merge
+
+  - name: security_issues
+    threshold: 0 critical
+    action: block_merge
+
+  - name: code_review_score
+    threshold: 7/10
+    action: require_review
+
+  - name: documentation
+    requirement: all_public_apis
+    action: warn
+```
+
+### Automated Testing
+
+Integrate OpenHands with your testing strategy:
+
+**Test generation triggers:**
+- New code without tests
+- Coverage below threshold
+- Bug fix without regression test
+- API changes without contract tests
+
+**Example workflow:**
+```yaml
+on:
+  push:
+    branches: [main]
+
+jobs:
+  ensure-coverage:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Check coverage
+        run: |
+          COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}')
+          # Coverage is reported as a decimal (e.g. "82.5"); strip the
+          # fractional part so the integer comparison below is valid.
+          if [ "${COVERAGE%.*}" -lt 80 ]; then
+            openhands generate-tests --target 80
+          fi
+```
+
+## Common Integration Patterns
+
+### Pre-Commit Hooks
+
+Run OpenHands checks before commits:
+
+```bash
+#!/bin/bash
+# .git/hooks/pre-commit
+
+# Quick code review
+openhands review --quick --staged-only
+
+if [ $? -ne 0 ]; then
+  echo "OpenHands found issues. Review and fix before committing."
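+  # A non-zero exit status from a pre-commit hook aborts the commit.
+  # To bypass the hook in an emergency, run: git commit --no-verify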
+ exit 1 +fi +``` + +### Post-Commit Actions + +Automate tasks after commits: + +```yaml +# .github/workflows/post-commit.yml +on: + push: + branches: [main] + +jobs: + update-docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Update API docs + run: openhands update-docs --api + - name: Commit changes + run: | + git add docs/ + git commit -m "docs: auto-update API documentation" || true + git push +``` + +### Scheduled Tasks + +Run regular maintenance: + +```yaml +# Weekly dependency check +on: + schedule: + - cron: '0 9 * * 1' # Monday 9am + +jobs: + dependency-review: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Check dependencies + run: | + openhands check-dependencies --security --outdated + - name: Create issues + run: openhands create-issues --from-report deps.json +``` + +### Event-Triggered Workflows + +You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. + +For more event-driven automation patterns, see: +- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events +- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage + + +# When to Use OpenHands +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands + +OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. This guide helps you identify the right tasks for OpenHands and set yourself up for success. + +## Task Complexity Guidance + +### Simple Tasks + +**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. 
+ +- Adding a new function or method +- Writing unit tests for existing code +- Fixing simple bugs with clear error messages +- Code formatting and style fixes +- Adding documentation or comments +- Simple refactoring (rename, extract method) +- Configuration changes + +**Example prompt:** +``` +Add a calculateDiscount() function to src/utils/pricing.js that takes +a price and discount percentage, returns the discounted price. +Add unit tests. +``` + +### Medium Complexity Tasks + +**Good for OpenHands** — These tasks may need more context and possibly some iteration. + +- Implementing a new API endpoint +- Adding a feature to an existing module +- Debugging issues that span multiple files +- Migrating code to a new pattern +- Writing integration tests +- Performance optimization with clear metrics +- Setting up CI/CD workflows + +**Example prompt:** +``` +Add a user profile endpoint to our API: +- GET /api/users/:id/profile +- Return user data with their recent activity +- Follow patterns in existing controllers +- Add integration tests +- Handle not-found and unauthorized cases +``` + +### Complex Tasks + +**May require iteration** — These benefit from breaking down into smaller pieces. + +- Large refactoring across many files +- Architectural changes +- Implementing complex business logic +- Multi-service integrations +- Performance optimization without clear cause +- Security audits +- Framework or major dependency upgrades + +**Recommended approach:** +``` +Break large tasks into phases: + +Phase 1: "Analyze the current authentication system and document +all touch points that need to change for OAuth2 migration." + +Phase 2: "Implement the OAuth2 provider configuration and basic +token flow, keeping existing auth working in parallel." + +Phase 3: "Migrate the user login flow to use OAuth2, maintaining +backwards compatibility." 
+``` + +## Best Use Cases + +### Ideal Scenarios + +OpenHands is **most effective** when: + +| Scenario | Why It Works | +|----------|--------------| +| Clear requirements | OpenHands can work independently | +| Well-defined scope | Less ambiguity, fewer iterations | +| Existing patterns to follow | Consistency with codebase | +| Good test coverage | Easy to verify changes | +| Isolated changes | Lower risk of side effects | + +**Perfect use cases:** + +- **Bug fixes with reproduction steps**: Clear problem, measurable solution +- **Test additions**: Existing code provides the specification +- **Documentation**: Code is the source of truth +- **Boilerplate generation**: Follows established patterns +- **Code review and analysis**: Read-only, analytical tasks + +### Good Fit Scenarios + +OpenHands works **well with some guidance** for: + +- **Feature implementation**: When requirements are documented +- **Refactoring**: When goals and constraints are clear +- **Debugging**: When you can provide logs and context +- **Code modernization**: When patterns are established +- **API development**: When specs exist + +**Tips for these scenarios:** + +1. Provide clear acceptance criteria +2. Point to examples of similar work in the codebase +3. Specify constraints and non-goals +4. 
Be ready to iterate and clarify + +### Poor Fit Scenarios + +**Consider alternatives** when: + +| Scenario | Challenge | Alternative | +|----------|-----------|-------------| +| Vague requirements | Unclear what "done" means | Define requirements first | +| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | +| Highly sensitive code | Risk tolerance is zero | Human review essential | +| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | +| Visual design | Subjective aesthetic judgments | Use design tools | + +**Red flags that a task may not be suitable:** + +- "Make it look better" (subjective) +- "Figure out what's wrong" (too vague) +- "Rewrite everything" (too large) +- "Do what makes sense" (unclear requirements) +- Changes to production infrastructure without review + +## Limitations + +### Current Limitations + +Be aware of these constraints: + +- **Long-running processes**: Sessions have time limits +- **Interactive debugging**: Can't set breakpoints interactively +- **Visual verification**: Can't see rendered UI easily +- **External system access**: May need credentials configured +- **Large codebase analysis**: Memory and time constraints + +### Technical Constraints + +| Constraint | Impact | Workaround | +|------------|--------|------------| +| Session duration | Very long tasks may timeout | Break into smaller tasks | +| Context window | Can't see entire large codebase at once | Focus on relevant files | +| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | +| Network access | Some external services may be blocked | Use local resources when possible | + +### Scope Boundaries + +OpenHands works within your codebase but has boundaries: + +**Can do:** +- Read and write files in the repository +- Run tests and commands +- Access configured services and APIs +- Browse documentation and reference material + +**Cannot do:** +- Access your local environment outside 
the sandbox +- Make decisions requiring business context it doesn't have +- Replace human judgment for critical decisions +- Guarantee production-safe changes without review + +## Pre-Task Checklist + +### Prerequisites + +Before starting a task, ensure: + +- [ ] Clear description of what you want +- [ ] Expected outcome is defined +- [ ] Relevant files are identified +- [ ] Dependencies are available +- [ ] Tests can be run + +### Environment Setup + +Prepare your repository: + +```markdown +## AGENTS.md Checklist + +- [ ] Build commands documented +- [ ] Test commands documented +- [ ] Code style guidelines noted +- [ ] Architecture overview included +- [ ] Common patterns described +``` + +See [Repository Setup](/openhands/usage/customization/repository) for details. + +### Repository Preparation + +Optimize for success: + +1. **Clean state**: Commit or stash uncommitted changes +2. **Working build**: Ensure the project builds +3. **Passing tests**: Start from a green state +4. **Updated dependencies**: Resolve any dependency issues +5. **Clear documentation**: Update AGENTS.md if needed + +## Post-Task Review + +### Quality Checks + +After OpenHands completes a task: + +- [ ] Review all changed files +- [ ] Understand each change made +- [ ] Check for unintended modifications +- [ ] Verify code style consistency +- [ ] Look for hardcoded values or credentials + +### Validation Steps + +1. **Run tests**: `npm test`, `pytest`, etc. +2. **Check linting**: Ensure style compliance +3. **Build the project**: Verify it still compiles +4. **Manual testing**: Test the feature yourself +5. 
**Edge cases**: Try unusual inputs + +### Learning from Results + +After each significant task: + +**What went well?** +- Note effective prompt patterns +- Document successful approaches +- Update AGENTS.md with learnings + +**What could improve?** +- Identify unclear instructions +- Note missing context +- Plan better for next time + +**Update your repository:** +```markdown +## Things OpenHands Should Know (add to AGENTS.md) + +- When adding API endpoints, always add to routes/index.js +- Our date format is ISO 8601 everywhere +- All database queries go through the repository pattern +``` + +## Decision Framework + +Use this framework to decide if a task is right for OpenHands: + +``` +Is the task well-defined? +├── No → Define it better first +└── Yes → Continue + +Do you have clear success criteria? +├── No → Define acceptance criteria +└── Yes → Continue + +Is the scope manageable (< 100 LOC)? +├── No → Break into smaller tasks +└── Yes → Continue + +Do examples exist in the codebase? +├── No → Provide examples or patterns +└── Yes → Continue + +Can you verify the result? +├── No → Add tests or verification steps +└── Yes → ✅ Good candidate for OpenHands +``` + +OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! + +But it can be particularly useful for certain types of tasks. For instance: + +- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define it in a way that can be verified programmatically, like making sure that all of the tests pass or test coverage gets above a certain value using a particular program. But even when you don't have something like that, you can just provide a checklist of things that need to be done. +- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. 
Some good examples include code review, improving test coverage, and upgrading dependency libraries. In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies about how to perform these tasks, and improve the skills over time.
+- **Helping Answer Questions:** OpenHands agents are generally pretty good at answering questions about codebases, so feel free to ask them when you don't understand how something works. They can explore the codebase and understand it deeply before providing an answer.
+- **Checking the Correctness of Library/Backend Code:** When agents work, they can run code, and they are particularly good at checking whether libraries or backend code work well.
+- **Reading Logs and Understanding Errors:** Agents can read logs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They are quite good at filtering through large amounts of data, especially if pushed in the right direction.
+
+There are also some tasks where agents struggle a little more.
+
+- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons. However, they are currently weaker at visually understanding frontends and can sometimes make mistakes if they don't understand the workflow very well.
+- **Implementing Code They Cannot Test Live:** If agents are not able to actually run and test the app, such as when it connects to a live service they do not have access to, they will often fail to carry tasks all the way to the end unless they get some encouragement.
+
+
+# Tutorial Library
+Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials
+
+Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development.
Each tutorial includes example prompts, expected workflows, and tips for success. + +## Categories Overview + +| Category | Best For | Complexity | +|----------|----------|------------| +| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | +| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | +| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | +| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | +| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | +| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | + + +For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. + + +## Task Complexity Guidance + +Before starting, assess your task's complexity: + +**Simple tasks** (5-15 minutes): +- Single file changes +- Clear, well-defined requirements +- Existing patterns to follow + +**Medium tasks** (15-45 minutes): +- Multiple file changes +- Some discovery required +- Integration with existing code + +**Complex tasks** (45+ minutes): +- Architectural changes +- Multiple components +- Requires iteration + + +Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. + + +## Best Use Cases + +OpenHands excels at: + +- **Repetitive tasks**: Boilerplate code, test generation +- **Pattern application**: Following established conventions +- **Analysis**: Code review, debugging, documentation +- **Exploration**: Understanding new codebases + +## Example Tutorials by Category + +### Testing + +#### Tutorial: Add Unit Tests for a Module + +**Goal**: Achieve 80%+ test coverage for a service module + +**Prompt**: +``` +Add unit tests for the UserService class in src/services/user.js. + +Current coverage: 35% +Target coverage: 80% + +Requirements: +1. 
Test all public methods +2. Cover edge cases (null inputs, empty arrays, etc.) +3. Mock external dependencies (database, API calls) +4. Follow our existing test patterns in tests/services/ +5. Use Jest as the testing framework + +Focus on these methods: +- createUser() +- updateUser() +- deleteUser() +- getUserById() +``` + +**What OpenHands does**: +1. Analyzes the UserService class +2. Identifies untested code paths +3. Creates test file with comprehensive tests +4. Mocks dependencies appropriately +5. Runs tests to verify they pass + +**Tips**: +- Provide existing test files as examples +- Specify the testing framework +- Mention any mocking conventions + +--- + +#### Tutorial: Add Integration Tests for an API + +**Goal**: Test API endpoints end-to-end + +**Prompt**: +``` +Add integration tests for the /api/products endpoints. + +Endpoints to test: +- GET /api/products (list all) +- GET /api/products/:id (get one) +- POST /api/products (create) +- PUT /api/products/:id (update) +- DELETE /api/products/:id (delete) + +Requirements: +1. Use our test database (configured in jest.config.js) +2. Set up and tear down test data properly +3. Test success cases and error cases +4. Verify response bodies and status codes +5. Follow patterns in tests/integration/ +``` + +--- + +### Data Analysis + +#### Tutorial: Create a Data Processing Script + +**Goal**: Process CSV data and generate a report + +**Prompt**: +``` +Create a Python script to analyze our sales data. + +Input: sales_data.csv with columns: date, product, quantity, price, region + +Requirements: +1. Load and validate the CSV data +2. Calculate: + - Total revenue by product + - Monthly sales trends + - Top 5 products by quantity + - Revenue by region +3. Generate a summary report (Markdown format) +4. Create visualizations (bar chart for top products, line chart for trends) +5. Save results to reports/ directory + +Use pandas for data processing and matplotlib for charts. +``` + +**What OpenHands does**: +1. 
Creates a Python script with proper structure +2. Implements data loading with validation +3. Calculates requested metrics +4. Generates formatted report +5. Creates and saves visualizations + +--- + +#### Tutorial: Database Query Analysis + +**Goal**: Analyze and optimize slow database queries + +**Prompt**: +``` +Analyze our slow query log and identify optimization opportunities. + +File: logs/slow_queries.log + +For each slow query: +1. Explain why it's slow +2. Suggest index additions if helpful +3. Rewrite the query if it can be optimized +4. Estimate the improvement + +Create a report in reports/query_optimization.md with: +- Summary of findings +- Prioritized recommendations +- SQL for suggested changes +``` + +--- + +### Web Scraping + +#### Tutorial: Build a Web Scraper + +**Goal**: Extract product data from a website + +**Prompt**: +``` +Create a web scraper to extract product information from our competitor's site. + +Target URL: https://example-store.com/products + +Extract for each product: +- Name +- Price +- Description +- Image URL +- SKU (if available) + +Requirements: +1. Use Python with BeautifulSoup or Scrapy +2. Handle pagination (site has 50 pages) +3. Respect rate limits (1 request/second) +4. Save results to products.json +5. Handle errors gracefully +6. Log progress to console + +Include a README with usage instructions. +``` + +**Tips**: +- Specify rate limiting requirements +- Mention error handling expectations +- Request logging for debugging + +--- + +### Code Review + + +For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). + + +#### Tutorial: Security-Focused Code Review + +**Goal**: Identify security vulnerabilities in a PR + +**Prompt**: +``` +Review this pull request for security issues: + +Focus areas: +1. 
Input validation - check all user inputs are sanitized +2. Authentication - verify auth checks are in place +3. SQL injection - check for parameterized queries +4. XSS - verify output encoding +5. Sensitive data - ensure no secrets in code + +For each issue found, provide: +- File and line number +- Severity (Critical/High/Medium/Low) +- Description of the vulnerability +- Suggested fix with code example + +Output format: Markdown suitable for PR comments +``` + +--- + +#### Tutorial: Performance Review + +**Goal**: Identify performance issues in code + +**Prompt**: +``` +Review the OrderService class for performance issues. + +File: src/services/order.js + +Check for: +1. N+1 database queries +2. Missing indexes (based on query patterns) +3. Inefficient loops or algorithms +4. Missing caching opportunities +5. Unnecessary data fetching + +For each issue: +- Explain the impact +- Show the problematic code +- Provide an optimized version +- Estimate the improvement +``` + +--- + +### Bug Fixing + + +For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. + + +#### Tutorial: Fix a Crash Bug + +**Goal**: Diagnose and fix an application crash + +**Prompt**: +``` +Fix the crash in the checkout process. + +Error: +TypeError: Cannot read property 'price' of undefined + at calculateTotal (src/checkout/calculator.js:45) + at processOrder (src/checkout/processor.js:23) + +Steps to reproduce: +1. Add item to cart +2. Apply discount code "SAVE20" +3. Click checkout +4. Crash occurs + +The bug was introduced in commit abc123 (yesterday's deployment). + +Requirements: +1. Identify the root cause +2. Fix the bug +3. Add a regression test +4. Verify the fix doesn't break other functionality +``` + +**What OpenHands does**: +1. Analyzes the stack trace +2. Reviews recent changes +3. Identifies the null reference issue +4. 
Implements a defensive fix +5. Creates test to prevent regression + +--- + +#### Tutorial: Fix a Memory Leak + +**Goal**: Identify and fix a memory leak + +**Prompt**: +``` +Investigate and fix the memory leak in our Node.js application. + +Symptoms: +- Memory usage grows 100MB/hour +- After 24 hours, app becomes unresponsive +- Restarting temporarily fixes the issue + +Suspected areas: +- Event listeners in src/events/ +- Cache implementation in src/cache/ +- WebSocket connections in src/ws/ + +Analyze these areas and: +1. Identify the leak source +2. Explain why it's leaking +3. Implement a fix +4. Add monitoring to detect future leaks +``` + +--- + +### Feature Development + +#### Tutorial: Add a REST API Endpoint + +**Goal**: Create a new API endpoint with full functionality + +**Prompt**: +``` +Add a user preferences API endpoint. + +Endpoint: /api/users/:id/preferences + +Operations: +- GET: Retrieve user preferences +- PUT: Update user preferences +- PATCH: Partially update preferences + +Preferences schema: +{ + theme: "light" | "dark", + notifications: { email: boolean, push: boolean }, + language: string, + timezone: string +} + +Requirements: +1. Follow patterns in src/api/routes/ +2. Add request validation with Joi +3. Use UserPreferencesService for business logic +4. Add appropriate error handling +5. Document the endpoint in OpenAPI format +6. Add unit and integration tests +``` + +**What OpenHands does**: +1. Creates route handler following existing patterns +2. Implements validation middleware +3. Creates or updates the service layer +4. Adds error handling +5. Generates API documentation +6. Creates comprehensive tests + +--- + +#### Tutorial: Implement a Feature Flag System + +**Goal**: Add feature flags to the application + +**Prompt**: +``` +Implement a feature flag system for our application. + +Requirements: +1. Create a FeatureFlags service +2. 
Support these flag types: + - Boolean (on/off) + - Percentage (gradual rollout) + - User-based (specific user IDs) +3. Load flags from environment variables initially +4. Add a React hook: useFeatureFlag(flagName) +5. Add middleware for API routes + +Initial flags to configure: +- new_checkout: boolean, default false +- dark_mode: percentage, default 10% +- beta_features: user-based + +Include documentation and tests. +``` + +--- + +## Contributing Tutorials + +Have a great use case? Share it with the community! + +**What makes a good tutorial:** +- Solves a common problem +- Has clear, reproducible steps +- Includes example prompts +- Explains expected outcomes +- Provides tips for success + +**How to contribute:** +1. Create a detailed example following this format +2. Test it with OpenHands to verify it works +3. Submit via GitHub pull request to the docs repository +4. Include any prerequisites or setup required + + +These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. + + + +# Key Features +Source: https://docs.openhands.dev/openhands/usage/key-features + + + + - Displays the conversation between the user and OpenHands. + - OpenHands explains its actions in this panel. + + ![overview](/openhands/static/img/chat-panel.png) + + + - Shows the file changes performed by OpenHands. + + ![overview](/openhands/static/img/changes-tab.png) + + + - Embedded VS Code for browsing and modifying files. + - Can also be used to upload and download files. + + ![overview](/openhands/static/img/vs-tab.png) + + + - A space for OpenHands and users to run terminal commands. + + ![overview](/openhands/static/img/terminal-tab.png) + + + - Displays the web server when OpenHands runs an application. + - Users can interact with the running application. + + ![overview](/openhands/static/img/app-tab.png) + + + - Used by OpenHands to browse websites. + - The browser is non-interactive. 
+
+
+  ![overview](/openhands/static/img/browser-tab.png)
+
+
+
+# Azure
+Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms
+
+## Azure OpenAI Configuration
+
+When running OpenHands, you'll need to set the following environment variable using `-e` in the
+docker run command:
+
+```
+LLM_API_VERSION="" # e.g. "2024-02-15-preview"
+```
+
+Example:
+```bash
+docker run -it --pull=always \
+    -e LLM_API_VERSION="2024-02-15-preview" \
+    ...
+```
+
+Then in the OpenHands UI Settings under the `LLM` tab:
+
+
+You will need your Azure OpenAI deployment name, which can be found on the deployments page in Azure. This is referenced as
+<deployment-name> below.
+
+
+1. Enable `Advanced` options.
+2. Set the following:
+   - `Custom Model` to azure/<deployment-name>
+   - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`)
+   - `API Key` to your Azure API key
+
+
+# Custom LLM Configurations
+Source: https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs
+
+## How It Works
+
+Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. For example:
+
+```toml
+# Default LLM configuration
+[llm]
+model = "gpt-4"
+api_key = "your-api-key"
+temperature = 0.0
+
+# Custom LLM configuration for a cheaper model
+[llm.gpt3]
+model = "gpt-3.5-turbo"
+api_key = "your-api-key"
+temperature = 0.2
+
+# Another custom configuration with different parameters
+[llm.high-creativity]
+model = "gpt-4"
+api_key = "your-api-key"
+temperature = 0.8
+top_p = 0.9
+```
+
+Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed.
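+
+To make the inheritance concrete, here is a minimal, hypothetical sketch (not OpenHands' actual configuration loader) of how a named section such as `[llm.gpt3]` parses as a sub-table of `[llm]` and inherits the default settings:
+
+```python
+# Illustrative only: OpenHands' real loading logic lives in the app,
+# but TOML itself nests [llm.gpt3] under [llm] exactly like this.
+import tomllib  # standard library in Python 3.11+
+
+raw = """
+[llm]
+model = "gpt-4"
+api_key = "your-api-key"
+temperature = 0.0
+
+[llm.gpt3]
+model = "gpt-3.5-turbo"
+temperature = 0.2
+"""
+
+config = tomllib.loads(raw)
+
+# Plain keys of [llm] are the defaults; sub-tables are the named configs.
+defaults = {k: v for k, v in config["llm"].items() if not isinstance(v, dict)}
+gpt3 = {**defaults, **config["llm"]["gpt3"]}  # named values override defaults
+
+print(gpt3)
+# {'model': 'gpt-3.5-turbo', 'api_key': 'your-api-key', 'temperature': 0.2}
+```
+
+When OpenHands loads `config.toml`, the effect is the same: any key you omit from a named section falls back to the value in the default `[llm]` section.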
+ +## Using Custom Configurations + +### With Agents + +You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: + +```toml +[agent.RepoExplorerAgent] +# Use the cheaper GPT-3 configuration for this agent +llm_config = 'gpt3' + +[agent.CodeWriterAgent] +# Use the high creativity configuration for this agent +llm_config = 'high-creativity' +``` + +### Configuration Options + +Each named LLM configuration supports all the same options as the default LLM configuration. These include: + +- Model selection (`model`) +- API configuration (`api_key`, `base_url`, etc.) +- Model parameters (`temperature`, `top_p`, etc.) +- Retry settings (`num_retries`, `retry_multiplier`, etc.) +- Token limits (`max_input_tokens`, `max_output_tokens`) +- And all other LLM configuration options + +For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. + +## Use Cases + +Custom LLM configurations are particularly useful in several scenarios: + +- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations. +- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism. +- **Different Providers**: Use different LLM providers or API endpoints for different tasks. +- **Testing and Development**: Easily switch between different model configurations during development and testing. 
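+
+As a rough illustration of the `llm_config` mechanism described above (the names here are hypothetical; the real wiring is internal to OpenHands), selecting a configuration per agent amounts to a simple lookup with a fallback to the default:
+
+```python
+# Named configurations, as they might look after parsing config.toml.
+llm_configs = {
+    "default": {"model": "gpt-4", "temperature": 0.0},
+    "gpt3": {"model": "gpt-3.5-turbo", "temperature": 0.2},
+    "high-creativity": {"model": "gpt-4", "temperature": 0.8, "top_p": 0.9},
+}
+
+# Per-agent llm_config assignments, mirroring the [agent.*] sections.
+agent_llm_config = {
+    "RepoExplorerAgent": "gpt3",
+    "CodeWriterAgent": "high-creativity",
+}
+
+def config_for(agent_name: str) -> dict:
+    # Agents without an explicit llm_config fall back to the [llm] defaults.
+    return llm_configs[agent_llm_config.get(agent_name, "default")]
+
+print(config_for("RepoExplorerAgent"))  # {'model': 'gpt-3.5-turbo', 'temperature': 0.2}
+print(config_for("SomeOtherAgent"))     # {'model': 'gpt-4', 'temperature': 0.0}
+```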
+
+## Example: Cost Optimization
+
+A practical example of using custom LLM configurations to optimize costs:
+
+```toml
+# Default configuration using GPT-4 for high-quality responses
+[llm]
+model = "gpt-4"
+api_key = "your-api-key"
+temperature = 0.0
+
+# Cheaper configuration for repository exploration
+[llm.repo-explorer]
+model = "gpt-3.5-turbo"
+temperature = 0.2
+
+# Configuration for code generation
+[llm.code-gen]
+model = "gpt-4"
+temperature = 0.0
+max_output_tokens = 2000
+
+[agent.RepoExplorerAgent]
+llm_config = 'repo-explorer'
+
+[agent.CodeWriterAgent]
+llm_config = 'code-gen'
+```
+
+In this example:
+- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code
+- Code generation uses GPT-4 with a higher token limit for generating larger code blocks
+- The default configuration remains available for other tasks
+
+## Custom Configurations with Reserved Names
+
+OpenHands recognizes certain reserved names for custom LLM configurations. If you specify a model and other settings under one of these reserved names, OpenHands will load and use that configuration for its dedicated purpose. As of now, one such configuration is implemented: the draft editor.
+
+### Draft Editor Configuration
+
+The `draft_editor` configuration specifies the model to use for preliminary drafting of code edits, for any tasks that involve editing and refining code. You need to provide it under the section `[llm.draft_editor]`.
+ +For example, you can define in `config.toml` a draft editor like this: + +```toml +[llm.draft_editor] +model = "gpt-4" +temperature = 0.2 +top_p = 0.95 +presence_penalty = 0.0 +frequency_penalty = 0.0 +``` + +This configuration: +- Uses GPT-4 for high-quality edits and suggestions +- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility +- Uses a high top_p value (0.95) to consider a wide range of token options +- Disables presence and frequency penalties to maintain focus on the specific edits needed + +Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to: +- Review and suggest code improvements +- Refine existing content while maintaining its core meaning +- Make precise, focused changes to code or text + + +Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`. When running via `docker run`, please use the standard configuration options. + + + +# Google Gemini/Vertex +Source: https://docs.openhands.dev/openhands/usage/llms/google-llms + +## Gemini - Google AI Studio Configs + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Gemini` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). 
+- `API Key` to your Gemini API key + +## VertexAI - Google Cloud Platform Configs + +To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment +variables using `-e` in the docker run command: + +``` +GOOGLE_APPLICATION_CREDENTIALS="" +VERTEXAI_PROJECT="" +VERTEXAI_LOCATION="" +``` + +Then set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `VertexAI` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. vertex_ai/<model-name>). + + +# Groq +Source: https://docs.openhands.dev/openhands/usage/llms/groq + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Groq` +- `LLM Model` to the model you will be using. [Visit here to see the list of +models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, +enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). +- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). + +## Using Groq as an OpenAI-Compatible Endpoint + +The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you +would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. 
`openai/llama3-70b-8192`) + - `Base URL` to `https://api.groq.com/openai/v1` + - `API Key` to your Groq API key + + +# LiteLLM Proxy +Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy + +## Configuration + +To use LiteLLM proxy with OpenHands, you need to: + +1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) +2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: + * Enable `Advanced` options + * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) + * `Base URL` to your LiteLLM proxy URL (e.g. `https://your-litellm-proxy.com`) + * `API Key` to your LiteLLM proxy API key + +## Supported Models + +The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy +is configured to handle. + +Refer to your LiteLLM proxy configuration for the list of available models and their names. + + +# Overview +Source: https://docs.openhands.dev/openhands/usage/llms/llms + + +This section is for users who want to connect OpenHands to different LLMs. + + + +OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this +page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation +for the canonical list of supported parameters. + + +## Model Recommendations + +Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some +recommendations for model selection. Our latest benchmarking results can be found in +[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). 
+ +Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: + +### Cloud / API-Based Models + +- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) +- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) +- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) +- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) +- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) +- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) + +If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process +to help others using the same provider! + +For a full list of the providers and models available, please consult the +[litellm documentation](https://docs.litellm.ai/docs/providers). + + +OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending +limits and monitor usage. + + +### Local / Self-Hosted Models + +- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) +- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) + +### Known Issues + + +Most current local and open source models are not as powerful. When using such models, you may see long +wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the +models driving it. However, if you do find ones that work, please add them to the verified list above. 
+ + +## LLM Configuration + +The following can be set in the OpenHands UI through the Settings. Each option is serialized into the +`LLM.load_from_env()` schema before being passed to the Agent SDK: + +- `LLM Provider` +- `LLM Model` +- `API Key` +- `Base URL` (through `Advanced` settings) + +There are some settings that may be necessary for certain providers that cannot be set directly through the UI. Set them +as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup: + +- `LLM_API_VERSION` +- `LLM_EMBEDDING_MODEL` +- `LLM_EMBEDDING_DEPLOYMENT_NAME` +- `LLM_DROP_PARAMS` +- `LLM_DISABLE_VISION` +- `LLM_CACHING_PROMPT` + +## LLM Provider Guides + +We have a few guides for running OpenHands with specific model providers: + +- [Azure](/openhands/usage/llms/azure-llms) +- [Google](/openhands/usage/llms/google-llms) +- [Groq](/openhands/usage/llms/groq) +- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms) +- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy) +- [Moonshot AI](/openhands/usage/llms/moonshot) +- [OpenAI](/openhands/usage/llms/openai-llms) +- [OpenHands](/openhands/usage/llms/openhands-llms) +- [OpenRouter](/openhands/usage/llms/openrouter) + +These pages remain the authoritative provider references for both the Agent SDK +and the OpenHands interfaces. + +## Model Customization + +LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: + +- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. +- **Native Tool Calling**: Toggle native function/tool calling capabilities. + +For detailed information about model customization, see +[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). + +### API retries and rate limits + +LLM providers typically have rate limits, sometimes very low, and may require retries. 
OpenHands will automatically +retry requests if it receives a Rate Limit Error (429 error code). + +You can customize these options as you need for the provider you're using. Check their documentation, and set the +following environment variables to control the number of retries and the time between retries: + +- `LLM_NUM_RETRIES` (Default of 4 times) +- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) +- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) +- `LLM_RETRY_MULTIPLIER` (Default of 2) + +If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: + +```toml +[llm] +num_retries = 4 +retry_min_wait = 5 +retry_max_wait = 30 +retry_multiplier = 2 +``` + + +# Local LLMs +Source: https://docs.openhands.dev/openhands/usage/llms/local-llms + +## News + +- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! + +## Quickstart: Running OpenHands with a Local LLM using LM Studio + +This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. + +We recommend: +- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. +- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. + +### Hardware Requirements + +Running Qwen3-Coder-30B-A3B-Instruct requires: +- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or +- A Mac with Apple Silicon with at least 32GB of RAM + +### 1. 
Install LM Studio + +Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/). + +### 2. Download the Model + +1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window. +2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. + +![image](./screenshots/01_lm_studio_open_model_hub.png) + +3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. + +![image](./screenshots/02_lm_studio_download_devstral.png) + +4. Wait for the download to finish. + +### 3. Load the Model + +1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. +2. Click the "Select a model to load" dropdown at the top of the application window. + +![image](./screenshots/03_lm_studio_open_load_model.png) + +3. Enable the "Manually choose model load parameters" switch. +4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. + +![image](./screenshots/04_lm_studio_setup_devstral_part_1.png) + +5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. +6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. +7. Click "Load Model" to start loading the model. + +![image](./screenshots/05_lm_studio_setup_devstral_part_2.png) + +### 4. Start the LLM server + +1. Enable the switch next to "Status" at the top-left of the Window. +2. Take note of the Model API Identifier shown on the sidebar on the right. + +![image](./screenshots/06_lm_studio_start_server.png) + +### 5. Start OpenHands + +1. 
Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: + +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` + +2. Wait until the server is running (see log below): +``` +Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f +Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 +Starting OpenHands... +Running OpenHands as root +14:22:13 - openhands:INFO: server_config.py:50 - Using config class None +INFO: Started server process [8] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) +``` + +3. Visit `http://localhost:3000` in your browser. + +### 6. Configure OpenHands to use the LLM server + +Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. + +When started for the first time, OpenHands will prompt you to set up the LLM provider. + +1. Click "see advanced settings" to open the LLM Settings page. + +![image](./screenshots/07_openhands_open_advanced_settings.png) + +2. Enable the "Advanced" switch at the top of the page to show all the available settings. + +3. Set the following values: + - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") + - **Base URL**: `http://host.docker.internal:1234/v1` + - **API Key**: `local-llm` + +4. Click "Save Settings" to save the configuration. 
+ +![image](./screenshots/08_openhands_configure_local_llm_parameters.png) + +That's it! You can now start using OpenHands with the local LLM server. + +If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack). + +## Advanced: Alternative LLM Backends + +This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. + +### Create an OpenAI-Compatible Endpoint with Ollama + +- Install Ollama following [the official documentation](https://ollama.com/download). +- Example launch command for Qwen3-Coder-30B-A3B-Instruct: + +```bash +# ⚠️ WARNING: OpenHands requires a large context size to work properly. +# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. +# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. +OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & +ollama pull qwen3-coder:30b +``` + +### Create an OpenAI-Compatible Endpoint with vLLM or SGLang + +First, download the model checkpoint: + +```bash +huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct +``` + +#### Serving the model using SGLang + +- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). +- Example launch command (with at least 2 GPUs): + +```bash +SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ + --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --port 8000 \ + --tp 2 --dp 1 \ + --host 0.0.0.0 \ + --api-key mykey --context-length 131072 +``` + +#### Serving the model using vLLM + +- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). 
+- Example launch command (with at least 2 GPUs): + +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --enable-prefix-caching +``` + +If you are interested in further improved inference speed, you can also try Snowflake's version +of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/), +which can achieve up to 2x speedup in some cases. + +1. Install the Arctic Inference library that automatically patches vLLM: + +```bash +pip install git+https://github.com/snowflakedb/ArcticInference.git +``` + +2. Run the launch command with speculative decoding enabled: + +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --speculative-config '{"method": "suffix"}' +``` + +### Run OpenHands (Alternative Backends) + +#### Using Docker + +Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). + +#### Using Development Mode + +Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. + +Start OpenHands using `make run`. + +### Configure OpenHands (Alternative Backends) + +Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. + +1. Click **"see advanced settings"** to access the full configuration panel. +2. Enable the **Advanced** toggle at the top of the page. +3. Set the following parameters, if you followed the examples above: + - **Custom Model**: `openai/` + - For **Ollama**: `openai/qwen3-coder:30b` + - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` + - **Base URL**: `http://host.docker.internal:/v1` + Use port `11434` for Ollama, or `8000` for SGLang and vLLM. 
+ - **API Key**: + - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) + - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) + + +# Moonshot AI +Source: https://docs.openhands.dev/openhands/usage/llms/moonshot + +## Using Moonshot AI with OpenHands + +[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. + +### Setup + +1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) +2. Generate an API key from your account settings +3. Configure OpenHands to use Moonshot AI: + +| Setting | Value | +| --- | --- | +| LLM Provider | `moonshot` | +| LLM Model | `kimi-k2-0711-preview` | +| API Key | Your Moonshot API key | + +### Recommended Models + +- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. + + +# OpenAI +Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenAI` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). +* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). + +## Using OpenAI-Compatible Endpoints + +Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). 
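
To make the two settings concrete: the values you enter for an OpenAI-compatible endpoint can be derived mechanically from the server's model name, host, and port. The helper below is an illustrative sketch only (the function name is invented; it is not part of OpenHands or LiteLLM):

```python
def openai_compatible_settings(model_name: str, host: str, port: int) -> dict:
    """Build the `Custom Model` and `Base URL` values for an
    OpenAI-compatible server, as entered in the LLM settings.

    The `openai/` prefix tells LiteLLM to speak the OpenAI
    chat-completions protocol to whatever server is at the base URL.
    """
    return {
        "custom_model": f"openai/{model_name}",
        "base_url": f"http://{host}:{port}/v1",
    }

# Example: a local server exposing the model "my-local-model" on port 8000
settings = openai_compatible_settings("my-local-model", "localhost", 8000)
# settings["custom_model"] == "openai/my-local-model"
# settings["base_url"]     == "http://localhost:8000/v1"
```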
+ +## Using an OpenAI Proxy + +If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) + - `Base URL` to the URL of your OpenAI proxy + - `API Key` to your OpenAI API key + + +# OpenHands +Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms + +## Obtain Your OpenHands LLM API Key + +1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. + +![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `OpenHands` +- `LLM Model` to the model you will be using (e.g. claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) +- `API Key` to your OpenHands LLM API key copied from above + +## Using OpenHands LLM Provider in the CLI + +1. [Run OpenHands CLI](/openhands/usage/cli/quick-start). +2. To select OpenHands as the LLM provider: + - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use. + - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then + choose `openhands` and finally the model. + +![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png) + + + +When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy + + +## Using OpenHands LLM Provider with the SDK + +You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines. 
+ +### Configuration + +The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. Simply set two environment variables: + +```bash +export LLM_API_KEY="your-openhands-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-20250514" +``` + +### Example + +```python +from openhands.sdk import LLM + +# The openhands/ prefix auto-configures the base URL +llm = LLM.load_from_env() + +# Or configure directly +llm = LLM( + model="openhands/claude-sonnet-4-20250514", + api_key="your-openhands-api-key", +) +``` + +The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. + +### Available Models + +When using the SDK, prefix any model from the pricing table below with `openhands/`: +- `openhands/claude-sonnet-4-20250514` +- `openhands/claude-sonnet-4-5-20250929` +- `openhands/claude-opus-4-20250514` +- `openhands/gpt-5-2025-08-07` +- etc. + + +If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. + + +## Pricing + +Pricing follows official API provider rates. 
Below are the current pricing details for OpenHands models: + + +| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | +|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| +| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | +| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | +| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | +| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | +| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | +| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | +| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | +| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | +| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | +| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | +| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | + +**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s model price database and provider pricing pages. Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. + + +# OpenRouter +Source: https://docs.openhands.dev/openhands/usage/llms/openrouter + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenRouter` +* `LLM Model` to the model you will be using. 
+[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models). +If the model is not in the list, enable `Advanced` options, and enter it in +`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`). +* `API Key` to your OpenRouter API key. + + +# OpenHands GitHub Action +Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action + +## Using the Action in the OpenHands Repository + +To use the OpenHands GitHub Action in a repository, you can: + +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`. + +The action will automatically trigger and attempt to resolve the issue. + +## Installing the Action in a New Repository + +To install the OpenHands GitHub Action in your own repository, follow +the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). + +## Usage Tips + +### Iterative resolution + +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. +3. Review the attempt to resolve the issue by checking the pull request. +4. Follow up with feedback through general comments, review comments, or inline thread comments. +5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. + +### Label versus Macro + +- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. +- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. + +## Advanced Settings + +### Add custom repository settings + +You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). 
+ +### Custom configurations + +GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. +The customization options you can set are: + +| **Attribute name** | **Type** | **Purpose** | **Example** | +| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` | +| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` | +| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` | +| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` | +| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` | +| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` | + + +# Configure +Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode + +## Prerequisites + +- [OpenHands is running](/openhands/usage/run-openhands/local-setup) + +## Launching the GUI Server + +### Using the CLI Command + +You can launch the OpenHands GUI server directly from the command line using the `serve` command: + + +**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR 
have `uv` +installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker +directly (see the [Docker section](#using-docker-directly) below). + + +```bash +openhands serve +``` + +This command will: +- Check that Docker is installed and running +- Pull the required Docker images +- Launch the OpenHands GUI server at http://localhost:3000 +- Use the same configuration directory (`~/.openhands`) as the CLI mode + +#### Mounting Your Current Directory + +To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: + +```bash +openhands serve --mount-cwd +``` + +This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container. + +#### Using GPU Support + +If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: + +```bash +openhands serve --gpu +``` + +This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: + +```bash +openhands serve --gpu --mount-cwd +``` + +**Prerequisites for GPU support:** +- NVIDIA GPU drivers must be installed on your host system +- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured + +#### Requirements + +Before using the `openhands serve` command, ensure that: +- Docker is installed and running on your system +- You have internet access to pull the required Docker images +- Port 3000 is available on your system + +The CLI will automatically check these requirements and provide helpful error messages if anything is missing. + +### Using Docker Directly + +Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. 
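
To summarize how the flags compose, here is a rough sketch of how `--gpu` and `--mount-cwd` could translate into extra `docker run` arguments. This is illustrative only; it is not the CLI's actual implementation, and the exact flags it passes may differ:

```python
def serve_docker_args(gpu: bool = False, mount_cwd: bool = False,
                      cwd: str = "/home/me/project") -> list[str]:
    """Sketch of extra `docker run` arguments implied by the serve flags.

    Hypothetical: the real CLI builds its command differently; this only
    mirrors the documented behavior of each flag.
    """
    args = ["-p", "3000:3000", "-v", "~/.openhands:/.openhands"]
    if gpu:
        # GPU support mounts all NVIDIA GPUs (requires the NVIDIA
        # Container Toolkit on the host).
        args += ["--gpus", "all"]
    if mount_cwd:
        # The current directory is mounted at /workspace in the container.
        args += ["-v", f"{cwd}:/workspace"]
    return args
```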

## Overview

### Initial Setup

1. Upon first launch, you'll see a settings popup.
2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list,
   select `see advanced settings`. Then toggle `Advanced` options and enter it with the correct prefix in the
   `Custom Model` text box.
3. Enter the corresponding `API Key` for your chosen provider.
4. Click `Save Changes` to apply the settings.

### Settings

You can use the Settings page at any time to:

- [Set up the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings).
- [Set up the search engine](/openhands/usage/advanced/search-engine-setup).
- [Configure MCP servers](/openhands/usage/settings/mcp-settings).
- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup),
  [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup)
  and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup).
- Set application settings such as your preferred language, notifications, and other preferences.
- [Manage custom secrets](/openhands/usage/settings/secrets-settings).

### Key Features

For an overview of the key features available inside a conversation, refer to the
[Key Features](/openhands/usage/key-features) section of the documentation.


## Other Ways to Run OpenHands
- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless)
- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal)


# Setup
Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup

## Recommended Methods for Running OpenHands on Your Local System

### System Requirements

- macOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements)
- Linux
- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements)

A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands.

### Prerequisites


  **Docker Desktop**

  1. [Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install).
  2. Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled.


  Tested with Ubuntu 22.04.

  **Docker Desktop**

  1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/).


  **WSL**

  1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install).
  2. Run `wsl --version` in PowerShell and confirm `Default Version: 2`.

  **Ubuntu (Linux Distribution)**

  1. Install Ubuntu: `wsl --install -d Ubuntu` in PowerShell as Administrator.
  2. Restart your computer when prompted.
  3. Open Ubuntu from the Start menu to complete setup.
  4. Verify installation: `wsl --list` should show Ubuntu.

  **Docker Desktop**

  1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install).
  2. Open Docker Desktop, go to `Settings` and confirm the following:
     - General: `Use the WSL 2 based engine` is enabled.
+ - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled. + + + The docker command below to start the app must be run inside the WSL terminal. Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal. + + + + + + +### Start the App + +#### Option 1: Using the CLI Launcher with uv (Recommended) + +We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). + +**Install uv** (if you haven't already): + +See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. + +**Install OpenHands**: +```bash +uv tool install openhands --python 3.12 +``` + +**Launch OpenHands**: +```bash +# Launch the GUI server +openhands serve + +# Or with GPU support (requires nvidia-docker) +openhands serve --gpu + +# Or with current directory mounted +openhands serve --mount-cwd +``` + +This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. + +**Upgrade OpenHands**: +```bash +uv tool upgrade openhands --python 3.12 +``` + + + +If you prefer to use pip and have Python 3.12+ installed: + +```bash +# Install OpenHands +pip install openhands + +# Launch the GUI server +openhands serve +``` + +Note that you'll still need `uv` installed for the default MCP servers to work properly. 
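
The requirement checks described above can be pictured as a small preflight routine. The sketch below is not the actual `openhands` CLI code; it just illustrates the two checks that matter most (Docker on the PATH, port 3000 free):

```python
import shutil
import socket

def preflight(port: int = 3000) -> list[str]:
    """Return a list of problems that would prevent the GUI server
    from starting (illustrative sketch, not the real implementation)."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("Docker CLI not found on PATH")
    try:
        # If we can bind the port ourselves, nothing else is using it.
        with socket.socket() as s:
            s.bind(("127.0.0.1", port))
    except OSError:
        problems.append(f"port {port} is already in use")
    return problems
```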
+ + + +#### Option 2: Using Docker Directly + + + +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` + + + +> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. + +You'll find OpenHands running at http://localhost:3000! + +### Setup + +After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`. +This can be done during the initial settings popup or by selecting the `Settings` +button (gear icon) in the UI. + +If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options +and manually enter it with the correct prefix in the `Custom Model` text box. +The `Advanced` options also allow you to specify a `Base URL` if required. + +#### Getting an API Key + +OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers: + + + + + +1. [Log in to OpenHands Cloud](https://app.all-hands.dev). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. + +OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms). + + + + + +1. [Create an Anthropic account](https://console.anthropic.com/). +2. [Generate an API key](https://console.anthropic.com/settings/keys). +3. [Set up billing](https://console.anthropic.com/settings/billing). + + + + + +1. 
[Create an OpenAI account](https://platform.openai.com/). +2. [Generate an API key](https://platform.openai.com/api-keys). +3. [Set up billing](https://platform.openai.com/account/billing/overview). + + + + + +1. Create a Google account if you don't already have one. +2. [Generate an API key](https://aistudio.google.com/apikey). +3. [Set up billing](https://aistudio.google.com/usage?tab=billing). + + + + + +If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. + + + + + +Consider setting usage limits to control costs. + +#### Using a Local LLM + + +Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. + + +To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. + +#### Setting Up Search Engine + +OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. + +To enable search functionality in OpenHands: + +1. Get a Tavily API key from [tavily.com](https://tavily.com/). +2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)` + +For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide. + +### Versions + +The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well: +- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with the version number. +For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release. 
+- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with `main`. +This version is unstable and is recommended for testing or development purposes only. + +## Next Steps + +- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories +- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) +- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) +- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) + + +# Docker Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker + +The **Docker sandbox** runs the agent server inside a Docker container. This is +the default and recommended option for most users. + + + In some self-hosted deployments, the sandbox provider is controlled via the + legacy RUNTIME environment variable. Docker is the default. + + + +## Why Docker? + +- Isolation: reduces risk when the agent runs commands. +- Reproducibility: consistent environment across machines. + +## Mounting your code into the sandbox + +If you want OpenHands to work directly on a local repository, mount it into the +sandbox. + +### Recommended: CLI launcher + +If you start OpenHands via: + +```bash +openhands serve --mount-cwd +``` + +your current directory will be mounted into the sandbox workspace. + +### Using SANDBOX_VOLUMES + +You can also configure mounts via the SANDBOX_VOLUMES environment +variable (format: host_path:container_path[:mode]): + +```bash +export SANDBOX_VOLUMES=$PWD:/workspace:rw +``` + + + Anything mounted read-write into /workspace can be modified by the + agent. + + +## Custom sandbox images + +To customize the container image (extra tools, system deps, etc.), see +[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). 
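
The `SANDBOX_VOLUMES` format is simple enough to parse mechanically. The sketch below (not the actual OpenHands parser) shows the `host_path:container_path[:mode]` convention, assuming the mode defaults to read-write when omitted; note that naive colon-splitting would mishandle Windows drive-letter paths:

```python
def parse_sandbox_volume(spec: str) -> tuple[str, str, str]:
    """Parse one SANDBOX_VOLUMES entry: host_path:container_path[:mode].

    Illustrative sketch only. Assumes POSIX paths (a Windows path such
    as C:\\repo would need smarter splitting) and a default mode of "rw".
    """
    parts = spec.split(":")
    if len(parts) == 2:
        host, container = parts
        mode = "rw"
    elif len(parts) == 3:
        host, container, mode = parts
    else:
        raise ValueError(f"invalid volume spec: {spec!r}")
    if mode not in ("rw", "ro"):
        raise ValueError(f"unknown mount mode: {mode!r}")
    return host, container, mode
```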
+ + +# Overview +Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview + +A **sandbox** is the environment where OpenHands runs commands, edits files, and +starts servers while working on your task. + +In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. + +## Sandbox providers + +OpenHands supports multiple sandbox “providers”, with different tradeoffs: + +- **Docker sandbox (recommended)** + - Runs the agent server inside a Docker container. + - Good isolation from your host machine. + +- **Process sandbox (unsafe, but fast)** + - Runs the agent server as a regular process on your machine. + - No container isolation. + +- **Remote sandbox** + - Runs the agent server in a remote environment. + - Used by managed deployments and some hosted setups. + +## Selecting a provider (current behavior) + +In some deployments, the provider selection is still controlled via the legacy +RUNTIME environment variable: + +- RUNTIME=docker (default) +- RUNTIME=process (aka legacy RUNTIME=local) +- RUNTIME=remote + + + The user-facing terminology in V1 is sandbox, but the configuration knob + may still be called RUNTIME while the migration is in progress. + + +## Terminology note (V0 vs V1) + +Older documentation refers to these environments as **runtimes**. +Those legacy docs are now in the Legacy (V0) section of the Web tab. + + +# Process Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/process + +The **Process sandbox** runs the agent server directly on your machine as a +regular process. + + + This mode provides **no sandbox isolation**. + + The agent can read/write files your user account can access and execute + commands on your host system. + + Only use this in controlled environments. 
+ + +## When to use it + +- Local development when Docker is unavailable +- Some CI environments +- Debugging issues that only reproduce outside containers + +## Choosing process mode + +In some deployments, this is selected via the legacy RUNTIME +environment variable: + +```bash +export RUNTIME=process +# (legacy alias) +# export RUNTIME=local +``` + +If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker). + + +# Remote Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/remote + +A **remote sandbox** runs the agent server in a remote execution environment +instead of on your local machine. + +This is typically used by managed deployments (e.g., OpenHands Cloud) and +advanced self-hosted setups. + +## Selecting remote mode + +In some self-hosted deployments, remote sandboxes are selected via the legacy +RUNTIME environment variable: + +```bash +export RUNTIME=remote +``` + +Remote sandboxes require additional configuration (API URL + API key). The exact +variable names depend on your deployment, but you may see legacy names like: + +- SANDBOX_REMOTE_RUNTIME_API_URL +- SANDBOX_API_KEY + +## Notes + +- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports) + depending on the provider. +- Configuration and credentials vary by deployment. + +If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui). + + +# API Keys Settings +Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings + + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + + +## Overview + +Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to +OpenHands Cloud + +## OpenHands LLM Key + + +You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. 
To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud.


You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start),
[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even other AI coding agents. This will
use credits from your OpenHands Cloud account. If you need to refresh it at any time, click the `Refresh API Key` button.

## OpenHands API Key

These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the
[OpenHands Cloud API](/openhands/usage/cloud/cloud-api).

### Create API Key

1. Navigate to the `Settings > API Keys` page.
2. Click `Create API Key`.
3. Give your API key a name and click `Create`.

### Delete API Key

1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove.
2. Click `Delete` to confirm removal.


# Application Settings
Source: https://docs.openhands.dev/openhands/usage/settings/application-settings

## Overview

The Application settings page allows you to customize various application-level behaviors in OpenHands, including
language preferences, notification settings, custom Git author configuration, and more.

## Setting Maximum Budget Per Conversation

To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD)
in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but
you can choose to continue the conversation by submitting a new prompt.

## Git Author Settings

OpenHands provides the ability to customize the Git author information used when making commits and creating
+ +By default, OpenHands uses the following Git author information for all commits and pull requests: + +- **Username**: `openhands` +- **Email**: `openhands@all-hands.dev` + +To override the defaults: + +1. Navigate to the `Settings > Application` page. +2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`. +3. Click `Save Changes` + + + When you configure a custom Git author, OpenHands will use your specified username and email as the primary author + for commits and pull requests. OpenHands will remain as a co-author. + + + +# Integrations Settings +Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings + +## Overview + +OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some +integrations, like Slack, are only available in OpenHands Cloud. Configuration may also vary depending on whether +you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or +[running OpenHands on your own](/openhands/usage/run-openhands/local-setup). + +## OpenHands Cloud Integrations Settings + + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + + +### GitHub Settings + +- `Configure GitHub Repositories` - Allows you to +[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + +### Slack Settings + +- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in + your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. + +## Running on Your Own Integrations Settings + + + These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). + + +### Version Control Integrations + +#### GitHub Setup + +OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided: + + + + + 1. 
**Generate a Personal Access Token (PAT)**:
   - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`.
   - **Tokens (classic)**
     - Required scopes:
       - `repo` (Full control of private repositories)
   - **Fine-grained tokens**
     - All Repositories (You can select specific repositories, but this will impact what is returned in repo search)
     - Minimal Permissions (Select `Metadata = Read-only` for repository search, and `Pull Requests = Read and Write` and `Contents = Read and Write` for branch creation)
2. **Enter token in OpenHands**:
   - Navigate to the `Settings > Integrations` page.
   - Paste your token in the `GitHub Token` field.
   - Click `Save Changes` to apply the changes.

If you're working with organizational repositories, additional setup may be required:

1. **Check organization requirements**:
   - Organization admins may enforce specific token policies.
   - Some organizations require tokens to be created with SSO enabled.
   - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization).
2. **Verify organization access**:
   - Go to your token settings on GitHub.
   - Look for the organization under `Organization access`.
   - If required, click `Enable SSO` next to your organization.
   - Complete the SSO authorization process.


- **Token Not Recognized**:
  - Check that the token hasn't expired.
  - Verify the token has the required scopes.
  - Try regenerating the token.

- **Organization Access Denied**:
  - Check if SSO is required but not enabled.
  - Verify organization membership.
  - Contact your organization admin if token policies are blocking access.


#### GitLab Setup

OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided:


1. **Generate a Personal Access Token (PAT)**:
   - On GitLab, go to `User Settings > Access Tokens`.
     - Create a new token with the following scopes:
       - `api` (API access)
       - `read_user` (Read user information)
       - `read_repository` (Read repository)
       - `write_repository` (Write repository)
     - Set an expiration date or leave it blank for a non-expiring token.
  2. **Enter token in OpenHands**:
     - Navigate to the `Settings > Integrations` page.
     - Paste your token in the `GitLab Token` field.
     - Click `Save Changes` to apply the changes.

  3. **(Optional): Restrict agent permissions**
     - Create another PAT using Step 1 and exclude the `api` scope.
     - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower scope token.
     - OpenHands will use the higher scope token, and the agent will use the lower scope token.



  - **Token Not Recognized**:
    - Check that the token hasn't expired.
    - Verify the token has the required scopes.

  - **Access Denied**:
    - Verify project access permissions.
    - Check if the token has the necessary scopes.
    - For group/organization repositories, ensure you have proper access.



#### BitBucket Setup


  1. **Generate an App password**:
     - On Bitbucket, go to `Account Settings > App passwords`.
     - Create a new password with the following scopes:
       - `account`: `read`
       - `repository`: `write`
       - `pull requests`: `write`
       - `issues`: `write`
     - App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
  2. **Enter token in OpenHands**:
     - Navigate to the `Settings > Integrations` page.
     - Paste your token in the `BitBucket Token` field.
     - Click `Save Changes` to apply the changes.



  - **Token Not Recognized**:
    - Check that the token hasn't expired.
    - Verify the token has the required scopes.




# Language Model (LLM) Settings
Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings

## Overview

The LLM settings allow you to bring your own LLM and API key to use with OpenHands. 
This can be any model that is
supported by litellm, but a powerful model is required for OpenHands to work properly.
[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some
additional LLM settings on this page.

## Basic LLM Settings

The most popular providers and models are available in the basic settings. Some of the providers have been verified to
work with OpenHands, such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI, and
Mistral AI.

1. Choose your preferred provider using the `LLM Provider` dropdown.
2. Choose your favorite model using the `LLM Model` dropdown.
3. Set the `API Key` for your chosen provider and model and click `Save Changes`.

This will set the LLM for all new conversations. If you want an existing conversation to use the new LLM, you must
restart that conversation first.

## Advanced LLM Settings

Toggling the `Advanced` settings allows you to set custom models as well as some additional LLM options. Use this
when your preferred provider or model does not exist in the basic settings dropdowns.

1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the
   custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have
   [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides).
2. `Base URL`: If your provider has a specific base URL, specify it here.
3. `API Key`: Set the API key for your custom model.
4. Click `Save Changes`.

### Memory Condensation

The memory condenser manages the language model's context by ensuring only the most important and relevant information
is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running
conversations.

- `Enable memory condensation` - Turn on this setting to activate this feature. 
+- `Memory condenser max history size` - The condenser will summarize the history after this many events. + + +# Model Context Protocol (MCP) +Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings + +## Overview + +Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. These +servers can provide additional functionality to the agent, such as specialized data processing, external API access, +or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). + +## Supported MCPs + +OpenHands supports the following MCP transport protocols: + +* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) +* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) +* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) + +## How MCP Works + +When OpenHands starts, it: + +1. Reads the MCP configuration. +2. Connects to any configured SSE and SHTTP servers. +3. Starts any configured stdio servers. +4. Registers the tools provided by these servers with the agent. + +The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: + +1. OpenHands routes the call to the appropriate MCP server. +2. The server processes the request and returns a response. +3. OpenHands converts the response to an observation and presents it to the agent. + +## Configuration + +MCP configuration can be defined in: +* The OpenHands UI in the `Settings > MCP` page. +* The `config.toml` file under the `[mcp]` section if not using the UI. + +### Configuration Options + + + + SSE servers are configured using either a string URL or an object with the following properties: + + - `url` (required) + - Type: `str` + - Description: The URL of the SSE server. 
+ + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + + SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: + + - `url` (required) + - Type: `str` + - Description: The URL of the SHTTP server. + + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + - `timeout` (optional) + - Type: `int` + - Default: `60` + - Range: `1-3600` seconds (1 hour maximum) + - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. + - **Use Cases:** + - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. + - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. + - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. + + This timeout only applies to individual tool calls, not server connection establishment. + + + + + While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for + better reliability and performance. + + + Stdio servers are configured using an object with the following properties: + + - `name` (required) + - Type: `str` + - Description: A unique name for the server. + + - `command` (required) + - Type: `str` + - Description: The command to run the server. + + - `args` (optional) + - Type: `list of str` + - Default: `[]` + - Description: Command-line arguments to pass to the server. + + - `env` (optional) + - Type: `dict of str to str` + - Default: `{}` + - Description: Environment variables to set for the server process. + + + +#### When to Use Direct Stdio + +Direct stdio connections may still be appropriate in these scenarios: +- **Development and testing**: Quick prototyping of MCP servers. 
- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access.
- **Local-only environments**: When you don't want to manage additional proxy processes.

### Configuration Examples



  For stdio-based MCP servers, we recommend using an MCP proxy such as
  [SuperGateway](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections. SuperGateway is a
  popular MCP proxy that converts stdio MCP servers to HTTP/SSE endpoints.

  Start the proxy servers separately:
  ```bash
  # Terminal 1: Filesystem server proxy
  supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080

  # Terminal 2: Fetch server proxy
  supergateway --stdio "uvx mcp-server-fetch" --port 8081
  ```

  Then configure OpenHands to use the HTTP endpoint:

  ```toml
  [mcp]
  # SSE Servers - Recommended approach using proxy tools
  sse_servers = [
    # Basic SSE server with just a URL
    "http://example.com:8080/mcp",

    # SuperGateway proxy for fetch server
    "http://localhost:8081/sse",

    # External MCP service with authentication
    {url="https://api.example.com/mcp/sse", api_key="your-api-key"}
  ]

  # SHTTP Servers - Modern streamable HTTP transport (recommended)
  shttp_servers = [
    # Basic SHTTP server with default 60s timeout
    "https://api.example.com/mcp/shttp",

    # Server with a custom timeout (30 minutes) for large file processing.
    # Note: TOML inline tables must be written on a single line.
    {url = "https://files.example.com/mcp/shttp", api_key = "your-api-key", timeout = 1800}
  ]
  ```



  This setup is not recommended for production. 
+ + ```toml + [mcp] + # Direct stdio servers - use only for development/testing + stdio_servers = [ + # Basic stdio server + {name="fetch", command="uvx", args=["mcp-server-fetch"]}, + + # Stdio server with environment variables + { + name="filesystem", + command="npx", + args=["@modelcontextprotocol/server-filesystem", "/"], + env={ + "DEBUG": "true" + } + } + ] + ``` + + For production use, we recommend using proxy tools like SuperGateway. + + + +Other options include: + +- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. +- **Docker-based proxies**: Containerized solutions for better isolation. +- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. + + +# Secrets Management +Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings + +## Overview + +OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be +accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment +variables in the agent's runtime environment. + +## Accessing the Secrets Manager + +Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. + +## Adding a New Secret +1. Click `Add a new secret`. +2. Fill in the following fields: + - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. + - **Value**: The sensitive information you want to store. + - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. +3. Click `Add secret` to save. + +## Editing a Secret + +1. Click the `Edit` button next to the secret you want to modify. +2. You can update the name and description of the secret. + + For security reasons, you cannot view or edit the value of an existing secret. 
If you need to change the + value, delete the secret and create a new one. + + +## Deleting a Secret + +1. Click the `Delete` button next to the secret you want to remove. +2. Select `Confirm` to delete the secret. + +## Using Secrets in the Agent + - All custom secrets are automatically exported as environment variables in the agent's runtime environment. + - You can access them in your code using standard environment variable access methods. For example, if you create a + secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or + `os.environ['OPENAI_API_KEY']` in Python. + + +# Prompting Best Practices +Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices + +## Characteristics of Good Prompts + +Good prompts are: + +- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. +- **Location-specific**: Specify the locations in the codebase that should be modified, if known. +- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. + +## Examples + +### Good Prompt Examples + +- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average. +- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined. +- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission. + +### Bad Prompt Examples + +- Make the code better. (Too vague, not concrete) +- Rewrite the entire backend to use a different framework. (Not appropriately scoped) +- There's a bug somewhere in the user authentication. Can you find and fix it? 
(Lacks specificity and location information)

## Tips for Effective Prompting

- Be as specific as possible about the desired outcome or the problem to be solved.
- Provide context, including relevant file paths and line numbers if available.
- Break large tasks into smaller, manageable prompts.
- Include relevant error messages or logs.
- Specify the programming language or framework, if not obvious.

The more precise and informative your prompt, the better OpenHands can assist you.

See [First Projects](/overview/first-projects) for more examples of helpful prompts.


# Troubleshooting
Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting


OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal.


### Launch docker client failed

**Description**

When running OpenHands, the following error is seen:
```
Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon.
```

**Resolution**

Try these in order:
* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully.
* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled.
* Depending on your configuration, you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop.
* Reinstall Docker Desktop.

### Permission Error

**Description**

On initial prompt, an error is seen with `Permission Denied` or `PermissionError`.

**Resolution**

* Check if the `~/.openhands` directory is owned by `root`. If so, you can:
  * Change the directory's ownership: `sudo chown <user>:<group> ~/.openhands`.
  * or update permissions on the directory: `sudo chmod 777 ~/.openhands`.
  * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings. 
* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
  OpenHands.

### On Linux, Getting ConnectTimeout Error

**Description**

When running on Linux, you might run into the error `ERROR:root:: timed out`.

**Resolution**

If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that
these packages can sometimes be outdated or include changes that cause compatibility issues. Try reinstalling Docker
[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version.

If that does not solve the issue, try incrementally adding the following parameters to the docker run command:
* `--network host`
* `-e SANDBOX_USE_HOST_NETWORK=true`
* `-e DOCKER_HOST_ADDR=127.0.0.1`

### Internal Server Error. Ports are not available

**Description**

When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP
...: bind: An attempt was made to access a socket in a
way forbidden by its access permissions.")` is encountered.

**Resolution**

* Run the following command in PowerShell as Administrator to reset the NAT service and release the ports:
```
Restart-Service -Name "winnat"
```

### Unable to access VS Code tab via local IP

**Description**

When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden"
error, while other parts of the UI work fine.

**Resolution**

This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines.
To fix this:

1. 
Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: + ```bash + docker run -it --rm \ + -e SANDBOX_VSCODE_PORT=41234 \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + -p 41234:41234 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:latest + ``` + + > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. + +2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. +3. If running with the development workflow, you can set this in your `config.toml` file: + ```toml + [sandbox] + vscode_port = 41234 + ``` + +### GitHub Organization Rename Issues + +**Description** + +After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. + +**Resolution** + +* Update your git remote URL: + ```bash + # Check current remote + git remote get-url origin + + # Update SSH remote + git remote set-url origin git@github.com:OpenHands/OpenHands.git + + # Or update HTTPS remote + git remote set-url origin https://github.com/OpenHands/OpenHands.git + ``` +* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` +* Find and update any hardcoded references: + ```bash + git grep -i "all-hands-ai" + git grep -i "ghcr.io/all-hands-ai" + ``` + + +# COBOL Modernization +Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization + +Legacy COBOL systems power critical business operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. 
This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring).


## The COBOL Modernization Challenge

[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions.

The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, and COBOL's specialized nature requires a unique skill set that makes these projects difficult for human teams to tackle alone.

## Overview

COBOL modernization is a complex undertaking. Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in.

OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process:

1. **Understanding**: Analyze and document existing COBOL code
2. **Translation**: Convert COBOL to modern languages like Java, Python, or C#
3. **Validation**: Ensure the modernized code behaves identically to the original

In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. 
While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. + +## Understanding + +A significant challenge in modernization is understanding the business function of the code. Developers have practice determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty then comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle. + +Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. Your COBOL source might already have some structure or comments that make this link clear, but if not OpenHands can help. If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: + +``` +For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. + +Save the results in `business_functions.json` in the following format: + +{ + ..., + "COBIL00C.cbl": { + "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", + "references": [ + "docs/billing.md#bill-payment", + "docs/transactions.md#transaction-action" + ], + }, + ... +} +``` + +OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. 
This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information.

## Translation

With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed.

One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases, but including such _known problems_ in the translation prompt can help prevent these errors from being introduced at all.

An example prompt is below:

```
Convert the COBOL files in `/src` to Java in `/src/java`.

Requirements:
1. Create a Java class for each COBOL program
2. Preserve the business logic and data structures (see `business_functions.json`)
3. Use appropriate Java naming conventions (camelCase for methods, PascalCase for classes)
4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types)
5. Implement proper error handling with try-catch blocks
6. Add JavaDoc comments explaining the purpose of each class and method
7. In JavaDoc comments, include traceability to the original COBOL source using
   the format: @source <file>:<lines> (e.g., @source CBACT01C.cbl:73-77)
8. Create a clean, maintainable object-oriented design
9. 
Each Java file should be compilable and follow Java best practices +``` + +Note the rule that introduces traceability comments to the resulting Java. These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. + +## Validation + +Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. + +Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. + +## Scaling Up + +The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. 
One way to address this is by tying translation and validation together in an iterative refinement loop.

The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk):

```python
while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
    # Migrating agent converts COBOL to Java
    migration_conversation.send_message(migration_prompt)
    migration_conversation.run()

    # Critiquing agent evaluates the conversion
    critique_conversation.send_message(critique_prompt)
    critique_conversation.run()

    # Parse the score and decide whether to continue
    current_score = parse_critique_score(critique_file)
    iteration += 1
```

By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. The following prompt can be easily modified to support a wide range of requirements:

```
Evaluate the quality of the COBOL to Java migration in `/src`.

For each Java file, assess using the following criteria:
1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)?
2. Code Quality: Is the code clean, readable, and following Java 17 conventions?
3. Completeness: Are all COBOL features properly converted?
4. Best Practices: Does it use proper OOP, error handling, and documentation?

For each instance of a criterion not met, deduct a point.

Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score. 
+ +Save the results in `critique.json` in the following format: + +{ + "total_score": -12, + "files": [ + { + "cobol": "COBIL00C.cbl", + "java": "bill_payment.java", + "scores": { + "correctness": 0, + "code_quality": 0, + "completeness": -1, + "best_practices": -2 + }, + "feedback": [ + "Rename single-letter variables to meaningful names.", + "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing.", + ], + }, + ... + ] +} +``` + +In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback. + +This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. + +## Try It Yourself + +The full iterative refinement example is available in the OpenHands SDK: + +```bash +export LLM_API_KEY="your-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/31_iterative_refinement.py +``` + +For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. 
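If you adapt the iterative refinement example to your own project, you will need a `parse_critique_score` helper like the one referenced in the loop above. The SDK example defines its own version; as a minimal sketch (the function name comes from the loop snippet, and the report layout follows the `critique.json` format from the critic's prompt), it can simply read the report and return the aggregate score:

```python
import json

def parse_critique_score(critique_file: str) -> int:
    """Return the critic's aggregate score from a critique.json report.

    Assumes the layout from the critic's prompt: a top-level
    "total_score" plus a per-file breakdown under "files".
    """
    with open(critique_file) as f:
        report = json.load(f)
    return report["total_score"]

# Minimal report in the format the critic is prompted to produce.
sample_report = {
    "total_score": -12,
    "files": [
        {
            "cobol": "COBIL00C.cbl",
            "java": "bill_payment.java",
            "scores": {"correctness": 0, "completeness": -1},
            "feedback": ["Rename single-letter variables to meaningful names."],
        }
    ],
}
with open("critique.json", "w") as f:
    json.dump(sample_report, f, indent=2)

print(parse_critique_score("critique.json"))  # -12
```

Because the critic only deducts points, scores are zero or negative; a `QUALITY_THRESHOLD` of, say, `-3` would keep the loop running until only minor feedback remains.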
+ + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + + +# Automated Code Review +Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review + +Automated code review helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs. + +## Overview + +The OpenHands PR Review workflow is a GitHub Actions workflow that: + +- **Triggers automatically** when PRs are opened or when you request a review +- **Analyzes code changes** in the context of your entire repository +- **Posts inline comments** directly on specific lines of code in the PR +- **Provides fast feedback** - typically within 2-3 minutes + +## How It Works + +The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: + +1. **Trigger**: The workflow runs when: + - A new non-draft PR is opened + - A draft PR is marked as ready for review + - The `review-this` label is added to a PR + - `openhands-agent` is requested as a reviewer + +2. 
**Analysis**: The agent receives the complete PR diff and uses two skills: + - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices + - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API + +3. **Output**: Review comments are posted directly on the PR with: + - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) + - Specific line references + - Actionable suggestions with code examples + +### Review Styles + +Choose between two review styles: + +| Style | Description | Best For | +|-------|-------------|----------| +| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | +| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | + +## Quick Start + + + + Create `.github/workflows/pr-review-by-openhands.yml` in your repository: + + ```yaml + name: PR Review by OpenHands + + on: + pull_request_target: + types: [opened, ready_for_review, labeled, review_requested] + + permissions: + contents: read + pull-requests: write + issues: write + + jobs: + pr-review: + if: | + (github.event.action == 'opened' && github.event.pull_request.draft == false) || + github.event.action == 'ready_for_review' || + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Run PR Review + uses: 
OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + llm-model: anthropic/claude-sonnet-4-5-20250929 + review-style: standard + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} + ``` + + + + Go to your repository's **Settings → Secrets and variables → Actions** and add: + - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) + + + + Create a `review-this` label in your repository: + 1. Go to **Issues → Labels** + 2. Click **New label** + 3. Name: `review-this` + 4. Description: `Trigger OpenHands PR review` + + + + Open a PR and either: + - Add the `review-this` label, OR + - Request `openhands-agent` as a reviewer + + + +## Composite Action + +The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: + +- Checking out the SDK at the specified version +- Setting up Python and dependencies +- Running the PR review agent +- Uploading logs as artifacts + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | +| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + + +Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. 
+ + +## Customization + +### Repository-Specific Review Guidelines + +Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: + +```markdown +--- +name: code-review +description: Custom code review guidelines for this repository +triggers: +- /codereview +--- + +# Repository Code Review Guidelines + +You are reviewing code for [Your Project Name]. Follow these guidelines: + +## Review Decisions + +### When to APPROVE +- Configuration changes following existing patterns +- Documentation-only changes +- Test-only changes without production code changes +- Simple additions following established conventions + +### When to COMMENT +- Issues that need attention (bugs, security concerns) +- Suggestions for improvement +- Questions about design decisions + +## Core Principles + +1. **[Your Principle 1]**: Description +2. **[Your Principle 2]**: Description + +## What to Check + +- **[Category 1]**: What to look for +- **[Category 2]**: What to look for + +## Repository Conventions + +- Use [your linter] for style checking +- Follow [your style guide] +- Tests should be in [your test directory] +``` + + +The skill file must use `/codereview` as the trigger to override the default review behavior. See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. 
+ + +### Workflow Configuration + +Customize the workflow by modifying the action inputs: + +```yaml +- name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + # Change the LLM model + llm-model: anthropic/claude-sonnet-4-5-20250929 + # Use a custom LLM endpoint + llm-base-url: https://your-llm-proxy.example.com + # Switch to "roasted" style for brutally honest reviews + review-style: roasted + # Pin to a specific SDK version for stability + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Trigger Customization + +Modify when reviews are triggered by editing the workflow conditions: + +```yaml +# Only trigger on label (disable auto-review on PR open) +if: github.event.label.name == 'review-this' + +# Only trigger when specific reviewer is requested +if: github.event.requested_reviewer.login == 'openhands-agent' + +# Trigger on all PRs (including drafts) +if: | + github.event.action == 'opened' || + github.event.action == 'synchronize' +``` + +## Security Considerations + +The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. + + +**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. + +To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. 
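The idea behind SDK secrets can be illustrated with a small generic sketch. This is **not** the SDK's actual API — just an illustration, with hypothetical names, of the underlying technique: the credential is injected only for the duration of a single command and masked in any output before it reaches logs or the model's context.

```python
import os
import subprocess
import sys


def run_with_secret(cmd: list[str], name: str, value: str) -> str:
    """Run one command with the secret injected only for that call,
    then mask the raw value in the captured output."""
    result = subprocess.run(
        cmd,
        env={**os.environ, name: value},  # scoped to this call, not the whole session
        capture_output=True,
        text=True,
    )
    # Scrub the raw value before the output is logged or shown to the LLM
    return result.stdout.replace(value, f"<secret:{name}>")
```

Because the credential is injected per call and scrubbed on the way out, code the agent executes can still use the key, but the raw value never appears in the transcript the model sees.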
+ + +## Example Reviews + +See real automated reviews in action on the OpenHands Software Agent SDK repository: + +| PR | Description | Review Highlights | +|----|-------------|-------------------| +| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | +| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | +| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED review highlighting key strengths | +| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | + +## Troubleshooting + + + + - Ensure the `LLM_API_KEY` secret is set correctly + - Check that the label name matches exactly (`review-this`) + - Verify the workflow file is in `.github/workflows/` + - Check the Actions tab for workflow run errors + + + + - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission + - Check the workflow logs for API errors + - Verify the PR is not from a fork with restricted permissions + + + + - Large PRs may take longer to analyze + - Consider splitting large PRs into smaller ones + - Check if the LLM API is experiencing delays + + + +## Related Resources + +- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews +- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows +- 
[GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud +- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills + + +# Dependency Upgrades +Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades + +Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. + +## Overview + +OpenHands helps with dependency management by: + +- **Analyzing dependencies**: Identifying outdated packages and their versions +- **Planning upgrades**: Creating upgrade strategies and migration guides +- **Implementing changes**: Updating code to handle breaking changes +- **Validating results**: Running tests and verifying functionality + +## Dependency Analysis Examples + +### Identifying Outdated Dependencies + +Start by understanding your current dependency state: + +``` +Analyze the dependencies in this project and create a report: + +1. List all direct dependencies with current and latest versions +2. Identify dependencies more than 2 major versions behind +3. Flag any dependencies with known security vulnerabilities +4. Highlight dependencies that are deprecated or unmaintained +5. Prioritize which updates are most important +``` + +**Example output:** + +| Package | Current | Latest | Risk | Priority | +|---------|---------|--------|------|----------| +| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | +| react | 16.8.0 | 18.2.0 | Outdated | Medium | +| express | 4.17.1 | 4.18.2 | Minor update | Low | +| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | + +### Security-Related Dependency Upgrades + +Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. 
If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: + +- Automating vulnerability detection and remediation +- Integrating with security scanners (Snyk, Dependabot, CodeQL) +- Building automated pipelines for security fixes +- Using OpenHands agents to create pull requests automatically + +### Compatibility Checking + +Check for compatibility issues before upgrading: + +``` +Check compatibility for upgrading React from 16 to 18: + +1. Review our codebase for deprecated React patterns +2. List all components using lifecycle methods +3. Identify usage of string refs or findDOMNode +4. Check third-party library compatibility with React 18 +5. Estimate the effort required for migration +``` + +**Compatibility matrix:** + +| Dependency | React 16 | React 17 | React 18 | Action Needed | +|------------|----------|----------|----------|---------------| +| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | +| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | +| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | + +## Automated Upgrade Examples + +### Version Updates + +Perform straightforward version updates: + + + + ``` + Update all patch and minor versions in package.json: + + 1. Review each update for changelog notes + 2. Update package.json with new versions + 3. Update package-lock.json + 4. Run the test suite + 5. List any deprecation warnings + ``` + + + ``` + Update dependencies in requirements.txt: + + 1. Check each package for updates + 2. Update requirements.txt with compatible versions + 3. Update requirements-dev.txt similarly + 4. Run tests and verify functionality + 5. Note any deprecation warnings + ``` + + + ``` + Update dependencies in pom.xml: + + 1. Check for newer versions of each dependency + 2. Update version numbers in pom.xml + 3. Run mvn dependency:tree to check conflicts + 4. 
Run the test suite + 5. Document any API changes encountered + ``` + + + +### Breaking Change Handling + +When major versions introduce breaking changes: + +``` +Upgrade axios from v0.x to v1.x and handle breaking changes: + +1. List all breaking changes in axios 1.0 changelog +2. Find all axios usages in our codebase +3. For each breaking change: + - Show current code + - Show updated code + - Explain the change +4. Create a git commit for each logical change +5. Verify all tests pass +``` + +**Example transformation:** + +```javascript +// Before (axios 0.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const response = await axios.get('/users', { + cancelToken: source.token +}); + +// After (axios 1.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const controller = new AbortController(); +const response = await axios.get('/users', { + signal: controller.signal +}); +``` + +### Code Adaptation + +Adapt code to new API patterns: + +``` +Migrate our codebase from moment.js to date-fns: + +1. List all moment.js usages in our code +2. Map moment methods to date-fns equivalents +3. Update imports throughout the codebase +4. Handle any edge cases where APIs differ +5. Remove moment.js from dependencies +6. Verify all date handling still works correctly +``` + +**Migration map:** + +| moment.js | date-fns | Notes | +|-----------|----------|-------| +| `moment()` | `new Date()` | Different return type | +| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | +| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | +| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | + +## Testing and Validation Examples + +### Automated Test Execution + +Run comprehensive tests after upgrades: + +``` +After the dependency upgrades, validate the application: + +1. Run the full test suite (unit, integration, e2e) +2. 
Check test coverage hasn't decreased +3. Run type checking (if applicable) +4. Run linting with new lint rule versions +5. Build the application for production +6. Report any failures with analysis +``` + +### Integration Testing + +Verify integrations still work: + +``` +Test our integrations after upgrading the AWS SDK: + +1. Test S3 operations (upload, download, list) +2. Test DynamoDB operations (CRUD) +3. Test Lambda invocations +4. Test SQS send/receive +5. Compare behavior to before the upgrade +6. Note any subtle differences +``` + +### Regression Detection + +Detect regressions from upgrades: + +``` +Check for regressions after upgrading the ORM: + +1. Run database operation benchmarks +2. Compare query performance before and after +3. Verify all migrations still work +4. Check for any N+1 queries introduced +5. Validate data integrity in test database +6. Document any behavioral changes +``` + +## Additional Examples + +### Security-Driven Upgrade + +``` +We have a critical security vulnerability in jsonwebtoken. + +Current: jsonwebtoken@8.5.1 +Required: jsonwebtoken@9.0.0 + +Perform the upgrade: +1. Check for breaking changes in v9 +2. Find all usages of jsonwebtoken in our code +3. Update any deprecated methods +4. Update the package version +5. Verify all JWT operations work +6. Run security tests +``` + +### Framework Major Upgrade + +``` +Upgrade our Next.js application from 12 to 14: + +Key areas to address: +1. App Router migration (pages -> app) +2. New metadata API +3. Server Components by default +4. New Image component +5. 
Route handlers replacing API routes + +For each area: +- Show current implementation +- Show new implementation +- Test the changes +``` + +### Multi-Package Coordinated Upgrade + +``` +Upgrade our React ecosystem packages together: + +Current: +- react: 17.0.2 +- react-dom: 17.0.2 +- react-router-dom: 5.3.0 +- @testing-library/react: 12.1.2 + +Target: +- react: 18.2.0 +- react-dom: 18.2.0 +- react-router-dom: 6.x +- @testing-library/react: 14.x + +Create an upgrade plan that handles all these together, +addressing breaking changes in the correct order. +``` + +## Related Resources + +- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities +- [Security Guide](/sdk/guides/security) - Security best practices for AI agents +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + + +# Incident Triage +Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage + +When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). + + +This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). + + +## Overview + +Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. + +What if AI agents could handle the initial investigation automatically? 
This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. + +OpenHands accelerates incident response by: + +- **Automated error analysis**: AI agents investigate errors and provide detailed reports +- **Root cause identification**: Connect symptoms to underlying issues in your codebase +- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues +- **Integration with monitoring tools**: Work directly with platforms like Datadog + +## Automated Datadog Error Analysis + +The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. + +### How It Works + +[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production. + +[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports. + +### Triggering Automated Debugging + +The GitHub Actions workflow can be triggered in two ways: + +1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors. + +2. 
**Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from Datadog's error tracking UI using the "Actions" button.

### Automated Investigation Process

When the workflow runs, it automatically performs the following steps:

1. Fetch detailed error information from the Datadog API
2. Create or find an existing GitHub issue to track the error
3. Clone all relevant repositories to get full code context
4. Run an OpenHands agent to analyze the error and investigate the code
5. Post the findings as a comment on the GitHub issue

The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes.


The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`.


## Setting Up the Workflow

To set up automated Datadog debugging in your own repository:

1. Copy the workflow file to `.github/workflows/` in your repository
2. Configure the required secrets (Datadog API keys, LLM API key)
3. Customize the default queries and repository lists for your needs
4. Run the workflow manually or set up scheduled runs

The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog.

Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template.
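If you are adapting the workflow, the search it sends to Datadog in its first step can be sketched as a small request-body builder. This is a hedged sketch: the function name is ours, and the endpoint URL and payload shape should be verified against Datadog's Logs Search API reference before use.

```python
# Datadog's v2 log search endpoint -- verify against the official API reference
DD_SEARCH_URL = "https://api.datadoghq.com/api/v2/logs/events/search"


def build_error_search(query: str, lookback: str = "now-1h", limit: int = 25) -> dict:
    """Build a request body that scopes a Datadog log search to
    error-level events matching the workflow's pattern (e.g., "JSONDecodeError")."""
    return {
        "filter": {
            "query": f"status:error {query}",  # restrict to errors matching the pattern
            "from": lookback,
            "to": "now",
        },
        "page": {"limit": limit},
    }
```

The workflow would POST a body like this with `DD-API-KEY` and `DD-APPLICATION-KEY` headers, then hand the matching events to the agent for investigation.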
+ +## Manual Incident Investigation + +You can also use OpenHands directly to investigate incidents without the automated workflow. + +### Log Analysis + +OpenHands can analyze logs to identify patterns and anomalies: + +``` +Analyze these application logs for the incident that occurred at 14:32 UTC: + +1. Identify the first error or warning that appeared +2. Trace the sequence of events leading to the failure +3. Find any correlated errors across services +4. Identify the user or request that triggered the issue +5. Summarize the timeline of events +``` + +**Log analysis capabilities:** + +| Log Type | Analysis Capabilities | +|----------|----------------------| +| Application logs | Error patterns, exception traces, timing anomalies | +| Access logs | Traffic patterns, slow requests, error responses | +| System logs | Resource exhaustion, process crashes, system errors | +| Database logs | Slow queries, deadlocks, connection issues | + +### Stack Trace Analysis + +Deep dive into stack traces: + +``` +Analyze this stack trace from our production error: + +[paste full stack trace] + +1. Identify the exception type and message +2. Trace back to our code (not framework code) +3. Identify the likely cause +4. Check if this code path has changed recently +5. Suggest a fix +``` + +**Multi-language support:** + + + + ``` + Analyze this Java exception: + + java.lang.OutOfMemoryError: Java heap space + at java.util.Arrays.copyOf(Arrays.java:3210) + at java.util.ArrayList.grow(ArrayList.java:265) + at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) + + Identify: + 1. What operation is consuming memory? + 2. Is there a memory leak or just too much data? + 3. What's the fix? 
+ ``` + + + ``` + Analyze this Python traceback: + + Traceback (most recent call last): + File "app/api/orders.py", line 45, in create_order + order = OrderService.create(data) + File "app/services/order.py", line 89, in create + inventory.reserve(item_id, quantity) + AttributeError: 'NoneType' object has no attribute 'reserve' + + What's None and why? + ``` + + + ``` + Analyze this Node.js error: + + TypeError: Cannot read property 'map' of undefined + at processItems (/app/src/handlers/items.js:23:15) + at async handleRequest (/app/src/api/router.js:45:12) + + What's undefined and how should we handle it? + ``` + + + +### Root Cause Analysis + +Identify the underlying cause of an incident: + +``` +Perform root cause analysis for this incident: + +Symptoms: +- API response times increased 5x at 14:00 +- Error rate jumped from 0.1% to 15% +- Database CPU spiked to 100% + +Available data: +- Application metrics (Grafana dashboard attached) +- Recent deployments: v2.3.1 deployed at 13:45 +- Database slow query log (attached) + +Identify the root cause using the 5 Whys technique. +``` + +## Common Incident Patterns + +OpenHands can recognize and help diagnose these common patterns: + +- **Connection pool exhaustion**: Increasing connection errors followed by complete failure +- **Memory leaks**: Gradual memory increase leading to OOM +- **Cascading failures**: One service failure triggering others +- **Thundering herd**: Simultaneous requests overwhelming a service +- **Split brain**: Inconsistent state across distributed components + +## Quick Fix Generation + +Once the root cause is identified, generate fixes: + +``` +We've identified the root cause: a missing null check in OrderProcessor.java line 156. + +Generate a fix that: +1. Adds proper null checking +2. Logs when null is encountered +3. Returns an appropriate error response +4. Includes a unit test for the edge case +5. 
Is minimally invasive for a hotfix +``` + +## Best Practices + +### Investigation Checklist + +Use this checklist when investigating: + +1. **Scope the impact** + - How many users affected? + - What functionality is broken? + - What's the business impact? + +2. **Establish timeline** + - When did it start? + - What changed around that time? + - Is it getting worse or stable? + +3. **Gather data** + - Application logs + - Infrastructure metrics + - Recent deployments + - Configuration changes + +4. **Form hypotheses** + - List possible causes + - Rank by likelihood + - Test systematically + +5. **Implement fix** + - Choose safest fix + - Test before deploying + - Monitor after deployment + +### Common Pitfalls + + +Avoid these common incident response mistakes: + +- **Jumping to conclusions**: Gather data before assuming the cause +- **Changing multiple things**: Make one change at a time to isolate effects +- **Not documenting**: Record all actions for the post-mortem +- **Ignoring rollback**: Always have a rollback plan before deploying fixes + + + +For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. + + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + + +# Spark Migrations +Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations + +Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. 
OpenHands can help you analyze, migrate, and validate Spark applications.

## Overview

Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the hybrid Julian/Gregorian calendar to the proleptic Gregorian calendar.

Version upgrades are also made difficult by the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling and becomes prone to subtle regressions.

Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in. OpenHands assists in migrating Spark applications along every step of the process:

1. **Understanding**: Analyze the existing codebase to identify what needs to change and why
2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences
3.
**Validation**: Verify that migrated pipelines produce identical results to the originals

In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions.

## Understanding

Before changing any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, spanning API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually.

The Apache Spark project publishes detailed lists of changes between each major and minor version. OpenHands can use these lists while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress.

If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory:

```
Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0.

Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html.

Then, for each source file, identify:

1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`)
2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics)
3. Configuration properties that have changed defaults or been renamed
4.
Dependencies that need version updates + +Save the results in `migration_inventory.json` in the following format: + +{ + ..., + "src/main/scala/etl/TransformJob.scala": { + "deprecated_apis": [ + {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} + ], + "behavioral_changes": [ + {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} + ], + "config_changes": [], + "risk": "medium" + }, + ... +} +``` + +Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. + +## Migration + +With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. + +To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. + +The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. 
CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. + +An example prompt is below: + +``` +Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. + +Use `migration_inventory.json` to guide the changes. + +For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. + +Requirements: +1. Replace all deprecated APIs with their Spark 3.0 equivalents +2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) +3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions +4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical +5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists +6. Update import statements for any relocated classes +7. Preserve all existing business logic and output schemas +``` + +Note the inclusion of the _known problems_ in requirement 2. We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. + +## Validation + +Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. + +The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. 
This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. + +An example prompt for data-level comparison: + +``` +Validate the migrated Spark application in `/src` against the original. + +1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` +2. Compare outputs: + - Row counts must match exactly + - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns + - Flag any NULL handling differences +3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments +4. Generate a performance comparison: job duration, shuffle bytes, and peak executor memory + +Save the results in `validation_report.json` in the following format: + +{ + "jobs": [ + { + "name": "daily_etl", + "data_match": true, + "row_count": {"v2": 1000000, "v3": 1000000}, + "column_diffs": [], + "performance": { + "duration_seconds": {"v2": 340, "v3": 285}, + "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} + } + }, + ... + ] +} +``` + +Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. + +Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. 
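The data-level comparison described above can be sketched in plain Python. This is an illustrative, hand-rolled check rather than part of OpenHands: it assumes each job's output has been exported to CSV with a deterministic sort order (otherwise per-column checksums are meaningless), and the file paths are placeholders.

```python
import csv
import hashlib

def column_digests(path):
    """Read a CSV file and return (row_count, {column: sha256 of its values})."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        hashers = {}
        rows = 0
        for row in reader:
            rows += 1
            for col, value in row.items():
                # NUL delimiter so ("ab", "c") and ("a", "bc") hash differently
                hashers.setdefault(col, hashlib.sha256()).update(
                    (value or "").encode() + b"\x00"
                )
        return rows, {col: h.hexdigest() for col, h in hashers.items()}

def compare_outputs(v2_path, v3_path):
    """Compare two job outputs; mirrors the validation_report.json fields."""
    v2_rows, v2_cols = column_digests(v2_path)
    v3_rows, v3_cols = column_digests(v3_path)
    diffs = [col for col in v2_cols if v2_cols[col] != v3_cols.get(col)]
    return {
        "data_match": v2_rows == v3_rows and not diffs,
        "row_count": {"v2": v2_rows, "v3": v3_rows},
        "column_diffs": diffs,
    }
```

For production-sized outputs you would push the comparison into Spark itself (for example, `exceptAll` between the two result DataFrames), but a standalone script like this is handy for spot-checking samples drawn from `/test_data`.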
+ +## Beyond Version Upgrades + +While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: + +- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. +- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. + +In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + + +# Vulnerability Remediation +Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation + +Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. + +## The Challenge + +The traditional approach to vulnerability remediation is manual and time-consuming: + +1. Scan repositories for vulnerabilities +2. 
Review each vulnerability and its impact +3. Research the fix (usually a version upgrade) +4. Update dependency files +5. Test the changes +6. Create pull requests +7. Get reviews and merge + +This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. Security debt accumulates faster than teams can address it. + +**What if we could automate this entire process using AI agents?** + +## Automated Vulnerability Remediation with OpenHands + +The [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**. + +OpenHands assists with vulnerability remediation by: + +- **Identifying vulnerabilities**: Analyzing code for common security issues +- **Understanding impact**: Explaining the risk and exploitation potential +- **Implementing fixes**: Generating secure code to address vulnerabilities +- **Validating remediation**: Verifying fixes are effective and complete + +## Two Approaches to Vulnerability Fixing + +### 1. Point to a GitHub Repository + +Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents automatically create pull requests with fixes—all with minimal human intervention. + +### 2. Upload Security Scanner Reports + +Enable users to upload reports from security scanners such as Snyk (as well as other third-party security scanners) where OpenHands agents automatically detect the report format, identify the issues, and apply fixes. + +This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. 
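For the report-upload approach, the first step is normalizing whatever the scanner produced into an ordered queue of remediation tasks. The sketch below is a minimal illustration: it assumes a simplified, Snyk-like JSON shape, and the field names (`vulnerabilities`, `severity`, `file`, `title`) are placeholders for the example, not any scanner's real schema.

```python
import json

# Lower number = fix first
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage_report(report_json: str):
    """Parse a scanner report and return findings sorted most-severe-first."""
    report = json.loads(report_json)
    findings = report.get("vulnerabilities", [])
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f.get("severity", "low").lower(), 99),
    )

def to_agent_prompt(finding: dict) -> str:
    """Render one finding as a remediation task for an agent."""
    return (
        f"Fix the {finding['severity']} severity issue '{finding['title']}' "
        f"in {finding['file']}. Explain the vulnerability, apply a minimal fix, "
        f"and add a regression test."
    )
```

Each rendered task can then be handed to an agent conversation, with critical findings dispatched first and low-risk ones batched for later review.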
+ +## Architecture Overview + +A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes. + +The key architectural components include: + +- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client) +- **WebSocket interface**: Enables real-time status updates on agent actions and operations +- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider +- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud + +This architecture allows the frontend to remain lightweight while heavy lifting happens in the agent's execution environment. + +## Example: Vulnerability Fixer Application + +An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow: + +1. User points to a repository or uploads a security scan report +2. Agent analyzes the vulnerabilities +3. Agent creates fixes and pull requests automatically +4. User reviews and merges the changes + +## Security Scanning Integration + +Use OpenHands to analyze security scanner output: + +``` +We ran a security scan and found these issues. Analyze each one: + +1. SQL Injection in src/api/users.py:45 +2. XSS in src/templates/profile.html:23 +3. Hardcoded credential in src/config/database.py:12 +4. 
Path traversal in src/handlers/files.py:67 + +For each vulnerability: +- Explain what the vulnerability is +- Show how it could be exploited +- Rate the severity (Critical/High/Medium/Low) +- Suggest a fix +``` + +## Common Vulnerability Patterns + +OpenHands can detect these common vulnerability patterns: + +| Vulnerability | Pattern | Example | +|--------------|---------|---------| +| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | +| XSS | Unescaped user input in HTML | `
<div>${user_comment}</div>
` | +| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | +| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | +| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | + +## Automated Remediation + +### Applying Security Patches + +Fix identified vulnerabilities: + + + + ``` + Fix the SQL injection vulnerability in src/api/users.py: + + Current code: + query = f"SELECT * FROM users WHERE id = {user_id}" + cursor.execute(query) + + Requirements: + 1. Use parameterized queries + 2. Add input validation + 3. Maintain the same functionality + 4. Add a test case for the fix + ``` + + **Fixed code:** + ```python + # Using parameterized query + query = "SELECT * FROM users WHERE id = %s" + cursor.execute(query, (user_id,)) + ``` + + + ``` + Fix the XSS vulnerability in src/templates/profile.html: + + Current code: +
<div>${user.bio}</div>
+ + Requirements: + 1. Properly escape user content + 2. Consider Content Security Policy + 3. Handle rich text if needed + 4. Test with malicious input + ``` + + **Fixed code:** + ```html + +
<div>{{ user.bio | escape }}</div>
+ ``` +
+ + ``` + Fix the command injection in src/utils/network.py: + + Current code: + def ping_host(hostname): + os.system(f"ping -c 1 {hostname}") + + Requirements: + 1. Use safe subprocess calls + 2. Validate input format + 3. Avoid shell=True + 4. Handle errors properly + ``` + + **Fixed code:** + ```python + import subprocess + import re + + def ping_host(hostname): + # Validate hostname format + if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): + raise ValueError("Invalid hostname") + + # Use subprocess without shell + result = subprocess.run( + ["ping", "-c", "1", hostname], + capture_output=True, + text=True + ) + return result.returncode == 0 + ``` + +
+ +### Code-Level Vulnerability Fixes + +Fix application-level security issues: + +``` +Fix the broken access control in our API: + +Issue: Users can access other users' data by changing the ID in the URL. + +Current code: +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int): + return db.get_documents(user_id) + +Requirements: +1. Add authorization check +2. Verify requesting user matches or is admin +3. Return 403 for unauthorized access +4. Log access attempts +5. Add tests for authorization +``` + +**Fixed code:** + +```python +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int, current_user: User = Depends(get_current_user)): + # Check authorization + if current_user.id != user_id and not current_user.is_admin: + logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") + raise HTTPException(status_code=403, detail="Not authorized") + + return db.get_documents(user_id) +``` + +## Security Testing + +Test your fixes thoroughly: + +``` +Create security tests for the SQL injection fix: + +1. Test with normal input +2. Test with SQL injection payloads: + - ' OR '1'='1 + - '; DROP TABLE users; -- + - UNION SELECT * FROM passwords +3. Test with special characters +4. Test with null/empty input +5. Verify error handling doesn't leak information +``` + +## Automated Remediation Pipeline + +Create an end-to-end automated pipeline: + +``` +Create an automated vulnerability remediation pipeline: + +1. Parse Snyk/Dependabot/CodeQL alerts +2. Categorize by severity and type +3. For each vulnerability: + - Create a branch + - Apply the fix + - Run tests + - Create a PR with: + - Description of vulnerability + - Fix applied + - Test results +4. Request review from security team +5. 
Auto-merge low-risk fixes after tests pass +``` + +## Building Your Own Vulnerability Fixer + +The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. + +To build your own vulnerability remediation agent: + +1. Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent +2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) +3. Configure the agent to create pull requests automatically +4. Set up human review workflows for critical fixes + +As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. + +## Related Resources + +- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example +- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents +- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + + +# Windows Without WSL +Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl + + + This way of running OpenHands is not officially supported. It is maintained by the community and may not work. + + +# Running OpenHands GUI on Windows Without WSL + +This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker. + +## Prerequisites + +1. **Windows 10/11** - A modern Windows operating system +2. 
**PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors) +3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet +4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) +5. **Git** - For cloning the repository and version control +6. **Node.js and npm** - For running the frontend + +## Step 1: Install Required Software + +1. **Install Python 3.12 or 3.13** + - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) + - During installation, check "Add Python to PATH" + - Verify installation by opening PowerShell and running: + ```powershell + python --version + ``` + +2. **Install PowerShell 7** + - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) + - Choose the MSI installer appropriate for your system (x64 for most modern computers) + - Run the installer with default options + - Verify installation by opening a new terminal and running: + ```powershell + pwsh --version + ``` + - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors + +3. **Install .NET Core Runtime** + - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose the latest .NET Core Runtime (not SDK) + - Verify installation by opening PowerShell and running: + ```powershell + dotnet --info + ``` + - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation. + +4. 
**Install Git** + - Download Git from [git-scm.com](https://git-scm.com/download/win) + - Use default installation options + - Verify installation: + ```powershell + git --version + ``` + +5. **Install Node.js and npm** + - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) + - During installation, accept the default options which will install npm as well + - Verify installation: + ```powershell + node --version + npm --version + ``` + +6. **Install Poetry** + - Open PowerShell as Administrator and run: + ```powershell + (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - + ``` + - Add Poetry to your PATH: + ```powershell + $env:Path += ";$env:APPDATA\Python\Scripts" + ``` + - Verify installation: + ```powershell + poetry --version + ``` + +## Step 2: Clone and Set Up OpenHands + +1. **Clone the Repository** + ```powershell + git clone https://github.com/OpenHands/OpenHands.git + cd OpenHands + ``` + +2. **Install Dependencies** + ```powershell + poetry install + ``` + + This will install all required dependencies, including: + - pythonnet - Required for Windows PowerShell integration + - All other OpenHands dependencies + +## Step 3: Run OpenHands + +1. **Build the Frontend** + ```powershell + cd frontend + npm install + npm run build + cd .. + ``` + + This will build the frontend files that the backend will serve. + +2. **Start the Backend** + ```powershell + # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell + pwsh + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` + + This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. + + > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. 
+ + > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. + +3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** + ```powershell + cd frontend + npm run dev + ``` + +4. **Access the OpenHands GUI** + + Open your browser and navigate to: + ``` + http://localhost:3000 + ``` + + > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` + +## Installing and Running the CLI + +To install and run the OpenHands CLI on Windows without WSL, follow these steps: + +### 1. Install uv (Python Package Manager) + +Open PowerShell as Administrator and run: + +```powershell +powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" +``` + +### 2. Install .NET SDK (Required) + +The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: + +```powershell +winget install Microsoft.DotNet.SDK.8 +``` + +Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). + +After installation, restart your PowerShell session to ensure the environment variables are updated. + +### 3. 
Install and Run OpenHands + +After installing the prerequisites, install OpenHands with: + +```powershell +uv tool install openhands --python 3.12 +``` + +Then run OpenHands: + +```powershell +openhands +``` + +To upgrade OpenHands in the future: + +```powershell +uv tool upgrade openhands --python 3.12 +``` + +### Troubleshooting CLI Issues + +#### CoreCLR Error + +If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this: + +1. Install the .NET SDK as described in step 2 above +2. Verify that your system PATH includes the .NET SDK directories +3. Restart your PowerShell session completely after installing the .NET SDK +4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell + +To verify your .NET installation, run: + +```powershell +dotnet --info +``` + +This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. + +If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). + +## Limitations on Windows + +When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: + +1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. + +2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. + +3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. + +4. 
**Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. + +## Troubleshooting + +### "System.Management.Automation" Not Found Error + +If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. + +> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. + +To resolve this issue: + +1. **Install the latest version of PowerShell 7** from the official Microsoft repository: + - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) + - Download and install the latest MSI package for your system architecture (x64 for most systems) + - During installation, ensure you select the following options: + - "Add PowerShell to PATH environment variable" + - "Register Windows PowerShell 7 as the default shell" + - "Enable PowerShell remoting" + - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default + +2. **Restart your terminal or command prompt** to ensure the new PowerShell is available + +3. **Verify the installation** by running: + ```powershell + pwsh --version + ``` + + You should see output indicating PowerShell 7.x.x + +4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: + ```powershell + pwsh + cd path\to\openhands + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` + + > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). 
The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell". + +5. **If the issue persists**, ensure that you have the .NET Runtime installed: + - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose ".NET Runtime" (not SDK) version 6.0 or later + - After installation, verify it's properly installed by running: + ```powershell + dotnet --info + ``` + - Restart your computer after installation + - Try running OpenHands again + +6. **Ensure that the .NET Framework is properly installed** on your system: + - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off + - Make sure ".NET Framework 4.8 Advanced Services" is enabled + - Click OK and restart if prompted + +This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration. + + +# Community +Source: https://docs.openhands.dev/overview/community + +# The OpenHands Community + +OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered world. + +## Mission + +It's very clear that AI is changing software development. We want the developer community to drive that change organically, through open source. + +So we're not just building friendly interfaces for AI-driven development. We're publishing _building blocks_ that empower developers to create new experiences, tailored to your own habits, needs, and imagination. + +## Ethos + +We have two core values: **high openness** and **high agency**. While we don't expect everyone in the community to embody these values, we want to establish them as norms. 
+ +### High Openness + +We welcome anyone and everyone into our community by default. You don't have to be a software developer to help us build. You don't have to be pro-AI to help us learn. + +Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the fruits of our work, but the whole process of growing it. + +We welcome thoughtful criticism, whether it's a comment on a PR or feedback on the community as a whole. + +### High Agency + +Everyone should feel empowered to contribute to OpenHands. Whether it's by making a PR, hosting an event, sharing feedback, or just asking a question, don't hold back! + +OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly and love building new things. + +Coding, development practices, and communities are changing rapidly. We won't hesitate to change direction and make big bets. + +## Relationship to All Hands + +OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.all-hands.dev/). + +All Hands was founded by three of the first major contributors to OpenHands: + +- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards +- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands +- Robert Brennan, a software engineer who architected the user-facing features of OpenHands + +All Hands is an important part of the OpenHands ecosystem. We've raised over $20M—mainly to hire developers and researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. ([Join us!](https://allhandsai.applytojob.com/apply/)) + +But we see OpenHands as much larger, and ultimately more important, than All Hands. 
When our financial responsibility to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we promise to navigate that conflict thoughtfully and transparently. + +At some point, we may transfer custody of OpenHands to an open source foundation. But for now, the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the "benevolent" part, please: fork us. + + +# Contributing +Source: https://docs.openhands.dev/overview/contributing + +# Contributing to OpenHands + +Welcome to the OpenHands community! We're building the future of AI-powered software development, and we'd love for you to be part of this journey. + +## Our Vision: Free as in Freedom + +The OpenHands community is built around the belief that **AI and AI agents are going to fundamentally change the way we build software**, and if this is true, we should do everything we can to make sure that the benefits provided by such powerful technology are **accessible to everyone**. + +We believe in the power of open source to democratize access to cutting-edge AI technology. Just as the internet transformed how we share information, we envision a world where AI-powered development tools are available to every developer, regardless of their background or resources. + +If this resonates with you, we'd love to have you join us in our quest! + +## What Can You Build? + +There are countless ways to contribute to OpenHands. Whether you're a seasoned developer, a researcher, a designer, or someone just getting started, there's a place for you in our community. 
+ +### Frontend & UI/UX +Make OpenHands more beautiful and user-friendly: +- **React & TypeScript Development** - Improve the web interface +- **UI/UX Design** - Enhance user experience and accessibility +- **Mobile Responsiveness** - Make OpenHands work great on all devices +- **Component Libraries** - Build reusable UI components + +*Small fixes are always welcome! For bigger changes, join our **#eng-ui-ux** channel in [Slack](https://openhands.dev/joinslack) first.* + +### Agent Development +Help make our AI agents smarter and more capable: +- **Prompt Engineering** - Improve how agents understand and respond +- **New Agent Types** - Create specialized agents for different tasks +- **Agent Evaluation** - Develop better ways to measure agent performance +- **Multi-Agent Systems** - Enable agents to work together + +*We use [SWE-bench](https://www.swebench.com/) to evaluate our agents. Join our [Slack](https://openhands.dev/joinslack) to learn more.* + +### Backend & Infrastructure +Build the foundation that powers OpenHands: +- **Python Development** - Core functionality and APIs +- **Runtime Systems** - Docker containers and sandboxes +- **Cloud Integrations** - Support for different cloud providers +- **Performance Optimization** - Make everything faster and more efficient + +### Testing & Quality Assurance +Help us maintain high quality: +- **Unit Testing** - Write tests for new features +- **Integration Testing** - Ensure components work together +- **Bug Hunting** - Find and report issues +- **Performance Testing** - Identify bottlenecks and optimization opportunities + +### Documentation & Education +Help others learn and contribute: +- **Technical Documentation** - API docs, guides, and tutorials +- **Video Tutorials** - Create learning content +- **Translation** - Make OpenHands accessible in more languages +- **Community Support** - Help other users and contributors + +### Research & Innovation +Push the boundaries of what's possible: +- **Academic 
Research** - Publish papers using OpenHands +- **Benchmarking** - Develop new evaluation methods +- **Experimental Features** - Try cutting-edge AI techniques +- **Data Analysis** - Study how developers use AI tools + +## 🚀 Getting Started + +Ready to contribute? Here's your path to making an impact: + +### 1. Quick Wins +Start with these easy contributions: +- **Use OpenHands** and [report issues](https://github.com/OpenHands/OpenHands/issues) you encounter +- **Give feedback** using the thumbs-up/thumbs-down buttons after each session +- **Star our repository** on [GitHub](https://github.com/OpenHands/OpenHands) +- **Share OpenHands** with other developers + +### 2. Set Up Your Development Environment +Follow our setup guide: +- **Requirements**: Linux/Mac/WSL, Docker, Python 3.12, Node.js 22+, Poetry 1.8+ +- **Quick setup**: `make build` to get everything ready +- **Configuration**: `make setup-config` to configure your LLM +- **Run locally**: `make run` to start the application + +*Full details in our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md)* + +### 3. Find Your First Issue +Look for beginner-friendly opportunities: +- Browse [good first issues](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) +- Check our [project boards](https://github.com/OpenHands/OpenHands/projects) for organized tasks +- Ask in [Slack](https://openhands.dev/joinslack) what needs help + +### 4. Join the Community +Connect with other contributors in our [Slack Community](https://openhands.dev/joinslack). You can connect with OpenHands contributors, maintainers, and more! 
+ +## 📋 How to Contribute Code + +### Understanding the Codebase +Get familiar with our architecture: +- **[Frontend](https://github.com/OpenHands/OpenHands/tree/main/frontend/README.md)** - React application +- **[Backend](https://github.com/OpenHands/OpenHands/tree/main/openhands/README.md)** - Python core +- **[Agents](https://github.com/OpenHands/OpenHands/tree/main/openhands/agenthub/README.md)** - AI agent implementations +- **[Runtime](https://github.com/OpenHands/OpenHands/tree/main/openhands/runtime/README.md)** - Execution environments +- **[Evaluation](https://github.com/OpenHands/benchmarks)** - Testing and benchmarks + +### Pull Request Process +We welcome all pull requests! Here's how we evaluate them: + +#### Small Improvements +- Quick review and approval for obvious improvements +- Make sure CI tests pass +- Include clear description of changes + +#### Core Agent Changes +We're more careful with agent changes since they affect user experience: +- **Accuracy** - Does it make the agent better at solving problems? +- **Efficiency** - Does it improve speed or reduce resource usage? +- **Code Quality** - Is the code maintainable and well-tested? + +*Discuss major changes in [GitHub issues](https://github.com/OpenHands/OpenHands/issues) or [Slack](https://openhands.dev/joinslack) first!* + +### Pull Request Guidelines +We recommend the following for smooth reviews but they're not required. Just know that the more you follow these guidelines, the more likely you'll get your PR reviewed faster and reduce the quantity of revisions. 
+ +**Title Format:** +- `feat: Add new agent capability` +- `fix: Resolve memory leak in runtime` +- `docs: Update installation guide` +- `style: Fix code formatting` +- `refactor: Simplify authentication logic` +- `test: Add unit tests for parser` + +**Description:** +- Explain what the PR does and why +- Link to related issues +- Include screenshots for UI changes +- Add changelog entry for user-facing changes + +## License + +OpenHands is released under the **MIT License**, which means: + +### You Can: +- **Use** OpenHands for any purpose, including commercial projects +- **Modify** the code to fit your needs +- **Share** your modifications +- **Distribute** or sell copies of OpenHands + +### You Must: +- **Include** the original copyright notice and license text +- **Preserve** the license in any substantial portions you use + +### No Warranty: +- OpenHands is provided "as is" without warranty +- Contributors are not liable for any damages + +*Full license text: [LICENSE](https://github.com/OpenHands/OpenHands/blob/main/LICENSE)* + +**Special Note:** Content in the `enterprise/` directory has a separate license. See `enterprise/LICENSE` for details. + +## Ready to make your first contribution? + +1. **⭐ Star** our [GitHub repository](https://github.com/OpenHands/OpenHands) +2. **🔧 Set up** your development environment using our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md) +3. **💬 Join** our [Slack community](https://openhands.dev/joinslack) to meet other contributors +4. **🎯 Find** a [good first issue](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) to work on +5. **📝 Read** our [Code of Conduct](https://github.com/OpenHands/OpenHands/blob/main/CODE_OF_CONDUCT.md) + +## Need Help? 
+ +Don't hesitate to ask for help: +- **Slack**: [Join our community](https://openhands.dev/joinslack) for real-time support +- **GitHub Issues**: [Open an issue](https://github.com/OpenHands/OpenHands/issues) for bugs or feature requests +- **Email**: Contact us at [contact@openhands.dev](mailto:contact@openhands.dev) + +--- + +Thank you for considering contributing to OpenHands! Together, we're building tools that will democratize AI-powered software development and make it accessible to developers everywhere. Every contribution, no matter how small, helps us move closer to that vision. + +Welcome to the community! 🎉 + + +# FAQs +Source: https://docs.openhands.dev/overview/faqs + +## Getting Started + +### I'm new to OpenHands. Where should I start? + +1. **Quick start**: Use [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) to get started quickly with + [GitHub](/openhands/usage/cloud/github-installation), [GitLab](/openhands/usage/cloud/gitlab-installation), + [Bitbucket](/openhands/usage/cloud/bitbucket-installation), + and [Slack](/openhands/usage/cloud/slack-installation) integrations. +2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/openhands/usage/run-openhands/local-setup). +3. **First steps**: Read over the [first projects guidelines](/overview/first-projects) and + [prompting best practices](/openhands/usage/tips/prompting-best-practices) to learn the basics. + +### Can I use OpenHands for production workloads? + +OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant +deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability. + +If you're interested in running OpenHands in a multi-tenant environment, please [contact us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) about our enterprise deployment options. 
+ + +Using OpenHands for work? We'd love to chat! Fill out +[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) +to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide +input on our product roadmap. + + +## Safety and Security + +### It's doing stuff without asking, is that safe? + +**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container +(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration: + +**What's protected:** +- Your host system files and programs (unless you mount them using [this feature](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)) +- Host system resources +- Other containers and processes + +**Potential risks to consider:** +- The agent can access the internet from within the container. +- If you provide credentials (API keys, tokens), the agent can use them. +- Mounted files and directories can be modified or deleted. +- Network requests can be made to external services. + +For detailed security information, see our [Runtime Architecture](/openhands/usage/architecture/runtime), +[Security Configuration](/openhands/usage/advanced/configuration-options#security-configuration), +and [Hardened Docker Installation](/openhands/usage/sandboxes/docker#hardened-docker-installation) documentation. + +## File Storage and Access + +### Where are my files stored? + +Your files are stored in different locations depending on how you've configured OpenHands: + +**Default behavior (no file mounting):** +- Files created by the agent are stored inside the runtime Docker container. +- These files are temporary and will be lost when the container is removed. +- The agent works in the `/workspace` directory inside the runtime container. 

+
+**When you mount your local filesystem (following [this](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)):**
+- Your local files are mounted into the container's `/workspace` directory.
+- Changes made by the agent are reflected in your local filesystem.
+- Files persist after the container is stopped.
+
+
+Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory.
+
+
+## Development Tools and Environment
+
+### How do I get the dev tools I need?
+
+OpenHands comes with a basic runtime environment that includes Python and Node.js.
+It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment.
+
+If you would like to set things up more systematically, you can:
+- **Use setup.sh**: Add a [setup.sh](/openhands/usage/customization/repository#setup-script) file to
+  your repository, which will be run every time the agent starts.
+- **Use a custom sandbox**: Use a [custom docker image](/openhands/usage/advanced/custom-sandbox-guide) to initialize the sandbox.
+
+### Something's not working. Where can I get help?
+
+1. **Search existing issues**: Check our [GitHub issues](https://github.com/OpenHands/OpenHands/issues) to see if
+   others have encountered the same problem.
+2. **Join our community**: Get help from other users and developers:
+   - [Slack community](https://openhands.dev/joinslack)
+3. **Check our troubleshooting guide**: Common issues and solutions are documented in
+   [Troubleshooting](/openhands/usage/troubleshooting/troubleshooting).
+4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/OpenHands/OpenHands/issues/new)
+   and fill in as much detail as possible.
+
+
+# First Projects
+Source: https://docs.openhands.dev/overview/first-projects
+
+Like any tool, OpenHands works best when you know how to use it effectively. 
Whether you're experimenting with a small
+script or making changes in a large codebase, this guide will show how to apply OpenHands in different scenarios.
+
+Let’s walk through a natural progression of using OpenHands:
+- Try a simple prompt.
+- Build a project from scratch.
+- Add features to existing code.
+- Refactor code.
+- Debug and fix bugs.
+
+## First Steps: Hello World
+
+Start with a small task to get familiar with how OpenHands responds to prompts.
+
+Click `New Conversation` and try prompting:
+> Write a bash script hello.sh that prints "hello world!"
+
+OpenHands will generate the script, set the correct permissions, and even run it for you.
+
+Now try making small changes:
+
+> Modify hello.sh so that it accepts a name as the first argument, but defaults to "world".
+
+You can experiment in any language. For example:
+
+> Convert hello.sh to a Ruby script, and run it.
+
+
+  Start small and iterate. This helps you understand how OpenHands interprets and responds to different prompts.
+
+
+## Build Something from Scratch
+
+Agents excel at "greenfield" tasks, where they don’t need context about existing code.
+Begin with a simple task and iterate from there. Be specific about what you want and the tech stack.
+
+Click `New Conversation` and give it a clear goal:
+
+> Build a frontend-only TODO app in React. All state should be stored in localStorage.
+
+Once the basics are working, build on it just like you would in a real project:
+
+> Allow adding an optional due date to each task.
+
+You can also ask OpenHands to help with version control:
+
+> Commit the changes and push them to a new branch called "feature/due-dates".
+
+
+  Break your goals into small, manageable tasks. Keep pushing your changes often. This makes it easier to recover
+  if something goes off track.
+
+
+## Expand Existing Code
+
+Want to add new functionality to an existing repo? OpenHands can do that too. 

+
+If you're running OpenHands on your own, first add a
+[GitHub token](/openhands/usage/settings/integrations-settings#github-setup),
+[GitLab token](/openhands/usage/settings/integrations-settings#gitlab-setup) or
+[Bitbucket token](/openhands/usage/settings/integrations-settings#bitbucket-setup).
+
+
+Choose your repository and branch via `Open Repository`, and press `Launch`.
+
+Examples of adding new functionality:
+
+> Add a GitHub action that lints the code in this repository.
+
+> Modify ./backend/api/routes.js to add a new route that returns a list of all tasks.
+
+> Add a new React component to the ./frontend/components directory to display a list of Widgets.
+> It should use the existing Widget component.
+
+
+  OpenHands can explore the codebase, but giving it context upfront makes it faster and less expensive.
+
+
+## Refactor Code
+
+OpenHands excels at refactoring code in small chunks. Rather than rearchitecting the entire codebase, it's most
+effective on focused refactoring tasks. Start by launching a conversation with
+your repo and branch. Then guide it:
+
+> Rename all the single-letter variables in ./app.go.
+
+> Split the `build_and_deploy_widgets` function into two functions, `build_widgets` and `deploy_widgets` in widget.php.
+
+> Break ./api/routes.js into separate files for each route.
+
+
+  Focus on small, meaningful improvements instead of full rewrites.
+
+
+## Debug and Fix Bugs
+
+OpenHands can help debug and fix issues, but it’s most effective when you’ve narrowed things down.
+
+Give it a clear description of the problem and the file(s) involved:
+
+> The email field in the `/subscribe` endpoint is rejecting .io domains. Fix this.
+
+> The `search_widgets` function in ./app.py is doing a case-sensitive search. Make it case-insensitive.
+
+For bug fixing, test-driven development can be really useful. 
You can ask OpenHands to write a new test and iterate +until the bug is fixed: + +> The `hello` function crashes on the empty string. Write a test that reproduces this bug, then fix the code so it passes. + + + Be as specific as possible. Include expected behavior, file names, and examples to speed things up. + + +## Using OpenHands Effectively + +OpenHands can assist with nearly any coding task, but it takes some practice to get the best results. +Keep these tips in mind: +* Keep your tasks small. +* Be clear and specific. +* Provide relevant context. +* Commit and push frequently. + +See [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) for more tips on how to get the most +out of OpenHands. + + +# Introduction +Source: https://docs.openhands.dev/overview/introduction + +🙌 Welcome to OpenHands, a [community](/overview/community) focused on AI-driven development. We'd love for you to [join us on Slack](https://openhands.dev/joinslack). + +There are a few ways to work with OpenHands: + +## OpenHands Software Agent SDK +The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below. + +Define agents in code, then run them locally, or scale to 1000s of agents in the cloud + +[Check out the docs](https://docs.openhands.dev/sdk) or [view the source](https://github.com/All-Hands-AI/agent-sdk/) + +## OpenHands CLI +The CLI is the easiest way to start using OpenHands. The experience will be familiar to anyone who has worked +with e.g. Claude Code or Codex. You can power it with Claude, GPT, or any other LLM. + +[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/cli-mode) or [view the source](https://github.com/OpenHands/OpenHands-CLI) + +## OpenHands Local GUI +Use the Local GUI for running agents on your laptop. It comes with a REST API and a single-page React application. +The experience will be familiar to anyone who has used Devin or Jules. 

+
+[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup) or view the source in this repo.
+
+## OpenHands Cloud
+This is a commercial deployment of the OpenHands GUI, running on hosted infrastructure.
+
+You can try it for free by [signing in with your GitHub account](https://app.all-hands.dev).
+
+OpenHands Cloud comes with source-available features and integrations:
+- Deeper integrations with GitHub, GitLab, and Bitbucket
+- Integrations with Slack, Jira, and Linear
+- Multi-user support
+- RBAC and permissions
+- Collaboration features (e.g., conversation sharing)
+- Usage reporting
+- Budgeting enforcement
+
+## OpenHands Enterprise
+Large enterprises can work with us to self-host OpenHands Cloud in their own VPC, via Kubernetes.
+OpenHands Enterprise can also work with the CLI and SDK above.
+
+OpenHands Enterprise is source-available--you can see all the source code here in the enterprise/ directory,
+but you'll need to purchase a license if you want to run it for more than one month.
+
+Enterprise contracts also come with extended support and access to our research team.
+
+Learn more at [openhands.dev/enterprise](https://openhands.dev/enterprise)
+
+## Everything Else
+
+Check out our [Product Roadmap](https://github.com/orgs/openhands/projects/1), and feel free to
+[open up an issue](https://github.com/OpenHands/OpenHands/issues) if there's something you'd like to see!
+
+You might also be interested in our [evaluation infrastructure](https://github.com/OpenHands/benchmarks), our [chrome extension](https://github.com/OpenHands/openhands-chrome-extension/), or our [Theory-of-Mind module](https://github.com/OpenHands/ToM-SWE).
+
+All our work is available under the MIT license, except for the `enterprise/` directory in this repository (see the [enterprise license](https://github.com/OpenHands/OpenHands/blob/main/enterprise/LICENSE) for details). 
+The core `openhands` and `agent-server` Docker images are fully MIT-licensed as well. + +If you need help with anything, or just want to chat, [come find us on Slack](https://openhands.dev/joinslack). + + +# Model Context Protocol (MCP) +Source: https://docs.openhands.dev/overview/model-context-protocol + +Model Context Protocol (MCP) is an open standard that allows OpenHands to communicate with external tool servers, extending the agent's capabilities with custom tools, specialized data processing, external API access, and more. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). + +## How MCP Works + +When OpenHands starts, it: + +1. Reads the MCP configuration +2. Connects to configured servers (SSE, SHTTP, or stdio) +3. Registers tools provided by these servers with the agent +4. Routes tool calls to appropriate MCP servers during execution + +## MCP Support Matrix + +| Platform | Support Level | Configuration Method | Documentation | +|----------|---------------|---------------------|---------------| +| **CLI** | ✅ Full Support | `~/.openhands/mcp.json` file | [CLI MCP Servers](/openhands/usage/cli/mcp-servers) | +| **SDK** | ✅ Full Support | Programmatic configuration | [SDK MCP Guide](/sdk/guides/mcp) | +| **Local GUI** | ✅ Full Support | Settings UI + config files | [Local GUI](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI settings | [Cloud GUI](/openhands/usage/cloud/cloud-ui) | + +## Platform-Specific Differences + + + + - Configuration via `~/.openhands/mcp.json` file + - Real-time status monitoring with `/mcp` command + - Supports all MCP transport protocols (SSE, SHTTP, stdio) + - Manual configuration required + + + - Programmatic configuration in code + - Full control over MCP server lifecycle + - Dynamic server registration and management + - Integration with custom tool systems + + + - Visual configuration through Settings UI + - File-based 
configuration backup + - Real-time server status display + - Supports all transport protocols + + + - Cloud-based configuration management + - Managed MCP server hosting options + - Team-wide configuration sharing + - Enterprise security features + + + +## Getting Started with MCP + +- **For detailed configuration**: See [MCP Settings](/openhands/usage/settings/mcp-settings) +- **For SDK integration**: See [SDK MCP Guide](/sdk/guides/mcp) +- **For architecture details**: See [MCP Architecture](/sdk/arch/mcp) + + +# Quick Start +Source: https://docs.openhands.dev/overview/quickstart + +Get started with OpenHands in minutes. Choose the option that works best for you. + + + + **Recommended** + + The fastest way to get started. No setup required—just sign in and start coding. + + - Free usage of MiniMax M2.5 for a limited time + - No installation needed + - Managed infrastructure + + + Use OpenHands from your terminal. Perfect for automation and scripting. + + - IDE integrations available + - Headless mode for CI/CD + - Lightweight installation + + + Run OpenHands locally with a web-based interface. Bring your own LLM and API key. + + - Full control over your environment + - Works offline + - Docker-based setup + + + + +# Overview +Source: https://docs.openhands.dev/overview/skills + +Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. They provide consistent practices across projects and can be triggered automatically based on keywords or context. + + +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers for automatic activation. See the [SDK Skills Guide](/sdk/guides/skill) for details on the SKILL.md format. + + +## Official Skill Registry + +The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). 
This repository contains community-shared skills that can be used by all OpenHands agents. You can browse available skills, contribute your own, and learn from examples created by the community. + +## How Skills Work + +Skills inject additional context and rules into the agent's behavior. + +At a high level, OpenHands supports two loading models: + +- **Always-on context** (e.g., `AGENTS.md`) that is injected into the system prompt at conversation start. +- **On-demand skills** that are either: + - **triggered by the user** (keyword matches), or + - **invoked by the agent** (the agent decides to look up the full skill content). + +## Permanent agent context (recommended) + +For repository-wide, always-on instructions, prefer a root-level `AGENTS.md` file. + +We also support model-specific variants: +- `GEMINI.md` for Gemini +- `CLAUDE.md` for Claude + +## Triggered and optional skills + +To add optional skills that are loaded on demand: + +- **AgentSkills standard (recommended for progressive disclosure)**: create one directory per skill and add a `SKILL.md` file. +- **Legacy/OpenHands format (simple)**: put markdown files in `.agents/skills/*.md` at the repository root. + + +Loaded skills take up space in the context window. On-demand skills help keep the system prompt smaller because the agent sees a summary first and reads the full content only when needed. 
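
+For example, a minimal `SKILL.md` for an on-demand skill pairs a short frontmatter summary with the full instructions. This is a sketch: the field names follow the AgentSkills standard linked above, and the skill name and script path are illustrative:
+
+```
+---
+name: rot13-encryption
+description: Encode or decode text with ROT13 using the bundled script.
+---
+
+# ROT13 Encryption
+
+Run `scripts/rot13.sh <text>` to apply ROT13 to the given text.
+```
+
+The agent initially sees only the `name` and `description`; it reads the body (and any bundled files) only when the skill is actually needed.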
+ + +### Example Repository Structure + +``` +some-repository/ +├── AGENTS.md # Permanent repository guidelines (recommended) +└── .agents/ + └── skills/ + ├── rot13-encryption/ # AgentSkills standard (progressive disclosure) + │ ├── SKILL.md + │ ├── scripts/ + │ │ └── rot13.sh + │ └── references/ + │ └── README.md + ├── another-agentskill/ # AgentSkills standard (progressive disclosure) + │ ├── SKILL.md + │ └── scripts/ + │ └── placeholder.sh + └── legacy_trigger_this.md # Legacy/OpenHands format (keyword-triggered) +``` + +## Skill Loading Precedence + +For project location, paths are relative to the repository root; `.agents/skills/` is a subdirectory of the project directory. +For user home location, paths are relative to the user home: `~/` + +When multiple skills share the same name, OpenHands keeps the first match in this order: + +1. `.agents/skills/` (recommended) +2. `.openhands/skills/` (deprecated) +3. `.openhands/microagents/` (deprecated) + +Project-specific skills take precedence over user skills. + +## Skill Types + +Currently supported skill types: + +- **[Permanent Context](/overview/skills/repo)**: Repository-wide guidelines and best practices. We recommend `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`). +- **[Keyword-Triggered Skills](/overview/skills/keyword)**: Guidelines activated by specific keywords in user prompts. +- **[Organization Skills](/overview/skills/org)**: Team or organization-wide standards. +- **[Global Skills](/overview/skills/public)**: Community-shared skills and templates. + +### Skills Frontmatter Requirements + +Each skill file may include frontmatter that provides additional information. 
In some cases, this frontmatter is required: + +| Skill Type | Required | +|-------------|----------| +| General Skills | No | +| Keyword-Triggered Skills | Yes | + +## Skills Support Matrix + +| Platform | Support Level | Configuration Method | Implementation | Documentation | +|----------|---------------|---------------------|----------------|---------------| +| **CLI** | ✅ Full Support | `~/.agents/skills/` (user-level) and `.agents/skills/` (repo-level) | File-based markdown | [Skills Overview](/overview/skills) | +| **SDK** | ✅ Full Support | Programmatic `Skill` objects | Code-based configuration | [SDK Skills Guide](/sdk/guides/skill) | +| **Local GUI** | ✅ Full Support | `.agents/skills/` + UI | File-based with UI management | [Local Setup](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI + repository integration | Managed skill library | [Cloud UI](/openhands/usage/cloud/cloud-ui) | + +## Platform-Specific Differences + + + + - File-based configuration in two locations: + - `~/.agents/skills/` - User-level skills (all conversations). 
+ - `.agents/skills/` - Repository-level skills (current directory) + - Markdown format for skill definitions + - Manual file management required + - Supports both general and keyword-triggered skills + + + - Programmatic `Skill` objects in code + - Dynamic skill creation and management + - Integration with custom workflows + - Full control over skill lifecycle + + + - Visual skill management through UI + - File-based storage with GUI editing + - Real-time skill status display + - Drag-and-drop skill organization + + + - Cloud-based skill library management + - Team-wide skill sharing and templates + - Organization-level skill policies + - Integrated skill marketplace + + + +## Learn More + +- **For SDK integration**: See [SDK Skills Guide](/sdk/guides/skill) +- **For architecture details**: See [Skills Architecture](/sdk/arch/skill) +- **For specific skill types**: See [Repository Skills](/overview/skills/repo), [Keyword Skills](/overview/skills/keyword), [Organization Skills](/overview/skills/org), and [Global Skills](/overview/skills/public) + + +# Keyword-Triggered Skills +Source: https://docs.openhands.dev/overview/skills/keyword + +## Usage + +These skills are only loaded when a prompt includes one of the trigger words. + +## Frontmatter Syntax + +Frontmatter is required for keyword-triggered skills. It must be placed at the top of the file, +above the guidelines. + +Enclose the frontmatter in triple dashes (---) and include the following fields: + +| Field | Description | Required | Default | +|------------|--------------------------------------------------|----------|------------------| +| `triggers` | A list of keywords that activate the skill. | Yes | None | + + +## Example + +Keyword-triggered skill file example located at `.agents/skills/yummy.md`: +``` +--- +triggers: +- yummyhappy +- happyyummy +--- + +The user has said the magic word. Respond with "That was delicious!" 
+``` + +[See examples of keyword-triggered skills in the official OpenHands Skills Registry](https://github.com/OpenHands/extensions) + + +# Organization and User Skills +Source: https://docs.openhands.dev/overview/skills/org + +## Usage + +These skills can be [any type of skill](/overview/skills#skill-types) and will be loaded +accordingly. However, they are applied to all repositories belonging to the organization or user. + +Add a `.agents` repository under the organization or user and create a `skills` directory and place the +skills in that directory. + +For GitLab organizations, use `openhands-config` as the repository name instead of `.agents`, since GitLab doesn't support repository names starting with non-alphanumeric characters. + +## Example + +General skill file example for organization `Great-Co` located inside the `.agents` repository: +`skills/org-skill.md`: +``` +* Use type hints and error boundaries; validate inputs at system boundaries and fail with meaningful error messages. +* Document interfaces and public APIs; use implementation comments only for non-obvious logic. +* Follow the same naming convention for variables, classes, constants, etc. already used in each repository. +``` + +For GitLab organizations, the same skill would be located inside the `openhands-config` repository. + +## User Skills When Running Openhands on Your Own + + + This works with CLI, headless and development modes. It does not work out of the box when running OpenHands using the docker command. + + +When running OpenHands on your own, you can place skills in the `~/.agents/skills` folder on your local +system and OpenHands will always load it for all your conversations. Repo-level overrides live in `.agents/skills`. + + +# Global Skills +Source: https://docs.openhands.dev/overview/skills/public + +## Global Skill Registry + +The official global skill registry is hosted at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). 
This repository contains community-shared skills that can be used by all OpenHands users.
+
+## Contributing a Global Skill
+
+You can create global skills and share them with the community by opening a pull request to the official skill registry.
+
+See the [OpenHands Skill Registry](https://github.com/OpenHands/extensions) for specific instructions on how to contribute a global skill.
+
+### Global Skills Best Practices
+
+- **Clear Scope**: Keep the skill focused on a specific domain or task.
+- **Explicit Instructions**: Provide clear, unambiguous guidelines.
+- **Useful Examples**: Include practical examples of common use cases.
+- **Safety First**: Include necessary warnings and constraints.
+- **Integration Awareness**: Consider how the skill interacts with other components.
+
+### Steps to Contribute a Global Skill
+
+#### 1. Plan the Global Skill
+
+Before creating a global skill, consider:
+
+- What specific problem or use case will it address?
+- What unique capabilities or knowledge should it have?
+- What trigger words make sense for activating it?
+- What constraints or guidelines should it follow?
+
+#### 2. Create the File
+
+Create a new Markdown file with a descriptive name in the official skill registry:
+[github.com/OpenHands/extensions](https://github.com/OpenHands/extensions)
+
+#### 3. Testing the Global Skill
+
+- Test the skill with various prompts.
+- Verify trigger words activate the skill correctly.
+- Ensure instructions are clear and comprehensive.
+- Check for potential conflicts and overlaps with existing skills.
+
+#### 4. Submission Process
+
+Submit a pull request with:
+
+- The new skill file.
+- Updated documentation if needed.
+- Description of the skill's purpose and capabilities.
+
+
+# General Skills
+Source: https://docs.openhands.dev/overview/skills/repo
+
+## Usage
+
+These skills are always loaded as part of the context.
+
+## Frontmatter Syntax
+
+The frontmatter for this type of skill is optional. 

+
+Frontmatter should be enclosed in triple dashes (---) and may include the following fields:
+
+| Field | Description | Required | Default |
+|-----------|-----------------------------------------|----------|----------------|
+| `agent` | The agent this skill applies to | No | 'CodeActAgent' |
+
+## Creating an `AGENTS.md` File
+
+To create an effective `AGENTS.md` file, you can ask OpenHands to analyze your repository with a prompt like:
+
+```
+Please browse the repository, look at the documentation and relevant code, and understand the purpose of this repository.
+
+Specifically, I want you to create an `AGENTS.md` file at the repository root. This file should contain succinct information that summarizes:
+1. The purpose of this repository
+2. The general setup of this repo
+3. A brief description of the structure of this repo
+
+Read all the GitHub workflows under .github/ of the repository (if this folder exists) to understand the CI checks (e.g., linter, pre-commit), and include those in the `AGENTS.md` file.
+```
+
+This approach helps OpenHands capture repository context efficiently, reducing the need for repeated searches during conversations and ensuring more accurate solutions.
+
+## Example Content
+
+An `AGENTS.md` file should include:
+
+```
+# Repository Purpose
+This project is a TODO application that allows users to track TODO items.
+
+# Setup Instructions
+To set it up, you can run `npm run build`.
+
+# Repository Structure
+- `/src`: Core application code
+- `/tests`: Test suite
+- `/docs`: Documentation
+- `/.github`: CI/CD workflows
+
+# CI/CD Workflows
+- `lint.yml`: Runs ESLint on all JavaScript files
+- `test.yml`: Runs the test suite on pull requests
+
+# Development Guidelines
+Always make sure the tests are passing before committing changes. You can run the tests by running `npm run test`. 
+``` + +[See more examples of general skills at OpenHands Skills registry.](https://github.com/OpenHands/extensions) + + +# Software Agent SDK +Source: https://docs.openhands.dev/sdk + +The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. + +You can use the OpenHands Software Agent SDK for: + +- One-off tasks, like building a README for your repo +- Routine maintenance tasks, like updating dependencies +- Major tasks that involve multiple agents, like refactors and rewrites + +You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + +Get started with some examples or keep reading to learn more. + +## Features + + + + A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. + + + Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. + + + A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. + + + +## Why OpenHands Software Agent SDK? + +### Emphasis on coding + +While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. + +While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. + +### State-of-the-Art Performance + +OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. 
The SDK includes a number of state-of-the-art agentic features developed by our research team, including:

- Task planning and decomposition
- Automatic context compression
- Security analysis
- Strong agent-computer interfaces

OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks.

### Free and Open Source

OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral.

Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic!

## Get Started

- Install the SDK, run your first agent, and explore the guides.

## Learn the SDK

- Understand the SDK's architecture: agents, tools, workspaces, and more.
- Explore the complete SDK API and source code.

## Build with Examples

- Build local agents with custom tools and capabilities.
- Run agents on remote servers with Docker sandboxing.
- Automate repository tasks with agent-powered workflows.

## Community

- Connect with the OpenHands community on Slack.
- Contribute to the SDK or report issues on GitHub.


# openhands.sdk.agent
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent

### class Agent

Bases: `CriticMixin`, [`AgentBase`](#class-agentbase)

Main agent implementation for OpenHands.

The Agent class provides the core functionality for running AI agents that can
interact with tools, process messages, and execute actions. It inherits from
AgentBase and implements the agent execution logic.
Critic-related functionality is provided by CriticMixin.

#### Example

```pycon
>>> from pydantic import SecretStr
>>> from openhands.sdk import LLM, Agent, Tool
>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key"))
>>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")]
>>> agent = Agent(llm=llm, tools=tools)
```


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### init_state()

Initialize conversation state.

Invariants enforced by this method:
- If a SystemPromptEvent is already present, it must be within the first 3
  events (index 0 or 1 in practice; index 2 is included in the scan window
  to detect a user message appearing before the system prompt).
- A user MessageEvent should not appear before the SystemPromptEvent.

These invariants keep event ordering predictable for downstream components
(condenser, UI, etc.) and also prevent accidentally materializing the full
event history during initialization.

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

* Parameters:
  * `self` – The BaseModel instance.
  * `context` – The context.

#### step()

Taking a step in the conversation.

Typically this involves:
1. Making a LLM call
2. Executing the tool
3. Updating the conversation state with LLM calls (role="assistant") and tool results (role="tool")
4. If the conversation is finished, setting state.execution_status to FINISHED; otherwise just returning, so Conversation will kick off the next step

If the underlying LLM supports streaming, partial deltas are forwarded to
`on_token` before the full response is returned.

NOTE: state will be mutated in-place.
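The step() contract above (LLM call, tool execution, state update, repeated until finished) can be sketched as a plain-Python loop. This is an illustrative skeleton only, not the SDK's implementation; `fake_llm` and `run_tool` are hypothetical stand-ins for the LLM call and the tool executor.

```python
# Illustrative skeleton of the step() loop; not the SDK's implementation.
# `fake_llm` and `run_tool` are hypothetical stand-ins.
def fake_llm(history):
    # Pretend the model requests one tool call, then finishes.
    if not any(event["role"] == "tool" for event in history):
        return {"role": "assistant", "tool_call": "echo hello"}
    return {"role": "assistant", "finish": True}

def run_tool(command):
    return {"role": "tool", "content": f"ran: {command}"}

def step(state):
    response = fake_llm(state["events"])           # 1. make an LLM call
    state["events"].append(response)
    if response.get("finish"):
        state["execution_status"] = "finished"     # conversation is done
        return
    observation = run_tool(response["tool_call"])  # 2. execute the tool
    state["events"].append(observation)            # 3. record the tool result

state = {"events": [], "execution_status": "running"}
while state["execution_status"] != "finished":     # Conversation drives steps
    step(state)
```

Here the outer `while` loop plays the role Conversation.run() plays in the SDK: it keeps kicking off steps until the execution status becomes finished.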
+ +### class AgentBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for OpenHands agents. + +Agents are stateless and should be fully defined by their configuration. +This base class provides the common interface and functionality that all +agent implementations must follow. + + +#### Properties + +- `agent_context`: AgentContext | None +- `condenser`: CondenserBase | None +- `critic`: CriticBase | None +- `dynamic_context`: str | None + Get the dynamic per-conversation context. + This returns the context that varies between conversations, such as: + - Repository information and skills + - Runtime information (hosts, working directory) + - User-specific secrets and settings + - Conversation instructions + This content should NOT be included in the cached system prompt to enable + cross-conversation cache sharing. Instead, it is sent as a second content + block (without a cache marker) inside the system message. + * Returns: + The dynamic context string, or None if no context is configured. +- `filter_tools_regex`: str | None +- `include_default_tools`: list[str] +- `llm`: LLM +- `mcp_config`: dict[str, Any] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str + Returns the name of the Agent. +- `prompt_dir`: str + Returns the directory where this class’s module file is located. +- `security_policy_filename`: str +- `static_system_message`: str + Compute the static portion of the system message. + This returns only the base system prompt template without any dynamic + per-conversation context. This static portion can be cached and reused + across conversations for better prompt caching efficiency. + * Returns: + The rendered system prompt template without dynamic context. +- `system_message`: str + Return the combined system message (static + dynamic). 
- `system_prompt_filename`: str
- `system_prompt_kwargs`: dict[str, object]
- `tools`: list[Tool]
- `tools_map`: dict[str, ToolDefinition]
  Get the initialized tools map.
  Raises RuntimeError if the agent has not been initialized.

#### Methods

#### get_all_llms()

Recursively yield unique base-class LLM objects reachable from self.

- Returns actual object references (not copies).
- De-dupes by id(LLM).
- Cycle-safe via a visited set for all traversed objects.
- Only yields objects whose type is exactly LLM (no subclasses).
- Does not handle dataclasses.

#### init_state()

Initialize the empty conversation state to prepare the agent for user
messages.

Typically this involves adding the system message.

NOTE: state will be mutated in-place.

#### model_dump_succint()

Like model_dump, but excludes None fields by default.

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

* Parameters:
  * `self` – The BaseModel instance.
  * `context` – The context.

#### abstractmethod step()

Taking a step in the conversation.

Typically this involves:
1. Making a LLM call
2. Executing the tool
3. Updating the conversation state with LLM calls (role="assistant") and tool results (role="tool")
4. If the conversation is finished, setting state.execution_status to FINISHED; otherwise just returning, so Conversation will kick off the next step

If the underlying LLM supports streaming, partial deltas are forwarded to
`on_token` before the full response is returned.

NOTE: state will be mutated in-place.

#### Deprecated
Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and
[`dynamic_context`](#class-dynamic_context) for per-conversation content.
This separation +enables cross-conversation prompt caching. Will be removed in 1.16.0. + +#### WARNING +Using this property DISABLES cross-conversation prompt caching because +it combines static and dynamic content into a single string. Use +[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately +to enable caching. + +#### Deprecated +Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. + +#### verify() + +Verify that we can resume this agent from persisted state. + +We do not merge configuration between persisted and runtime Agent +instances. Instead, we verify compatibility requirements and then +continue with the runtime-provided Agent. + +Compatibility requirements: +- Agent class/type must match. +- Tools must match exactly (same tool names). + +Tools are part of the system prompt and cannot be changed mid-conversation. +To use different tools, start a new conversation or use conversation forking +(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). + +All other configuration (LLM, agent_context, condenser, etc.) can be +freely changed between sessions. + +* Parameters: + * `persisted` – The agent loaded from persisted state. + * `events` – Unused, kept for API compatibility. +* Returns: + This runtime agent (self) if verification passes. +* Raises: + `ValueError` – If agent class or tools don’t match. + + +# openhands.sdk.conversation +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation + +### class BaseConversation + +Bases: `ABC` + +Abstract base class for conversation implementations. 
+ +This class defines the interface that all conversation implementations must follow. +Conversations manage the interaction between users and agents, handling message +exchange, execution control, and state management. + + +#### Properties + +- `confirmation_policy_active`: bool +- `conversation_stats`: ConversationStats +- `id`: UUID +- `is_confirmation_mode_active`: bool + Check if confirmation mode is active. + Returns True if BOTH conditions are met: + 1. The conversation state has a security analyzer set (not None) + 2. The confirmation policy is active +- `state`: ConversationStateProtocol + +#### Methods + +#### __init__() + +Initialize the base conversation with span tracking. + +#### abstractmethod ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### abstractmethod close() + +#### static compose_callbacks() + +Compose multiple callbacks into a single callback function. + +* Parameters: + `callbacks` – An iterable of callback functions +* Returns: + A single callback function that calls all provided callbacks + +#### abstractmethod condense() + +Force condensation of the conversation history. + +This method uses the existing condensation request pattern to trigger +condensation. It adds a CondensationRequest event to the conversation +and forces the agent to take a single step to process it. + +The condensation will be applied immediately and will modify the conversation +state by adding a condensation event to the history. 
+ +* Raises: + `ValueError` – If no condenser is configured or the condenser doesn’t + handle condensation requests. + +#### abstractmethod execute_tool() + +Execute a tool directly without going through the agent loop. + +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### abstractmethod generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### static get_persistence_dir() + +Get the persistence directory for the conversation. + +* Parameters: + * `persistence_base_dir` – Base directory for persistence. Can be a string + path or Path object. + * `conversation_id` – Unique conversation ID. +* Returns: + String path to the conversation-specific persistence directory. + Always returns a normalized string path even if a Path was provided. 
#### abstractmethod pause()

#### abstractmethod reject_pending_actions()

#### abstractmethod run()

Execute the agent to process messages and perform actions.

This method runs the agent until it finishes processing the current
message or reaches the maximum iteration limit.

#### abstractmethod send_message()

Send a message to the agent.

* Parameters:
  * `message` – Either a string (which will be converted to a user message)
    or a Message object
  * `sender` – Optional identifier of the sender. Can be used to track
    message origin in multi-agent scenarios. For example, when
    one agent delegates to another, the sender can be set to
    identify which agent is sending the message.

#### abstractmethod set_confirmation_policy()

Set the confirmation policy for the conversation.

#### abstractmethod set_security_analyzer()

Set the security analyzer for the conversation.

#### abstractmethod update_secrets()

### class Conversation

Bases: `object`

Factory class for creating conversation instances with OpenHands agents.

This factory automatically creates either a LocalConversation or RemoteConversation
based on the workspace type provided. LocalConversation runs the agent locally,
while RemoteConversation connects to a remote agent server.

* Returns:
  LocalConversation if workspace is local, RemoteConversation if workspace
  is remote.

#### Example

```pycon
>>> from pydantic import SecretStr
>>> from openhands.sdk import LLM, Agent, Conversation
>>> from openhands.sdk.plugin import PluginSource
>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key"))
>>> agent = Agent(llm=llm, tools=[])
>>> conversation = Conversation(
... agent=agent,
... workspace="./workspace",
... plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")],
... 
) +>>> conversation.send_message("Hello!") +>>> conversation.run() +``` + +### class ConversationExecutionStatus + +Bases: `str`, `Enum` + +Enum representing the current execution state of the conversation. + +#### Methods + +#### DELETING = 'deleting' + +#### ERROR = 'error' + +#### FINISHED = 'finished' + +#### IDLE = 'idle' + +#### PAUSED = 'paused' + +#### RUNNING = 'running' + +#### STUCK = 'stuck' + +#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' + +#### is_terminal() + +Check if this status represents a terminal state. + +Terminal states indicate the run has completed and the agent is no longer +actively processing. These are: FINISHED, ERROR, STUCK. + +Note: IDLE is NOT a terminal state - it’s the initial state of a conversation +before any run has started. Including IDLE would cause false positives when +the WebSocket delivers the initial state update during connection. + +* Returns: + True if this is a terminal status, False otherwise. + +### class ConversationState + +Bases: `OpenHandsModel` + + +#### Properties + +- `activated_knowledge_skills`: list[str] +- `agent`: AgentBase +- `agent_state`: dict[str, Any] +- `blocked_actions`: dict[str, str] +- `blocked_messages`: dict[str, str] +- `confirmation_policy`: ConfirmationPolicyBase +- `env_observation_persistence_dir`: str | None + Directory for persisting environment observation files. +- `events`: [EventLog](#class-eventlog) +- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) +- `id`: UUID +- `max_iterations`: int +- `persistence_dir`: str | None +- `secret_registry`: [SecretRegistry](#class-secretregistry) +- `security_analyzer`: SecurityAnalyzerBase | None +- `stats`: ConversationStats +- `stuck_detection`: bool +- `workspace`: BaseWorkspace + +#### Methods + +#### acquire() + +Acquire the lock. + +* Parameters: + * `blocking` – If True, block until lock is acquired. If False, return + immediately. 
+ * `timeout` – Maximum time to wait for lock (ignored if blocking=False). + -1 means wait indefinitely. +* Returns: + True if lock was acquired, False otherwise. + +#### block_action() + +Persistently record a hook-blocked action. + +#### block_message() + +Persistently record a hook-blocked user message. + +#### classmethod create() + +Create a new conversation state or resume from persistence. + +This factory method handles both new conversation creation and resumption +from persisted state. + +New conversation: +The provided Agent is used directly. Pydantic validation happens via the +cls() constructor. + +Restored conversation: +The provided Agent is validated against the persisted agent using +agent.load(). Tools must match (they may have been used in conversation +history), but all other configuration can be freely changed: LLM, +agent_context, condenser, system prompts, etc. + +* Parameters: + * `id` – Unique conversation identifier + * `agent` – The Agent to use (tools must match persisted on restore) + * `workspace` – Working directory for agent operations + * `persistence_dir` – Directory for persisting state and events + * `max_iterations` – Maximum iterations per run + * `stuck_detection` – Whether to enable stuck detection + * `cipher` – Optional cipher for encrypting/decrypting secrets in + persisted state. If provided, secrets are encrypted when + saving and decrypted when loading. If not provided, secrets + are redacted (lost) on serialization. +* Returns: + ConversationState ready for use +* Raises: + * `ValueError` – If conversation ID or tools mismatch on restore + * `ValidationError` – If agent or other fields fail Pydantic validation + +#### static get_unmatched_actions() + +Find actions in the event history that don’t have matching observations. + +This method identifies ActionEvents that don’t have corresponding +ObservationEvents or UserRejectObservations, which typically indicates +actions that are pending confirmation or execution. 
* Parameters:
  `events` – List of events to search through
* Returns:
  List of ActionEvent objects that don’t have corresponding observations,
  in chronological order

#### locked()

Return True if the lock is currently held by any thread.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

* Parameters:
  * `self` – The BaseModel instance.
  * `context` – The context.

#### owned()

Return True if the lock is currently held by the calling thread.

#### pop_blocked_action()

Remove and return a hook-blocked action reason, if present.

#### pop_blocked_message()

Remove and return a hook-blocked message reason, if present.

#### release()

Release the lock.

* Raises:
  `RuntimeError` – If the current thread doesn’t own the lock.

#### set_on_state_change()

Set a callback to be called when state changes.

* Parameters:
  `callback` – A function that takes an Event (ConversationStateUpdateEvent)
  or None to remove the callback

### class ConversationVisualizerBase

Bases: `ABC`

Base class for conversation visualizers.

This abstract base class defines the interface that all conversation visualizers
must implement. Visualizers can be created before the Conversation is initialized
and will be configured with the conversation state automatically.

The typical usage pattern:
1. Create a visualizer instance: `viz = MyVisualizer()`
2. Pass it to Conversation: `conv = Conversation(agent, visualizer=viz)`
3. 
Conversation automatically calls viz.initialize(state) to attach the state

You can also pass the uninstantiated class if you don’t need extra args
for initialization, and Conversation will create it:
`conv = Conversation(agent, visualizer=MyVisualizer)`

Conversation will then call MyVisualizer() followed by initialize(state)


#### Properties

- `conversation_stats`: ConversationStats | None
  Get conversation stats from the state.

#### Methods

#### __init__()

Initialize the visualizer base.

#### create_sub_visualizer()

Create a visualizer for a sub-agent during delegation.

Override this method to support sub-agent visualization in multi-agent
delegation scenarios. The sub-visualizer will be used to display events
from the spawned sub-agent.

By default, returns None which means sub-agents will not have visualization.
Subclasses that support delegation (like DelegationVisualizer) should
override this method to create appropriate sub-visualizers.

* Parameters:
  `agent_id` – The identifier of the sub-agent being spawned
* Returns:
  A visualizer instance for the sub-agent, or None if sub-agent
  visualization is not supported

#### final initialize()

Initialize the visualizer with conversation state.

This method is called by Conversation after the state is created,
allowing the visualizer to access conversation stats and other
state information.

Subclasses should not override this method, to ensure the state is set.

* Parameters:
  `state` – The conversation state object

#### abstractmethod on_event()

Handle a conversation event.

This method is called for each event in the conversation and should
implement the visualization logic.

* Parameters:
  `event` – The event to visualize

### class DefaultConversationVisualizer

Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase)

Handles visualization of conversation events with Rich formatting.
+ +Provides Rich-formatted output with semantic dividers and complete content display. + +#### Methods + +#### __init__() + +Initialize the visualizer. + +* Parameters: + * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles + for highlighting keywords in the visualizer. + For example: (configuration object) + * `skip_user_messages` – If True, skip displaying user messages. Useful for + scenarios where user input is not relevant to show. + +#### on_event() + +Main event handler that displays events with Rich formatting. + +### class EventLog + +Bases: [`EventsListBase`](#class-eventslistbase) + +Persistent event log with locking for concurrent writes. + +This class provides thread-safe and process-safe event storage using +the FileStore’s locking mechanism. Events are persisted to disk and +can be accessed by index or event ID. + +#### Methods + +#### NOTE +For LocalFileStore, file locking via flock() does NOT work reliably +on NFS mounts or network filesystems. Users deploying with shared +storage should use alternative coordination mechanisms. + +#### __init__() + +#### append() + +Append an event with locking for thread/process safety. + +* Raises: + * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. + * `ValueError` – If an event with the same ID already exists. + +#### get_id() + +Return the event_id for a given index. + +#### get_index() + +Return the integer index for a given event_id. + +### class EventsListBase + +Bases: `Sequence`[`Event`], `ABC` + +Abstract base class for event lists that can be appended to. + +This provides a common interface for both local EventLog and remote +RemoteEventsList implementations, avoiding circular imports in protocols. + +#### Methods + +#### abstractmethod append() + +Add a new event to the list. 
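For illustration, a minimal in-memory implementation of this interface (a `Sequence` that also supports `append`) could look like the following. It is a hypothetical stand-in, not the SDK's persistent, lock-protected `EventLog`.

```python
# Minimal in-memory stand-in for the EventsListBase shape (a Sequence that
# also supports append); illustrative only, not the SDK's EventLog.
from collections.abc import Sequence

class InMemoryEvents(Sequence):
    def __init__(self):
        self._events = []

    def __getitem__(self, index):
        return self._events[index]

    def __len__(self):
        return len(self._events)

    def append(self, event):
        self._events.append(event)

events = InMemoryEvents()
events.append({"kind": "system_prompt"})
events.append({"kind": "message", "role": "user"})
```

Because `Sequence` supplies iteration, `in`, and `index()` for free once `__getitem__` and `__len__` exist, the class stays tiny while still behaving like a read-mostly list of events.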
+ +### class LocalConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = True +- `id`: UUID + Get the unique ID of the conversation. +- `llm_registry`: LLMRegistry +- `max_iteration_per_run`: int +- `resolved_plugins`: list[ResolvedPluginSource] | None + Get the resolved plugin sources after plugins are loaded. + Returns None if plugins haven’t been loaded yet, or if no plugins + were specified. Use this for persistence to ensure conversation + resume uses the exact same plugin versions. +- `state`: [ConversationState](#class-conversationstate) + Get the conversation state. + It returns a protocol that has a subset of ConversationState methods + and properties. We will have the ability to access the same properties + of ConversationState on a remote conversation object. + But we won’t be able to access methods that mutate the state. +- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None + Get the stuck detector instance if enabled. +- `workspace`: LocalWorkspace + +#### Methods + +#### __init__() + +Initialize the conversation. + +* Parameters: + * `agent` – The agent to use for the conversation. + * `workspace` – Working directory for agent operations and tool execution. + Can be a string path, Path object, or LocalWorkspace instance. + * `plugins` – Optional list of plugins to load. Each plugin is specified + with a source (github:owner/repo, git URL, or local path), + optional ref (branch/tag/commit), and optional repo_path for + monorepos. Plugins are loaded in order with these merge + semantics: skills override by name (last wins), MCP config + override by key (last wins), hooks concatenate (all run). + * `persistence_dir` – Directory for persisting conversation state and events. + Can be a string path or Path object. + * `conversation_id` – Optional ID for the conversation. If provided, will + be used to identify the conversation. 
The user might want to + suffix their persistent filestore with this ID. + * `callbacks` – Optional list of callback functions to handle events + * `token_callbacks` – Optional list of callbacks invoked for streaming deltas + * `hook_config` – Optional hook configuration to auto-wire session hooks. + If plugins are loaded, their hooks are combined with this config. + * `max_iteration_per_run` – Maximum number of iterations per run + * `visualizer` – + + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `stuck_detection` – Whether to enable stuck detection + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted + state. If provided, secrets are encrypted when saving and + decrypted when loading. If not provided, secrets are redacted + (lost) on serialization. + +#### ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### close() + +Close the conversation and clean up all tool executors. + +#### condense() + +Synchronously force condense the conversation history. 
+ +If the agent is currently running, condense() will wait for the +ongoing step to finish before proceeding. + +Raises ValueError if no compatible condenser exists. + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. + +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses self.agent.llm. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### pause() + +Pause agent execution. + +This method can be called from any thread to request that the agent +pause execution. The pause will take effect at the next iteration +of the run loop (between agent steps). + +Note: If called during an LLM completion, the pause will not take +effect until the current LLM call completes. 
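The cooperative-pause behavior described above (a request that takes effect between steps, never mid-step) can be sketched with a `threading.Event` flag. This is an illustrative pattern, not the SDK's implementation; `Runner` and its step counter are hypothetical.

```python
# Illustrative cooperative-pause pattern: pause() may be called from any
# thread, and is honored between steps of the run loop, never mid-step.
# Hypothetical sketch, not the SDK's implementation.
import threading

class Runner:
    def __init__(self):
        self._pause_requested = threading.Event()
        self.steps_done = 0

    def pause(self):
        self._pause_requested.set()  # safe to call from any thread

    def run(self, max_steps=100):
        for _ in range(max_steps):
            if self._pause_requested.is_set():
                break  # checked between steps only
            self.steps_done += 1  # stand-in for one agent step
            if self.steps_done == 3:
                self.pause()  # simulate another thread requesting a pause

runner = Runner()
runner.run()
```

Note that, just as the docs warn for LLM completions, a pause requested in the middle of a step only takes effect once that step returns control to the loop.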
+ +#### reject_pending_actions() + +Reject all pending actions from the agent. + +This is a non-invasive method to reject actions between run() calls. +Also clears the agent_waiting_for_confirmation flag. + +#### run() + +Runs the conversation until the agent finishes. + +In confirmation mode: +- First call: creates actions but doesn’t execute them, stops and waits +- Second call: executes pending actions (implicit confirmation) + +In normal mode: +- Creates and executes actions immediately + +Can be paused between steps + +#### send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### set_confirmation_policy() + +Set the confirmation policy and store it in conversation state. + +#### set_security_analyzer() + +Set the security analyzer for the conversation. + +#### update_secrets() + +Add secrets to the conversation. + +* Parameters: + `secrets` – Dictionary mapping secret keys to values or no-arg callables. + SecretValue = str | Callable[[], str]. Callables are invoked lazily + when a command references the secret key. + +### class RemoteConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = False +- `id`: UUID +- `max_iteration_per_run`: int +- `state`: RemoteState + Access to remote conversation state. +- `workspace`: RemoteWorkspace + +#### Methods + +#### __init__() + +Remote conversation proxy that talks to an agent server. + +* Parameters: + * `agent` – Agent configuration (will be sent to the server) + * `workspace` – The working directory for agent operations and tool execution. 
+  * `plugins` – Optional list of plugins to load on the server. Each plugin
+    is a PluginSource specifying source, ref, and repo_path.
+  * `conversation_id` – Optional existing conversation id to attach to
+  * `callbacks` – Optional callbacks to receive events (not yet streamed)
+  * `max_iteration_per_run` – Max iterations configured on server
+  * `stuck_detection` – Whether to enable stuck detection on server
+  * `stuck_detection_thresholds` – Optional configuration for stuck detection
+    thresholds. Can be a StuckDetectionThresholds instance or
+    a dict with keys: ‘action_observation’, ‘action_error’,
+    ‘monologue’, ‘alternating_pattern’. Values are integers
+    representing the number of repetitions before triggering.
+  * `hook_config` – Optional hook configuration for session hooks
+  * `visualizer` – Visualization configuration. Can be:
+    - ConversationVisualizerBase subclass: Class to instantiate
+      (default: ConversationVisualizer)
+    - ConversationVisualizerBase instance: Use custom visualizer
+    - None: No visualization
+  * `secrets` – Optional secrets to initialize the conversation with
+
+#### ask_agent()
+
+Ask the agent a simple, stateless question and get a direct LLM response.
+
+This bypasses the normal conversation flow and does not modify, persist,
+or become part of the conversation state. The request is not remembered by
+the main agent, no events are recorded, and execution status is untouched.
+It is also thread-safe and may be called while conversation.run() is
+executing in another thread.
+
+* Parameters:
+  `question` – A simple string question to ask the agent
+* Returns:
+  A string response from the agent
+
+#### close()
+
+Close the conversation and clean up resources.
+
+Note: We don’t close self._client here because it’s shared with the workspace.
+The workspace owns the client and will close it during its own cleanup.
+Closing it here would prevent the workspace from making cleanup API calls.
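+
+A brief sketch combining the methods above (assuming `conversation` is a
+connected RemoteConversation; server setup is omitted):
+
+```pycon
+>>> # ask_agent() is stateless and thread-safe; nothing is recorded:
+>>> answer = conversation.ask_agent("What tests cover the auth module?")
+>>> print(answer)
+>>> # Clean up when done; the workspace closes the shared client later:
+>>> conversation.close()
+```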
+ +#### condense() + +Force condensation of the conversation history. + +This method sends a condensation request to the remote agent server. +The server will use the existing condensation request pattern to trigger +condensation if a condenser is configured and handles condensation requests. + +The condensation will be applied on the server side and will modify the +conversation state by adding a condensation event to the history. + +* Raises: + `HTTPError` – If the server returns an error (e.g., no condenser configured). + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. + +Note: This method is not yet supported for RemoteConversation. +Tool execution for remote conversations happens on the server side +during the normal agent loop. + +* Parameters: + * `tool_name` – The name of the tool to execute + * `action` – The action to pass to the tool executor +* Raises: + `NotImplementedError` – Always, as this feature is not yet supported + for remote conversations. + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If provided, its usage_id + will be sent to the server. If not provided, uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. + +#### pause() + +#### reject_pending_actions() + +#### run() + +Trigger a run on the server. + +* Parameters: + * `blocking` – If True (default), wait for the run to complete by polling + the server. If False, return immediately after triggering the run. + * `poll_interval` – Time in seconds between status polls (only used when + blocking=True). Default is 1.0 second. + * `timeout` – Maximum time in seconds to wait for the run to complete + (only used when blocking=True). Default is 3600 seconds. +* Raises: + `ConversationRunError` – If the run fails or times out. 
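+
+A minimal sketch of both run() modes (assuming `conversation` is a connected
+RemoteConversation):
+
+```pycon
+>>> conversation.send_message("Run the test suite")
+>>> # Block until the run completes, polling every 2 seconds:
+>>> conversation.run(poll_interval=2.0, timeout=600)
+>>> # Or trigger the run and return immediately:
+>>> # conversation.run(blocking=False)
+```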
+
+#### send_message()
+
+Send a message to the agent.
+
+* Parameters:
+  * `message` – Either a string (which will be converted to a user message)
+    or a Message object
+  * `sender` – Optional identifier of the sender. Can be used to track
+    message origin in multi-agent scenarios. For example, when
+    one agent delegates to another, the sender can be set to
+    identify which agent is sending the message.
+
+#### set_confirmation_policy()
+
+Set the confirmation policy for the conversation.
+
+#### set_security_analyzer()
+
+Set the security analyzer for the remote conversation.
+
+#### property stuck_detector
+
+Stuck detector for compatibility.
+Not implemented for remote conversations.
+
+#### update_secrets()
+
+### class SecretRegistry
+
+Bases: `OpenHandsModel`
+
+Manages secrets and injects them into bash commands when needed.
+
+The secret registry stores a mapping of secret keys to SecretSources
+that retrieve the actual secret values. When a bash command is about to be
+executed, it scans the command for any secret keys and injects the corresponding
+environment variables.
+
+Secret sources will redact or encrypt their sensitive values as appropriate when
+serializing, depending on the content of the context. If a context is present
+and contains a ‘cipher’ object, this is used for encryption. If it contains a
+boolean ‘expose_secrets’ flag set to True, secrets are dumped in plain text.
+Otherwise secrets are redacted.
+
+Additionally, it tracks the latest exported values to enable consistent masking
+even when callable secrets fail on subsequent calls.
+
+
+#### Properties
+
+- `secret_sources`: dict[str, SecretSource]
+
+#### Methods
+
+#### find_secrets_in_text()
+
+Find all secret keys mentioned in the given text.
+
+* Parameters:
+  `text` – The text to search for secret keys
+* Returns:
+  Set of secret keys found in the text
+
+#### get_secrets_as_env_vars()
+
+Get secrets that should be exported as environment variables for a command.
+ +* Parameters: + `command` – The bash command to check for secret references +* Returns: + Dictionary of environment variables to export (key -> value) + +#### mask_secrets_in_output() + +Mask secret values in the given text. + +This method uses both the current exported values and attempts to get +fresh values from callables to ensure comprehensive masking. + +* Parameters: + `text` – The text to mask secrets in +* Returns: + Text with secret values replaced by `` + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### update_secrets() + +Add or update secrets in the manager. + +* Parameters: + `secrets` – Dictionary mapping secret keys to either string values + or callable functions that return string values + +### class StuckDetector + +Bases: `object` + +Detects when an agent is stuck in repetitive or unproductive patterns. + +This detector analyzes the conversation history to identify various stuck patterns: +1. Repeating action-observation cycles +2. Repeating action-error cycles +3. Agent monologue (repeated messages without user input) +4. Repeating alternating action-observation patterns +5. Context window errors indicating memory issues + + +#### Properties + +- `action_error_threshold`: int +- `action_observation_threshold`: int +- `alternating_pattern_threshold`: int +- `monologue_threshold`: int +- `state`: [ConversationState](#class-conversationstate) +- `thresholds`: StuckDetectionThresholds + +#### Methods + +#### __init__() + +#### is_stuck() + +Check if the agent is currently stuck. 
+ +Note: To avoid materializing potentially large file-backed event histories, +only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. +If a user message exists within this window, only events after it are checked. +Otherwise, all events in the window are analyzed. + +#### __init__() + + +# openhands.sdk.event +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event + +### class ActionEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + + +#### Properties + +- `action`: Action | None +- `critic_result`: CriticResult | None +- `llm_response_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str | None +- `responses_reasoning_item`: ReasoningItemModel | None +- `security_risk`: SecurityRisk +- `source`: Literal['agent', 'user', 'environment'] +- `summary`: str | None +- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] +- `thought`: Sequence[TextContent] +- `tool_call`: MessageToolCall +- `tool_call_id`: str +- `tool_name`: str +- `visualize`: Text + Return Rich Text representation of this action event. + +#### Methods + +#### to_llm_message() + +Individual message - may be incomplete for multi-action batches + +### class AgentErrorEvent + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + +Error triggered by the agent. + +Note: This event should not contain model “thought” or “reasoning_content”. It +represents an error produced by the agent/scaffold, not model output. + + +#### Properties + +- `error`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this agent error event. 
+ +#### Methods + +#### to_llm_message() + +### class Condensation + +Bases: [`Event`](#class-event) + +This action indicates a condensation of the conversation history is happening. + + +#### Properties + +- `forgotten_event_ids`: list[[EventID](#class-eventid)] +- `has_summary_metadata`: bool + Checks if both summary and summary_offset are present. +- `llm_response_id`: [EventID](#class-eventid) +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str | None +- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) + Generates a CondensationSummaryEvent. + Since summary events are not part of the main event store and are generated + dynamically, this property ensures the created event has a unique and consistent + ID based on the condensation event’s ID. + * Raises: + `ValueError` – If no summary is present. +- `summary_offset`: int | None +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +#### Methods + +#### apply() + +Applies the condensation to a list of events. + +This method removes events that are marked to be forgotten and returns a new +list of events. If the summary metadata is present (both summary and offset), +the corresponding CondensationSummaryEvent will be inserted at the specified +offset _after_ the forgotten events have been removed. + +### class CondensationRequest + +Bases: [`Event`](#class-event) + +This action is used to request a condensation of the conversation history. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `source`: SourceType +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +#### Methods + +#### action + +The action type, namely ActionType.CONDENSATION_REQUEST. + +* Type: + str + +### class CondensationSummaryEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +This event represents a summary generated by a condenser. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str + The summary text. + +#### Methods + +#### to_llm_message() + +### class ConversationStateUpdateEvent + +Bases: [`Event`](#class-event) + +Event that contains conversation state updates. + +This event is sent via websocket whenever the conversation state changes, +allowing remote clients to stay in sync without making REST API calls. + +All fields are serialized versions of the corresponding ConversationState fields +to ensure compatibility with websocket transmission. + + +#### Properties + +- `key`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `value`: Any + +#### Methods + +#### classmethod from_conversation_state() + +Create a state update event from a ConversationState object. + +This creates an event containing a snapshot of important state fields. 
+ +* Parameters: + * `state` – The ConversationState to serialize + * `conversation_id` – The conversation ID for the event +* Returns: + A ConversationStateUpdateEvent with serialized state data + +#### classmethod validate_key() + +#### classmethod validate_value() + +### class Event + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Base class for all events. + + +#### Properties + +- `id`: str +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `timestamp`: str +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. +### class LLMCompletionLogEvent + +Bases: [`Event`](#class-event) + +Event containing LLM completion log data. + +When an LLM is configured with log_completions=True in a remote conversation, +this event streams the completion log data back to the client through WebSocket +instead of writing it to a file inside the Docker container. + + +#### Properties + +- `filename`: str +- `log_data`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_name`: str +- `source`: Literal['agent', 'user', 'environment'] +- `usage_id`: str +### class LLMConvertibleEvent + +Bases: [`Event`](#class-event), `ABC` + +Base class for events that can be converted to LLM messages. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+
+#### Methods
+
+#### static events_to_messages()
+
+Convert event stream to LLM message stream, handling multi-action batches
+
+#### abstractmethod to_llm_message()
+
+### class MessageEvent
+
+Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)
+
+Message from either agent or user.
+
+This was originally the “MessageAction”, but it is not supposed to be a tool call.
+
+
+#### Properties
+
+- `activated_skills`: list[str]
+- `critic_result`: CriticResult | None
+- `extended_content`: list[TextContent]
+- `llm_message`: Message
+- `llm_response_id`: str | None
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `reasoning_content`: str
+- `sender`: str | None
+- `source`: Literal['agent', 'user', 'environment']
+- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock]
+  Return the Anthropic thinking blocks from the LLM message.
+- `visualize`: Text
+  Return Rich Text representation of this message event.
+
+#### Methods
+
+#### to_llm_message()
+
+### class ObservationBaseEvent
+
+Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)
+
+Base class for any event produced in response to a tool call.
+
+Examples include tool execution, error, user reject.
+
+
+#### Properties
+
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `source`: Literal['agent', 'user', 'environment']
+- `tool_call_id`: str
+- `tool_name`: str
+### class ObservationEvent
+
+Bases: [`ObservationBaseEvent`](#class-observationbaseevent)
+
+
+#### Properties
+
+- `action_id`: str
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `observation`: Observation
+- `visualize`: Text
+  Return Rich Text representation of this observation event.
+ +#### Methods + +#### to_llm_message() + +### class PauseEvent + +Bases: [`Event`](#class-event) + +Event indicating that the agent execution was paused by user request. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this pause event. +### class SystemPromptEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +System prompt added by the agent. + +The system prompt can optionally include dynamic context that varies between +conversations. When `dynamic_context` is provided, it is included as a +second content block in the same system message. Cache markers are NOT +applied here - they are applied by `LLM._apply_prompt_caching()` when +caching is enabled, ensuring provider-specific cache control is only added +when appropriate. + + +#### Properties + +- `dynamic_context`: TextContent | None +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `system_prompt`: TextContent +- `tools`: list[ToolDefinition] +- `visualize`: Text + Return Rich Text representation of this system prompt event. + +#### Methods + +#### system_prompt + +The static system prompt text (cacheable across conversations) + +* Type: + openhands.sdk.llm.message.TextContent + +#### tools + +List of available tools + +* Type: + list[openhands.sdk.tool.tool.ToolDefinition] + +#### dynamic_context + +Optional per-conversation context (hosts, repo info, etc.) +Sent as a second TextContent block inside the system message. + +* Type: + openhands.sdk.llm.message.TextContent | None + +#### to_llm_message() + +Convert to a single system LLM message. 
+ +When `dynamic_context` is present the message contains two content +blocks: the static prompt followed by the dynamic context. Cache markers +are NOT applied here - they are applied by `LLM._apply_prompt_caching()` +when caching is enabled, which marks the static block (index 0) and leaves +the dynamic block (index 1) unmarked for cross-conversation cache sharing. + +### class TokenEvent + +Bases: [`Event`](#class-event) + +Event from VLLM representing token IDs used in LLM interaction. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `prompt_token_ids`: list[int] +- `response_token_ids`: list[int] +- `source`: Literal['agent', 'user', 'environment'] +### class UserRejectObservation + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + +Observation when an action is rejected by user or hook. + +This event is emitted when: +- User rejects an action during confirmation mode (rejection_source=”user”) +- A PreToolUse hook blocks an action (rejection_source=”hook”) + + +#### Properties + +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `rejection_reason`: str +- `rejection_source`: Literal['user', 'hook'] +- `visualize`: Text + Return Rich Text representation of this user rejection event. + +#### Methods + +#### to_llm_message() + + +# openhands.sdk.llm +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm + +### class CredentialStore + +Bases: `object` + +Store and retrieve OAuth credentials for LLM providers. + + +#### Properties + +- `credentials_dir`: Path + Get the credentials directory, creating it if necessary. + +#### Methods + +#### __init__() + +Initialize the credential store. + +* Parameters: + `credentials_dir` – Optional custom directory for storing credentials. 
+ Defaults to ~/.local/share/openhands/auth/ + +#### delete() + +Delete stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name +* Returns: + True if credentials were deleted, False if they didn’t exist + +#### get() + +Get stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name (e.g., ‘openai’) +* Returns: + OAuthCredentials if found and valid, None otherwise + +#### save() + +Save credentials for a vendor. + +* Parameters: + `credentials` – The OAuth credentials to save + +#### update_tokens() + +Update tokens for an existing credential. + +* Parameters: + * `vendor` – The vendor/provider name + * `access_token` – New access token + * `refresh_token` – New refresh token (if provided) + * `expires_in` – Token expiry in seconds +* Returns: + Updated credentials, or None if no existing credentials found + +### class ImageContent + +Bases: `BaseContent` + + +#### Properties + +- `image_urls`: list[str] +- `type`: Literal['image'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### to_llm_dict() + +Convert to LLM API format. + +### class LLM + +Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` + +Language model interface for OpenHands agents. + +The LLM class provides a unified interface for interacting with various +language models through the litellm library. It handles model configuration, +API authentication, +retry logic, and tool calling capabilities. + +#### Example + +```pycon +>>> from openhands.sdk import LLM +>>> from pydantic import SecretStr +>>> llm = LLM( +... model="claude-sonnet-4-20250514", +... api_key=SecretStr("your-api-key"), +... usage_id="my-agent" +... 
) +>>> # Use with agent or conversation +``` + + +#### Properties + +- `api_key`: str | SecretStr | None +- `api_version`: str | None +- `aws_access_key_id`: str | SecretStr | None +- `aws_region_name`: str | None +- `aws_secret_access_key`: str | SecretStr | None +- `base_url`: str | None +- `caching_prompt`: bool +- `custom_tokenizer`: str | None +- `disable_stop_word`: bool | None +- `disable_vision`: bool | None +- `drop_params`: bool +- `enable_encrypted_reasoning`: bool +- `extended_thinking_budget`: int | None +- `extra_headers`: dict[str, str] | None +- `force_string_serializer`: bool | None +- `input_cost_per_token`: float | None +- `is_subscription`: bool + Check if this LLM uses subscription-based authentication. + Returns True when the LLM was created via LLM.subscription_login(), + which uses the ChatGPT subscription Codex backend rather than the + standard OpenAI API. + * Returns: + True if using subscription-based transport, False otherwise. + * Return type: + bool +- `litellm_extra_body`: dict[str, Any] +- `log_completions`: bool +- `log_completions_folder`: str +- `max_input_tokens`: int | None +- `max_message_chars`: int +- `max_output_tokens`: int | None +- `metrics`: [Metrics](#class-metrics) + Get usage metrics for this LLM instance. + * Returns: + Metrics object containing token usage, costs, and other statistics. +- `model`: str +- `model_canonical_name`: str | None +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_info`: dict | None + Returns the model info dictionary. 
+- `modify_params`: bool
+- `native_tool_calling`: bool
+- `num_retries`: int
+- `ollama_base_url`: str | None
+- `openrouter_app_name`: str
+- `openrouter_site_url`: str
+- `output_cost_per_token`: float | None
+- `prompt_cache_retention`: str | None
+- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None
+- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None
+- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None]
+- `retry_max_wait`: int
+- `retry_min_wait`: int
+- `retry_multiplier`: float
+- `safety_settings`: list[dict[str, str]] | None
+- `seed`: int | None
+- `stream`: bool
+- `telemetry`: Telemetry
+  Get telemetry handler for this LLM instance.
+  * Returns:
+    Telemetry object for managing logging and metrics callbacks.
+- `temperature`: float | None
+- `timeout`: int | None
+- `top_k`: float | None
+- `top_p`: float | None
+- `usage_id`: str
+
+#### Methods
+
+#### completion()
+
+Generate a completion from the language model.
+
+This is the method for getting responses from the model via the Completion API.
+It handles message formatting, tool calling, and response processing.
+
+* Parameters:
+  * `messages` – List of conversation messages
+  * `tools` – Optional list of tools available to the model
+  * `_return_metrics` – Whether to return usage metrics
+  * `add_security_risk_prediction` – Add security_risk field to tool schemas
+  * `on_token` – Optional callback for streaming tokens
+  * `**kwargs` – Additional arguments passed to the LLM API
+* Returns:
+  LLMResponse containing the model’s response and metadata.
+
+#### NOTE
+Summary field is always added to tool schemas for transparency and
+explainability of agent actions.
+
+* Raises:
+  `ValueError` – If streaming is requested (not supported).
+
+#### format_messages_for_llm()
+
+Formats Message objects for LLM consumption.
+
+#### format_messages_for_responses()
+
+Prepare (instructions, input[]) for the OpenAI Responses API.
+
+- Skips prompt caching flags and string serializer concerns
+- Uses Message.to_responses_value to get either instructions (system)
+  or input items (others)
+- Concatenates system instructions into a single instructions string
+- For subscription mode, system prompts are prepended to user content
+
+#### get_token_count()
+
+#### is_caching_prompt_active()
+
+Check if prompt caching is supported and enabled for the current model.
+
+* Returns:
+  True if prompt caching is supported and enabled for the given model.
+* Return type:
+  boolean
+
+#### classmethod load_from_env()
+
+#### classmethod load_from_json()
+
+#### model_post_init()
+
+This function is meant to behave like a BaseModel method to initialise private attributes.
+
+It takes context as an argument since that’s what pydantic-core passes when calling it.
+
+* Parameters:
+  * `self` – The BaseModel instance.
+  * `context` – The context.
+
+#### reset_metrics()
+
+Reset metrics and telemetry to fresh instances.
+
+This is used by the LLMRegistry to ensure each registered LLM has
+independent metrics, preventing metrics from being shared between
+LLMs that were created via model_copy().
+
+When an LLM is copied (e.g., to create a condenser LLM from an agent LLM),
+Pydantic’s model_copy() does a shallow copy of private attributes by default,
+causing the original and copied LLM to share the same Metrics object.
+This method allows the registry to fix this by resetting metrics to None,
+which will be lazily recreated when accessed.
+
+#### responses()
+
+Alternative invocation path using OpenAI Responses API via LiteLLM.
+
+Maps Message[] -> (instructions, input[]) and returns LLMResponse.
+
+* Parameters:
+  * `messages` – List of conversation messages
+  * `tools` – Optional list of tools available to the model
+  * `include` – Optional list of fields to include in response
+  * `store` – Whether to store the conversation
+  * `_return_metrics` – Whether to return usage metrics
+  * `add_security_risk_prediction` – Add security_risk field to tool schemas
+  * `on_token` – Optional callback for streaming deltas
+  * `**kwargs` – Additional arguments passed to the API
+
+#### NOTE
+Summary field is always added to tool schemas for transparency and
+explainability of agent actions.
+
+#### restore_metrics()
+
+#### classmethod subscription_login()
+
+Authenticate with a subscription service and return an LLM instance.
+
+This method provides subscription-based access to LLM models that are
+available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather
+than API credits. It handles credential caching, token refresh, and
+the OAuth login flow.
+
+Currently supported vendors:
+- “openai”: ChatGPT Plus/Pro subscription for Codex models
+
+Supported OpenAI models:
+- gpt-5.1-codex-max
+- gpt-5.1-codex-mini
+- gpt-5.2
+- gpt-5.2-codex
+
+* Parameters:
+  * `vendor` – The vendor/provider. Currently only “openai” is supported.
+  * `model` – The model to use. Must be supported by the vendor’s
+    subscription service.
+  * `force_login` – If True, always perform a fresh login even if valid
+    credentials exist.
+  * `open_browser` – Whether to automatically open the browser for the
+    OAuth login flow.
+  * `**llm_kwargs` – Additional arguments to pass to the LLM constructor.
+* Returns:
+  An LLM instance configured for subscription-based access.
+* Raises:
+  * `ValueError` – If the vendor or model is not supported.
+  * `RuntimeError` – If authentication fails.
+
+#### uses_responses_api()
+
+Whether this model uses the OpenAI Responses API path.
+
+#### vision_is_active()
+
+### class LLMProfileStore
+
+Bases: `object`
+
+Standalone utility for persisting LLM configurations.
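+
+A minimal usage sketch (the profile name and `llm` instance here are
+illustrative; see the method descriptions below for exact semantics):
+
+```pycon
+>>> store = LLMProfileStore()  # defaults to ~/.openhands/profiles
+>>> store.save("default", llm)  # secrets excluded unless include_secrets=True
+>>> store.list()
+>>> restored = store.load("default")
+```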
+ +#### Methods + +#### __init__() + +Initialize the profile store. + +* Parameters: + `base_dir` – Path to the directory where the profiles are stored. + If None is provided, the default directory is used, i.e., + ~/.openhands/profiles. + +#### delete() + +Delete an existing profile. + +If the profile is not present in the profile directory, it does nothing. + +* Parameters: + `name` – Name of the profile to delete. +* Raises: + `TimeoutError` – If the lock cannot be acquired. + +#### list() + +Returns a list of all profiles stored. + +* Returns: + List of profile filenames (e.g., [“default.json”, “gpt4.json”]). + +#### load() + +Load an LLM instance from the given profile name. + +* Parameters: + `name` – Name of the profile to load. +* Returns: + An LLM instance constructed from the profile configuration. +* Raises: + * `FileNotFoundError` – If the profile name does not exist. + * `ValueError` – If the profile file is corrupted or invalid. + * `TimeoutError` – If the lock cannot be acquired. + +#### save() + +Save a profile to the profile directory. + +Note that if a profile name already exists, it will be overwritten. + +* Parameters: + * `name` – Name of the profile to save. + * `llm` – LLM instance to save + * `include_secrets` – Whether to include the profile secrets. Defaults to False. +* Raises: + `TimeoutError` – If the lock cannot be acquired. + +### class LLMRegistry + +Bases: `object` + +A minimal LLM registry for managing LLM instances by usage ID. + +This registry provides a simple way to manage multiple LLM instances, +avoiding the need to recreate LLMs with the same configuration. + +The registry also ensures that each registered LLM has independent metrics, +preventing metrics from being shared between LLMs that were created via +model_copy(). This is important for scenarios like creating a condenser LLM +from an agent LLM, where each should track its own usage independently. 
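+
+An illustrative sketch (assuming `llm` was constructed with
+usage_id="my-agent", as in the LLM example earlier on this page):
+
+```pycon
+>>> registry = LLMRegistry()
+>>> registry.add(llm)  # raises ValueError if the usage_id is already taken
+>>> same = registry.get("my-agent")
+>>> registry.list_usage_ids()
+```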
+ + +#### Properties + +- `registry_id`: str +- `retry_listener`: Callable[[int, int], None] | None +- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None +- `usage_to_llm`: MappingProxyType + Access the internal usage-ID-to-LLM mapping (read-only view). + +#### Methods + +#### __init__() + +Initialize the LLM registry. + +* Parameters: + `retry_listener` – Optional callback for retry events. + +#### add() + +Add an LLM instance to the registry. + +This method ensures that the LLM has independent metrics before +registering it. If the LLM’s metrics are shared with another +registered LLM (e.g., due to model_copy()), fresh metrics will +be created automatically. + +* Parameters: + `llm` – The LLM instance to register. +* Raises: + `ValueError` – If llm.usage_id already exists in the registry. + +#### get() + +Get an LLM instance from the registry. + +* Parameters: + `usage_id` – Unique identifier for the LLM usage slot. +* Returns: + The LLM instance. +* Raises: + `KeyError` – If usage_id is not found in the registry. + +#### list_usage_ids() + +List all registered usage IDs. + +#### notify() + +Notify subscribers of registry events. + +* Parameters: + `event` – The registry event to notify about. + +#### subscribe() + +Subscribe to registry events. + +* Parameters: + `callback` – Function to call when LLMs are created or updated. + +### class LLMResponse + +Bases: `BaseModel` + +Result of an LLM completion request. + +This type provides a clean interface for LLM completion results, exposing +only OpenHands-native types to consumers while preserving access to the +raw LiteLLM response for internal use. + + +#### Properties + +- `id`: str + Get the response ID from the underlying LLM response. + This property provides a clean interface to access the response ID, + supporting both completion mode (ModelResponse) and response API modes + (ResponsesAPIResponse). 
+ * Returns: + The response ID from the LLM response +- `message`: [Message](#class-message) +- `metrics`: [MetricsSnapshot](#class-metricssnapshot) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `raw_response`: ModelResponse | ResponsesAPIResponse + +#### Methods + +#### message + +The completion message converted to OpenHands Message type + +* Type: + [openhands.sdk.llm.message.Message](#class-message) + +#### metrics + +Snapshot of metrics from the completion request + +* Type: + [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) + +#### raw_response + +The original LiteLLM response (ModelResponse or +ResponsesAPIResponse) for internal use + +* Type: + litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse + +### class Message + +Bases: `BaseModel` + + +#### Properties + +- `contains_image`: bool +- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str | None +- `reasoning_content`: str | None +- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None +- `role`: Literal['user', 'system', 'assistant', 'tool'] +- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] +- `tool_call_id`: str | None +- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None + +#### Methods + +#### classmethod from_llm_chat_message() + +Convert a LiteLLMMessage (Chat Completions) to our Message class. 
+ +Provider-agnostic mapping for reasoning: +- Prefer message.reasoning_content if present (LiteLLM normalized field) +- Extract thinking_blocks from content array (Anthropic-specific) + +#### classmethod from_llm_responses_output() + +Convert OpenAI Responses API output items into a single assistant Message. + +Policy (non-stream): +- Collect assistant text by concatenating output_text parts from message items +- Normalize function_call items to MessageToolCall list + +#### to_chat_dict() + +Serialize message for OpenAI Chat Completions. + +* Parameters: + * `cache_enabled` – Whether prompt caching is active. + * `vision_enabled` – Whether vision/image processing is enabled. + * `function_calling_enabled` – Whether native function calling is enabled. + * `force_string_serializer` – Force string serializer instead of list format. + * `send_reasoning_content` – Whether to include reasoning_content in output. + +Chooses the appropriate content serializer and then injects threading keys: +- Assistant tool call turn: role == “assistant” and self.tool_calls +- Tool result turn: role == “tool” and self.tool_call_id (with name) + +#### to_responses_dict() + +Serialize message for OpenAI Responses (input parameter). + +Produces a list of “input” items for the Responses API: +- system: returns [], system content is expected in ‘instructions’ +- user: one ‘message’ item with content parts -> input_text / input_image +(when vision enabled) +- assistant: emits prior assistant content as input_text, +and function_call items for tool_calls +- tool: emits function_call_output items (one per TextContent) +with matching call_id + +#### to_responses_value() + +Return serialized form. + +Either an instructions string (for system) or input items (for other roles). + +### class MessageToolCall + +Bases: `BaseModel` + +Transport-agnostic tool call representation. + +One canonical id is used for linking across actions/observations and +for Responses function_call_output call_id. 

#### Properties

- `arguments`: str
- `id`: str
- `name`: str
- `origin`: Literal['completion', 'responses']

#### Methods

#### classmethod from_chat_tool_call()

Create a MessageToolCall from a Chat Completions tool call.

#### classmethod from_responses_function_call()

Create a MessageToolCall from a typed OpenAI Responses function_call item.

Note: OpenAI Responses function_call.arguments is already a JSON string.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### to_chat_dict()

Serialize to OpenAI Chat Completions tool_calls format.

#### to_responses_dict()

Serialize to OpenAI Responses ‘function_call’ input item format.

### class Metrics

Tracks detailed LLM usage metrics, including individual costs, response
latencies, and token usage records.


#### Properties

- `costs`: list[Cost]
- `response_latencies`: list[ResponseLatency]
- `token_usages`: list[TokenUsage]

#### Methods

#### add_cost()

#### add_response_latency()

#### add_token_usage()

Add a single usage record.

#### deep_copy()

Create a deep copy of the Metrics object.

#### diff()

Calculate the difference between current metrics and a baseline.

This is useful for tracking metrics for specific operations like delegates.

* Parameters:
  `baseline` – A metrics object representing the baseline state
* Returns:
  A new Metrics object containing only the differences since the baseline

#### get()

Return the metrics in a dictionary.

#### get_snapshot()

Get a snapshot of the current metrics without the detailed lists.

#### initialize_accumulated_token_usage()

#### log()

Log the metrics.

#### merge()

Merge ‘other’ metrics into this one.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### classmethod validate_accumulated_cost()

### class MetricsSnapshot

Bases: `BaseModel`

A snapshot of metrics at a point in time.

Does not include lists of individual costs, latencies, or token usages.


#### Properties

- `accumulated_cost`: float
- `accumulated_token_usage`: TokenUsage | None
- `max_budget_per_task`: float | None
- `model_name`: str

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class OAuthCredentials

Bases: `BaseModel`

OAuth credentials for subscription-based LLM access.


#### Properties

- `access_token`: str
- `expires_at`: int
- `refresh_token`: str
- `type`: Literal['oauth']
- `vendor`: str

#### Methods

#### is_expired()

Check if the access token is expired.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class OpenAISubscriptionAuth

Bases: `object`

Handle OAuth authentication for OpenAI ChatGPT subscription access.


#### Properties

- `vendor`: str
  Get the vendor name.

#### Methods

#### __init__()

Initialize the OpenAI subscription auth handler.

* Parameters:
  * `credential_store` – Optional custom credential store.
  * `oauth_port` – Port for the local OAuth callback server.

#### create_llm()

Create an LLM instance configured for Codex subscription access.

* Parameters:
  * `model` – The model to use (must be in OPENAI_CODEX_MODELS).
  * `credentials` – OAuth credentials to use. If None, uses stored credentials.
  * `instructions` – Optional instructions for the Codex model.
  * `**llm_kwargs` – Additional arguments to pass to LLM constructor.
* Returns:
  An LLM instance configured for Codex access.
* Raises:
  `ValueError` – If the model is not supported or no credentials available.

#### get_credentials()

Get stored credentials if they exist.

#### has_valid_credentials()

Check if valid (non-expired) credentials exist.
+ +#### async login() + +Perform OAuth login flow. + +This starts a local HTTP server to handle the OAuth callback, +opens the browser for user authentication, and waits for the +callback with the authorization code. + +* Parameters: + `open_browser` – Whether to automatically open the browser. +* Returns: + The obtained OAuth credentials. +* Raises: + `RuntimeError` – If the OAuth flow fails or times out. + +#### logout() + +Remove stored credentials. + +* Returns: + True if credentials were removed, False if none existed. + +#### async refresh_if_needed() + +Refresh credentials if they are expired. + +* Returns: + Updated credentials, or None if no credentials exist. +* Raises: + `RuntimeError` – If token refresh fails. + +### class ReasoningItemModel + +Bases: `BaseModel` + +OpenAI Responses reasoning item (non-stream, subset we consume). + +Do not log or render encrypted_content. + + +#### Properties + +- `content`: list[str] | None +- `encrypted_content`: str | None +- `id`: str | None +- `status`: str | None +- `summary`: list[str] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class RedactedThinkingBlock + +Bases: `BaseModel` + +Redacted thinking block for previous responses without extended thinking. + +This is used as a placeholder for assistant messages that were generated +before extended thinking was enabled. + + +#### Properties + +- `data`: str +- `type`: Literal['redacted_thinking'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 

### class RegistryEvent

Bases: `BaseModel`


#### Properties

- `llm`: [LLM](#class-llm)
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

### class RouterLLM

Bases: [`LLM`](#class-llm)

Base class for multiple LLMs acting as a unified LLM.

This class provides a foundation for implementing model routing by
inheriting from LLM, allowing routers to work with multiple underlying
LLM models while presenting a unified LLM interface to consumers.

Key features:
- Works with multiple LLMs configured via llms_for_routing
- Delegates all other operations/properties to the selected LLM
- Provides routing interface through select_llm() method


#### Properties

- `active_llm`: [LLM](#class-llm) | None
- `llms_for_routing`: dict[str, [LLM](#class-llm)]
- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `router_name`: str

#### Methods

#### completion()

This method intercepts completion calls and routes them to the appropriate
underlying LLM based on the routing logic implemented in select_llm().

* Parameters:
  * `messages` – List of conversation messages
  * `tools` – Optional list of tools available to the model
  * `return_metrics` – Whether to return usage metrics
  * `add_security_risk_prediction` – Add security_risk field to tool schemas
  * `on_token` – Optional callback for streaming tokens
  * `**kwargs` – Additional arguments passed to the LLM API

#### NOTE
Summary field is always added to tool schemas for transparency and
explainability of agent actions.

#### model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.
+ +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### abstractmethod select_llm() + +Select which LLM to use based on messages and events. + +This method implements the core routing logic for the RouterLLM. +Subclasses should analyze the provided messages to determine which +LLM from llms_for_routing is most appropriate for handling the request. + +* Parameters: + `messages` – List of messages in the conversation that can be used + to inform the routing decision. +* Returns: + The key/name of the LLM to use from llms_for_routing dictionary. + +#### classmethod set_placeholder_model() + +Guarantee model exists before LLM base validation runs. + +#### classmethod validate_llms_not_empty() + +### class TextContent + +Bases: `BaseContent` + + +#### Properties + +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str +- `type`: Literal['text'] + +#### Methods + +#### to_llm_dict() + +Convert to LLM API format. + +### class ThinkingBlock + +Bases: `BaseModel` + +Anthropic thinking block for extended thinking feature. + +This represents the raw thinking blocks returned by Anthropic models +when extended thinking is enabled. These blocks must be preserved +and passed back to the API for tool use scenarios. + + +#### Properties + +- `signature`: str | None +- `thinking`: str +- `type`: Literal['thinking'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
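
The `RouterLLM.select_llm()` contract can be illustrated with a minimal
standalone sketch. The `KeywordRouter` class and its routing rule are
hypothetical; in the SDK, `llms_for_routing` maps keys to real `LLM`
instances rather than plain strings.

```python
class KeywordRouter:
    """Toy router: picks a key from llms_for_routing based on the last message."""

    def __init__(self, llms_for_routing: dict[str, str]):
        # In the SDK these values would be LLM instances; strings keep the
        # sketch self-contained and runnable.
        self.llms_for_routing = llms_for_routing

    def select_llm(self, messages: list[str]) -> str:
        # Route code-looking requests to a code model, everything else to default.
        last = messages[-1].lower() if messages else ""
        if "```" in last or "def " in last:
            return "code"
        return "default"


router = KeywordRouter({"default": "general-model", "code": "code-model"})
key = router.select_llm(["please fix this: def f(x): return x + 1"])
```

A concrete subclass of `RouterLLM` would implement the same decision inside
`select_llm()` and let the base class delegate the actual `completion()` call
to the chosen LLM.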
+ + +# openhands.sdk.security +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security + +### class AlwaysConfirm + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. + +### class ConfirmRisky + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + + +#### Properties + +- `confirm_unknown`: bool +- `threshold`: [SecurityRisk](#class-securityrisk) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. 

#### classmethod validate_threshold()

### class ConfirmationPolicyBase

Bases: `DiscriminatedUnionMixin`, `ABC`

#### Methods

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### abstractmethod should_confirm()

Determine if an action with the given risk level requires confirmation.

This method defines the core logic for determining whether user confirmation
is required before executing an action based on its security risk level.

* Parameters:
  `risk` – The security risk level of the action to be evaluated.
  Defaults to SecurityRisk.UNKNOWN if not specified.
* Returns:
  True if the action requires user confirmation before execution,
  False if the action can proceed without confirmation.

### class GraySwanAnalyzer

Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase)

Security analyzer using GraySwan’s Cygnal API for AI safety monitoring.

This analyzer sends conversation history and pending actions to the GraySwan
Cygnal API for security analysis. The API returns a violation score which is
mapped to SecurityRisk levels.

Environment Variables:

- `GRAYSWAN_API_KEY`: Required API key for GraySwan authentication
- `GRAYSWAN_POLICY_ID`: Optional policy ID for custom GraySwan policy

#### Example

```pycon
>>> from openhands.sdk.security.grayswan import GraySwanAnalyzer
>>> analyzer = GraySwanAnalyzer()
>>> risk = analyzer.security_risk(action_event)
```


#### Properties

- `api_key`: SecretStr | None
- `api_url`: str
- `history_limit`: int
- `low_threshold`: float
- `max_message_chars`: int
- `medium_threshold`: float
- `policy_id`: str | None
- `timeout`: float

#### Methods

#### close()

Clean up resources.

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+ +#### model_post_init() + +Initialize the analyzer after model creation. + +#### security_risk() + +Analyze action for security risks using GraySwan API. + +This method converts the conversation history and the pending action +to OpenAI message format and sends them to the GraySwan Cygnal API +for security analysis. + +* Parameters: + `action` – The ActionEvent to analyze +* Returns: + SecurityRisk level based on GraySwan analysis + +#### set_events() + +Set the events for context when analyzing actions. + +* Parameters: + `events` – Sequence of events to use as context for security analysis + +#### validate_thresholds() + +Validate that thresholds are properly ordered. + +### class LLMSecurityAnalyzer + +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) + +LLM-based security analyzer. + +This analyzer respects the security_risk attribute that can be set by the LLM +when generating actions, similar to OpenHands’ LLMRiskAnalyzer. + +It provides a lightweight security analysis approach that leverages the LLM’s +understanding of action context and potential risks. + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### security_risk() + +Evaluate security risk based on LLM-provided assessment. + +This method checks if the action has a security_risk attribute set by the LLM +and returns it. The LLM may not always provide this attribute but it defaults to +UNKNOWN if not explicitly set. + +### class NeverConfirm + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. 
+ +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. + +### class SecurityAnalyzerBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for security analyzers. + +Security analyzers evaluate the risk of actions before they are executed +and can influence the conversation flow based on security policies. + +This is adapted from OpenHands SecurityAnalyzer but designed to work +with the agent-sdk’s conversation-based architecture. + +#### Methods + +#### analyze_event() + +Analyze an event for security risks. + +This is a convenience method that checks if the event is an action +and calls security_risk() if it is. Non-action events return None. + +* Parameters: + `event` – The event to analyze +* Returns: + ActionSecurityRisk if event is an action, None otherwise + +#### analyze_pending_actions() + +Analyze all pending actions in a conversation. + +This method gets all unmatched actions from the conversation state +and analyzes each one for security risks. + +* Parameters: + `conversation` – The conversation to analyze +* Returns: + List of tuples containing (action, risk_level) for each pending action + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### abstractmethod security_risk() + +Evaluate the security risk of an ActionEvent. + +This is the core method that analyzes an ActionEvent and returns its risk level. +Implementations should examine the action’s content, context, and potential +impact to determine the appropriate risk level. 

* Parameters:
  `action` – The ActionEvent to analyze for security risks
* Returns:
  ActionSecurityRisk enum indicating the risk level

#### should_require_confirmation()

Determine if an action should require user confirmation.

This implements the default confirmation logic based on risk level
and confirmation mode settings.

* Parameters:
  * `risk` – The security risk level of the action
  * `confirmation_mode` – Whether confirmation mode is enabled
* Returns:
  True if confirmation is required, False otherwise

### class SecurityRisk

Bases: `str`, `Enum`

Security risk levels for actions.

Based on OpenHands security risk levels but adapted for agent-sdk.
Integer values allow for easy comparison and ordering.


#### Properties

- `description`: str
  Get a human-readable description of the risk level.
- `visualize`: Text
  Return Rich Text representation of this risk level.

#### Methods

#### HIGH = 'HIGH'

#### LOW = 'LOW'

#### MEDIUM = 'MEDIUM'

#### UNKNOWN = 'UNKNOWN'

#### get_color()

Get the color for displaying this risk level in Rich text.

#### is_riskier()

Check if this risk level is riskier than another.

Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is
less risky than HIGH. UNKNOWN is not comparable to any other level.

To make this act like a standard well-ordered domain, we reflexively consider
risk levels to be riskier than themselves. That is:

```python
for risk_level in list(SecurityRisk):
    assert risk_level.is_riskier(risk_level)

# More concretely:
assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH)
assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM)
assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW)
```

This can be disabled by setting the reflexive parameter to False.

* Parameters:
  * `other` ([SecurityRisk](#class-securityrisk)) – The other risk level to compare against.
  * `reflexive` (bool) – Whether the relationship is reflexive.
* Raises:
  `ValueError` – If either risk level is UNKNOWN.


# openhands.sdk.tool
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool

### class Action

Bases: `Schema`, `ABC`

Base schema for input action.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `visualize`: Text
  Return Rich Text representation of this action.
  This method can be overridden by subclasses to customize visualization.
  The base implementation displays all action fields systematically.

### class ExecutableTool

Bases: `Protocol`

Protocol for tools that are guaranteed to have a non-None executor.

This eliminates the need for runtime None checks and type narrowing
when working with tools that are known to be executable.


#### Properties

- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any]
- `name`: str

#### Methods

#### __init__()

### class FinishTool

Bases: `ToolDefinition[FinishAction, FinishObservation]`

Tool for signaling the completion of a task or conversation.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### classmethod create()

Create FinishTool instance.

* Parameters:
  * `conv_state` – Optional conversation state (not used by FinishTool).
  * `**params` – Additional parameters (none supported).
* Returns:
  A sequence containing a single FinishTool instance.
* Raises:
  `ValueError` – If any parameters are provided.

#### name = 'finish'

### class Observation

Bases: `Schema`, `ABC`

Base schema for output observation.

#### Properties

- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]\n'
- `content`: list[TextContent | ImageContent]
- `is_error`: bool
- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `text`: str
  Extract all text content from the observation.
  * Returns:
    Concatenated text from all TextContent items in content.
- `to_llm_content`: Sequence[TextContent | ImageContent]
  Default content formatting for converting observation to LLM readable content.
  Subclasses can override to provide richer content (e.g., images, diffs).
- `visualize`: Text
  Return Rich Text representation of this observation.
  Subclasses can override for custom visualization; by default we show the
  same text that would be sent to the LLM.

#### Methods

#### classmethod from_text()

Utility to create an Observation from a simple text string.

* Parameters:
  * `text` – The text content to include in the observation.
  * `is_error` – Whether this observation represents an error.
  * `**kwargs` – Additional fields for the observation subclass.
* Returns:
  An Observation instance with the text wrapped in a TextContent.

### class ThinkTool

Bases: `ToolDefinition[ThinkAction, ThinkObservation]`

Tool for logging thoughts without making changes.


#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### classmethod create()

Create ThinkTool instance.

* Parameters:
  * `conv_state` – Optional conversation state (not used by ThinkTool).
  * `**params` – Additional parameters (none supported).
* Returns:
  A sequence containing a single ThinkTool instance.
* Raises:
  `ValueError` – If any parameters are provided.
+ +#### name = 'think' + +### class Tool + +Bases: `BaseModel` + +Defines a tool to be initialized for the agent. + +This is only used in agent-sdk for type schema for server use. + + +#### Properties + +- `name`: str +- `params`: dict[str, Any] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### classmethod validate_name() + +Validate that name is not empty. + +#### classmethod validate_params() + +Convert None params to empty dict. + +### class ToolAnnotations + +Bases: `BaseModel` + +Annotations to provide hints about the tool’s behavior. + +Based on Model Context Protocol (MCP) spec: +[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838) + + +#### Properties + +- `destructiveHint`: bool +- `idempotentHint`: bool +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `openWorldHint`: bool +- `readOnlyHint`: bool +- `title`: str | None +### class ToolDefinition + +Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic` + +Base class for all tool implementations. + +This class serves as a base for the discriminated union of all tool types. +All tools must inherit from this class and implement the .create() method for +proper initialization with executors and parameters. + +Features: +- Normalize input/output schemas (class or dict) into both model+schema. +- Validate inputs before execute. +- Coerce outputs only if an output model is defined; else return vanilla JSON. +- Export MCP tool description. 

#### Examples

Simple tool with no parameters:

```python
class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
    @classmethod
    def create(cls, conv_state=None, **params):
        return [cls(name="finish", ..., executor=FinishExecutor())]
```

Complex tool with initialization parameters:

```python
class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
    @classmethod
    def create(cls, conv_state, **params):
        executor = TerminalExecutor(
            working_dir=conv_state.workspace.working_dir,
            **params,
        )
        return [cls(name="terminal", ..., executor=executor)]
```


#### Properties

- `action_type`: type[[Action](#class-action)]
- `annotations`: [ToolAnnotations](#class-toolannotations) | None
- `description`: str
- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
- `meta`: dict[str, Any] | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `name`: ClassVar[str] = ''
- `observation_type`: type[[Observation](#class-observation)] | None
- `title`: str

#### Methods

#### action_from_arguments()

Create an action from parsed arguments.

This method can be overridden by subclasses to provide custom logic
for creating actions from arguments (e.g., for MCP tools).

* Parameters:
  `arguments` – The parsed arguments from the tool call.
* Returns:
  The action instance created from the arguments.

#### as_executable()

Return this tool as an ExecutableTool, ensuring it has an executor.

This method eliminates the need for runtime None checks by guaranteeing
that the returned tool has a non-None executor.

* Returns:
  This tool instance, typed as ExecutableTool.
* Raises:
  `NotImplementedError` – If the tool has no executor.

#### abstractmethod classmethod create()

Create a sequence of Tool instances.

This method must be implemented by all subclasses to provide custom
initialization logic, typically initializing the executor with parameters
from conv_state and other optional parameters.

* Parameters:
  * `*args` – Variable positional arguments (typically conv_state as first arg).
  * `**kwargs` – Optional parameters for tool initialization.
* Returns:
  A sequence of Tool instances. Even single tools are returned as a sequence
  to provide a consistent interface and eliminate union return types.
+ +#### classmethod resolve_kind() + +Resolve a kind string to its corresponding tool class. + +* Parameters: + `kind` – The name of the tool class to resolve +* Returns: + The tool class corresponding to the kind +* Raises: + `ValueError` – If the kind is unknown + +#### set_executor() + +Create a new Tool instance with the given executor. + +#### to_mcp_tool() + +Convert a Tool to an MCP tool definition. + +Allow overriding input/output schemas (usually by subclasses). + +* Parameters: + * `input_schema` – Optionally override the input schema. + * `output_schema` – Optionally override the output schema. + +#### to_openai_tool() + +Convert a Tool to an OpenAI tool. + +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + to the action schema for LLM to predict. This is useful for + tools that may have safety risks, so the LLM can reason about + the risk level before calling the tool. + * `action_type` – Optionally override the action_type to use for the schema. + This is useful for MCPTool to use a dynamically created action type + based on the tool’s input schema. + +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. + +#### to_responses_tool() + +Convert a Tool to a Responses API function tool (LiteLLM typed). + +For Responses API, function tools expect top-level keys: +(JSON configuration object) + +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + * `action_type` – Optional override for the action type + +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. + +### class ToolExecutor + +Bases: `ABC`, `Generic` + +Executor function type for a Tool. + +#### Methods + +#### close() + +Close the executor and clean up resources. + +Default implementation does nothing. 
Subclasses should override
this method to perform cleanup (e.g., closing connections,
terminating processes, etc.).


# openhands.sdk.utils
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils

Utility functions for the OpenHands SDK.

### deprecated()

Return a decorator that deprecates a callable with explicit metadata.

Use this helper when you can annotate a function, method, or property with
@deprecated(…). It transparently forwards to `deprecation.deprecated()`
while filling in the SDK’s current version metadata unless custom values are
supplied.

### maybe_truncate()

Truncate the middle of content if it exceeds the specified length.

Keeps the head and tail of the content to preserve context at both ends.
Optionally saves the full content to a file for later investigation.

* Parameters:
  * `content` – The text content to potentially truncate
  * `truncate_after` – Maximum length before truncation. If None, no truncation occurs
  * `truncate_notice` – Notice to insert in the middle when content is truncated
  * `save_dir` – Working directory to save full content file in
  * `tool_prefix` – Prefix for the saved file (e.g., "bash", "browser", "editor")
* Returns:
  Original content if under limit, or truncated content with head and tail
  preserved and reference to saved file if applicable

### sanitize_openhands_mentions()

Sanitize @OpenHands mentions in text to prevent self-mention loops.

This function inserts a zero-width joiner (ZWJ) after the @ symbol in
@OpenHands mentions, making them non-clickable in GitHub comments while
preserving readability. The original case of the mention is preserved.

* Parameters:
  `text` – The text to sanitize
* Returns:
  Text with sanitized @OpenHands mentions (e.g., "@OpenHands" -> "@\u200dOpenHands")

### Examples

```pycon
>>> sanitize_openhands_mentions("Thanks @OpenHands for the help!")
'Thanks @\u200dOpenHands for the help!'
>>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS")
'Check @\u200dopenhands and @\u200dOPENHANDS'
>>> sanitize_openhands_mentions("No mention here")
'No mention here'
```

### sanitized_env()

Return a copy of env with sanitized values.

PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored
libraries win. This function restores the original value so that subprocess
will not use them.

### warn_deprecated()

Emit a deprecation warning for dynamic access to a legacy feature.

Prefer this helper when a decorator is not practical—e.g. attribute accessors,
data migrations, or other runtime paths that must conditionally warn. Provide
explicit version metadata so the SDK reports consistent messages and upgrades
to `deprecation.UnsupportedWarning` after the removal threshold.


# openhands.sdk.workspace
Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace

### class BaseWorkspace

Bases: `DiscriminatedUnionMixin`, `ABC`

Abstract base class for workspace implementations.

Workspaces provide a sandboxed environment where agents can execute commands,
read/write files, and perform other operations. All workspace implementations
support the context manager protocol for safe resource management.

#### Example

```pycon
>>> with workspace:
...     result = workspace.execute_command("echo 'hello'")
...     content = workspace.read_file("example.txt")
```


#### Properties

- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. Path objects are automatically converted to strings.')]

#### Methods

#### abstractmethod execute_command()

Execute a bash command on the system.
* Parameters:
  * `command` – The bash command to execute
  * `cwd` – Working directory for the command (optional)
  * `timeout` – Timeout in seconds (defaults to 30.0)
* Returns:
  Result containing stdout, stderr, exit_code, and other metadata
* Return type:
  [CommandResult](#class-commandresult)
* Raises:
  `Exception` – If command execution fails

#### abstractmethod file_download()

Download a file from the system.

* Parameters:
  * `source_path` – Path to the source file on the system
  * `destination_path` – Path where the file should be downloaded
* Returns:
  Result containing success status and metadata
* Return type:
  [FileOperationResult](#class-fileoperationresult)
* Raises:
  `Exception` – If file download fails

#### abstractmethod file_upload()

Upload a file to the system.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be uploaded
* Returns:
  Result containing success status and metadata
* Return type:
  [FileOperationResult](#class-fileoperationresult)
* Raises:
  `Exception` – If file upload fails

#### abstractmethod git_changes()

Get the git changes for the repository at the path given.

* Parameters:
  `path` – Path to the git repository
* Returns:
  List of changes
* Return type:
  list[GitChange]
* Raises:
  `Exception` – If path is not a git repository or getting changes failed

#### abstractmethod git_diff()

Get the git diff for the file at the path given.

* Parameters:
  `path` – Path to the file
* Returns:
  Git diff
* Return type:
  GitDiff
* Raises:
  `Exception` – If path is not a git repository or getting diff failed

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### pause()

Pause the workspace to conserve resources.

For local workspaces, this is a no-op.
+For container-based workspaces, this pauses the container. + +* Raises: + `NotImplementedError` – If the workspace type does not support pausing. + +#### resume() + +Resume a paused workspace. + +For local workspaces, this is a no-op. +For container-based workspaces, this resumes the container. + +* Raises: + `NotImplementedError` – If the workspace type does not support resuming. + +### class CommandResult + +Bases: `BaseModel` + +Result of executing a command in the workspace. + + +#### Properties + +- `command`: str +- `exit_code`: int +- `stderr`: str +- `stdout`: str +- `timeout_occurred`: bool + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class FileOperationResult + +Bases: `BaseModel` + +Result of a file upload or download operation. + + +#### Properties + +- `destination_path`: str +- `error`: str | None +- `file_size`: int | None +- `source_path`: str +- `success`: bool + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class LocalWorkspace + +Bases: [`BaseWorkspace`](#class-baseworkspace) + +Local workspace implementation that operates on the host filesystem. + +LocalWorkspace provides direct access to the local filesystem and command execution +environment. It’s suitable for development and testing scenarios where the agent +should operate directly on the host system. + +#### Example + +```pycon +>>> workspace = LocalWorkspace(working_dir="/path/to/project") +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` + +#### Methods + +#### __init__() + +Create a new model by parsing and validating input data from keyword arguments. 
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be
validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

#### execute_command()

Execute a bash command locally.

Uses the shared shell execution utility to run commands with proper
timeout handling, output streaming, and error management.

* Parameters:
  * `command` – The bash command to execute
  * `cwd` – Working directory (optional)
  * `timeout` – Timeout in seconds
* Returns:
  Result with stdout, stderr, exit_code, command, and timeout_occurred
* Return type:
  [CommandResult](#class-commandresult)

#### file_download()

Download (copy) a file locally.

For local systems, file download is implemented as a file copy operation
using shutil.copy2 to preserve metadata.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be copied
* Returns:
  Result with success status and file information
* Return type:
  [FileOperationResult](#class-fileoperationresult)

#### file_upload()

Upload (copy) a file locally.

For local systems, file upload is implemented as a file copy operation
using shutil.copy2 to preserve metadata.

* Parameters:
  * `source_path` – Path to the source file
  * `destination_path` – Path where the file should be copied
* Returns:
  Result with success status and file information
* Return type:
  [FileOperationResult](#class-fileoperationresult)

#### git_changes()

Get the git changes for the repository at the path given.

* Parameters:
  `path` – Path to the git repository
* Returns:
  List of changes
* Return type:
  list[GitChange]
* Raises:
  `Exception` – If path is not a git repository or getting changes failed

#### git_diff()

Get the git diff for the file at the path given.
+ +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### pause() + +Pause the workspace (no-op for local workspaces). + +Local workspaces have nothing to pause since they operate directly +on the host filesystem. + +#### resume() + +Resume the workspace (no-op for local workspaces). + +Local workspaces have nothing to resume since they operate directly +on the host filesystem. + +### class RemoteWorkspace + +Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) + +Remote workspace implementation that connects to an OpenHands agent server. + +RemoteWorkspace provides access to a sandboxed environment running on a remote +OpenHands agent server. This is the recommended approach for production deployments +as it provides better isolation and security. + +#### Example + +```pycon +>>> workspace = RemoteWorkspace( +... host="https://agent-server.example.com", +... working_dir="/workspace" +... ) +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` + + +#### Properties + +- `alive`: bool + Check if the remote workspace is alive by querying the health endpoint. + * Returns: + True if the health endpoint returns a successful response, False otherwise. +- `client`: Client + +#### Methods + +#### execute_command() + +Execute a bash command on the remote system. + +This method starts a bash command via the remote agent server API, +then polls for the output until the command completes. 
+ +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, and other metadata +* Return type: + [CommandResult](#class-commandresult) + +#### file_download() + +Download a file from the remote system. + +Requests the file from the remote system via HTTP API and saves it locally. + +* Parameters: + * `source_path` – Path to the source file on remote system + * `destination_path` – Path where the file should be saved locally +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### file_upload() + +Upload a file to the remote system. + +Reads the local file and sends it to the remote system via HTTP API. + +* Parameters: + * `source_path` – Path to the local source file + * `destination_path` – Path where the file should be uploaded on remote system +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### git_changes() + +Get the git changes for the repository at the path given. + +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed + +#### git_diff() + +Get the git diff for the file at the path given. + +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +Override this method to perform additional initialization after __init__ and model_construct. +This is useful if you want to do some validation that requires the entire model to be initialized. 
#### reset_client()

Reset the HTTP client to force re-initialization.

This is useful when connection parameters (host, api_key) have changed
and the client needs to be recreated with new values.

### class Workspace

Bases: `object`

Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace.

Usage:
- Workspace(working_dir=…) -> LocalWorkspace
- Workspace(working_dir=…, host="http://…") -> RemoteWorkspace


# Agent
Source: https://docs.openhands.dev/sdk/arch/agent

The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture.

**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent)

## Core Responsibilities

The Agent system has four primary responsibilities:

1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history
2. **Tool Orchestration** - Select and execute tools, handle results and errors
3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser)
4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security)

## Architecture

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%%
flowchart TB
    subgraph Input[" "]
        Events["Event History"]
        Context["Agent Context
Skills + Prompts"] + end + + subgraph Core["Agent Core"] + Condense["Condenser
History compression"] + Reason["LLM Query
Generate actions"] + Security["Security Analyzer
Risk assessment"] + end + + subgraph Execution[" "] + Tools["Tool Executor
Action → Observation"] + Results["Observation Events"] + end + + Events --> Condense + Context -.->|Skills| Reason + Condense --> Reason + Reason --> Security + Security --> Tools + Tools --> Results + Results -.->|Feedback| Events + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Reason primary + class Condense,Security secondary + class Tools tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | +| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | +| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | +| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | +| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | + +## Reasoning-Action Loop + +The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% +flowchart TB + Start["step() called"] + Pending{"Pending
actions?"} + ExecutePending["Execute pending actions"] + + HasCondenser{"Has
condenser?"} + Condense["Call condenser.condense()"] + CondenseResult{"Result
type?"} + EmitCondensation["Emit Condensation event"] + UseView["Use View events"] + UseRaw["Use raw events"] + + Query["Query LLM with messages"] + ContextExceeded{"Context
window
exceeded?"} + EmitRequest["Emit CondensationRequest"] + + Parse{"Response
type?"} + CreateActions["Create ActionEvents"] + CreateMessage["Create MessageEvent"] + + Confirmation{"Need
confirmation?"} + SetWaiting["Set WAITING_FOR_CONFIRMATION"] + + Execute["Execute actions"] + Observe["Create ObservationEvents"] + + Return["Return"] + + Start --> Pending + Pending -->|Yes| ExecutePending --> Return + Pending -->|No| HasCondenser + + HasCondenser -->|Yes| Condense + HasCondenser -->|No| UseRaw + Condense --> CondenseResult + CondenseResult -->|Condensation| EmitCondensation --> Return + CondenseResult -->|View| UseView --> Query + UseRaw --> Query + + Query --> ContextExceeded + ContextExceeded -->|Yes| EmitRequest --> Return + ContextExceeded -->|No| Parse + + Parse -->|Tool calls| CreateActions + Parse -->|Message| CreateMessage --> Return + + CreateActions --> Confirmation + Confirmation -->|Yes| SetWaiting --> Return + Confirmation -->|No| Execute + + Execute --> Observe + Observe --> Return + + style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Step Execution Flow:** + +1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return +2. **Condensation:** If condenser exists: + - Call `condenser.condense()` with current event view + - If returns `View`: use condensed events for LLM query (continue in same step) + - If returns `Condensation`: emit event and return (will be processed next step) +3. **LLM Query:** Query LLM with messages from event history + - If context window exceeded: emit `CondensationRequest` and return +4. **Response Parsing:** Parse LLM response into events + - Tool calls → create `ActionEvent`(s) + - Text message → create `MessageEvent` and return +5. **Confirmation Check:** If actions need user approval: + - Set conversation status to `WAITING_FOR_CONFIRMATION` and return +6. 
**Action Execution:** Execute tools and create `ObservationEvent`(s) + +**Key Characteristics:** +- **Stateless:** Agent holds no mutable state between steps +- **Event-Driven:** Reads from event history, writes new events +- **Interruptible:** Each step is atomic and can be paused/resumed + +## Agent Context + +The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Context["AgentContext"] + + subgraph Skills["Skills"] + Repo["repo
Always active"] + Knowledge["knowledge
Trigger-based"] + end + SystemAug["System prompt prefix/suffix
Per-conversation"] + System["Prompt template
Per-conversation"] + + subgraph Application["Applied to LLM"] + SysPrompt["System Prompt"] + UserMsg["User Messages"] + end + + Context --> Skills + Context --> SystemAug + Repo --> SysPrompt + Knowledge -.->|When triggered| UserMsg + System --> SysPrompt + SystemAug --> SysPrompt + + style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Skill Type | Activation | Use Case | +|------------|------------|----------| +| **repo** | Always included | Project-specific context, conventions | +| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | + +Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. + + +## Tool Execution + +Tools follow a **strict action-observation pattern**: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + LLM["LLM generates tool_call"] + Convert["Convert to ActionEvent"] + + Decision{"Confirmation
mode?"} + Defer["Store as pending"] + + Execute["Execute tool"] + Success{"Success?"} + + Obs["ObservationEvent
with result"] + Error["ObservationEvent
with error"] + + LLM --> Convert + Convert --> Decision + + Decision -->|Yes| Defer + Decision -->|No| Execute + + Execute --> Success + Success -->|Yes| Obs + Success -->|No| Error + + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation + +## Component Relationships + +### How Agent Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Conv["Conversation"] + LLM["LLM"] + Tools["Tools"] + Context["AgentContext"] + + Conv -->|.step calls| Agent + Agent -->|Reads events| Conv + Agent -->|Query| LLM + Agent -->|Execute| Tools + Context -.->|Skills and Context| Agent + Agent -.->|New events| Conv + + style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: Orchestrates step execution, provides event history +- **Agent → LLM**: Queries for next actions, receives tool calls or messages +- **Agent → Tools**: Executes actions, receives observations +- **AgentContext → Agent**: Injects skills and prompts into LLM queries + + +## See Also + +- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool 
definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction + + +# Agent Server Package +Source: https://docs.openhands.dev/sdk/arch/agent-server + +The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. + +**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) + +## Purpose + +The Agent Server enables: +- **Remote execution**: Clients interact with agents via HTTP API +- **Multi-user isolation**: Each user gets isolated workspace +- **Container orchestration**: Manages Docker containers for workspaces +- **Centralized management**: Monitor and control all agents +- **Scalability**: Horizontal scaling with multiple servers + +## Architecture Overview + +```mermaid +graph TB + Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] + + API --> Auth[Authentication] + API --> Router[API Router] + + Router --> WS[Workspace Manager] + Router --> Conv[Conversation Handler] + + WS --> Docker[Docker Manager] + Docker --> C1[Container 1
User A] + Docker --> C2[Container 2
User B] + Docker --> C3[Container 3
User C] + + Conv --> Agent[Software Agent SDK] + Agent --> C1 + Agent --> C2 + Agent --> C3 + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style WS fill:#e8f5e8 + style Docker fill:#f3e5f5 + style Agent fill:#fce4ec +``` + +### Key Components + +**1. FastAPI Server** +- HTTP REST API endpoints +- Authentication and authorization +- Request validation +- WebSocket support for streaming + +**2. Workspace Manager** +- Creates and manages Docker containers +- Isolates workspaces per user +- Handles container lifecycle +- Manages resource limits + +**3. Conversation Handler** +- Routes requests to appropriate workspace +- Manages conversation state +- Handles concurrent requests +- Supports streaming responses + +**4. Docker Manager** +- Interfaces with Docker daemon +- Builds and pulls images +- Creates and destroys containers +- Monitors container health + +## Design Decisions + +### Why HTTP API? + +Alternative approaches considered: +- **gRPC**: More efficient but harder for web clients +- **WebSockets only**: Good for streaming but not RESTful +- **HTTP + WebSockets**: Best of both worlds + +**Decision**: HTTP REST for operations, WebSockets for streaming +- ✅ Works from any client (web, mobile, CLI) +- ✅ Easy to debug (curl, Postman) +- ✅ Standard authentication (API keys, OAuth) +- ✅ Streaming where needed + +### Why Container Per User? + +Alternative approaches: +- **Shared container**: Multiple users in one container +- **Container per session**: New container each conversation +- **Container per user**: One container per user (chosen) + +**Decision**: Container per user +- ✅ Strong isolation between users +- ✅ Persistent workspace across sessions +- ✅ Better resource management +- ⚠️ More containers, but worth it for isolation + +### Why FastAPI? 
+ +Alternative frameworks: +- **Flask**: Simpler but less type-safe +- **Django**: Too heavyweight +- **FastAPI**: Modern, fast, type-safe (chosen) + +**Decision**: FastAPI +- ✅ Automatic API documentation (OpenAPI) +- ✅ Type validation with Pydantic +- ✅ Async support for performance +- ✅ WebSocket support built-in + +## API Design + +### Key Endpoints + +**Workspace Management** +``` +POST /workspaces Create new workspace +GET /workspaces/{id} Get workspace info +DELETE /workspaces/{id} Delete workspace +POST /workspaces/{id}/execute Execute command +``` + +**Conversation Management** +``` +POST /conversations Create conversation +GET /conversations/{id} Get conversation +POST /conversations/{id}/messages Send message +GET /conversations/{id}/stream Stream responses (WebSocket) +``` + +**Health & Monitoring** +``` +GET /health Server health check +GET /metrics Prometheus metrics +``` + +### Authentication + +**API Key Authentication** +```bash +curl -H "Authorization: Bearer YOUR_API_KEY" \ + https://agent-server.example.com/conversations +``` + +**Per-user workspace isolation** +- API key → user ID mapping +- Each user gets separate workspace +- Users can't access each other's workspaces + +### Streaming Responses + +**WebSocket for real-time updates** +```python +async with websocket_connect(url) as ws: + # Send message + await ws.send_json({"message": "Hello"}) + + # Receive events + async for event in ws: + if event["type"] == "message": + print(event["content"]) +``` + +**Why streaming?** +- Real-time feedback to users +- Show agent thinking process +- Better UX for long-running tasks + +## Deployment Models + +### 1. Local Development + +Run server locally for testing: +```bash +# Start server +openhands-agent-server --port 8000 + +# Or with Docker +docker run -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +**Use case**: Development and testing + +### 2. 
Single-Server Deployment + +Deploy on one server (VPS, EC2, etc.): +```bash +# Install +pip install openhands-agent-server + +# Run with systemd/supervisor +openhands-agent-server \ + --host 0.0.0.0 \ + --port 8000 \ + --workers 4 +``` + +**Use case**: Small deployments, prototypes, MVPs + +### 3. Multi-Server Deployment + +Scale horizontally with load balancer: +``` + Load Balancer + | + +-------------+-------------+ + | | | + Server 1 Server 2 Server 3 + (Agents) (Agents) (Agents) + | | | + +-------------+-------------+ + | + Shared State Store + (Database, Redis, etc.) +``` + +**Use case**: Production SaaS, high traffic, need redundancy + +### 4. Kubernetes Deployment + +Container orchestration with Kubernetes: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + template: + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 +``` + +**Use case**: Enterprise deployments, auto-scaling, high availability + +## Resource Management + +### Container Limits + +Set per-workspace resource limits: +```python +# In server configuration +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "2g", # 2GB RAM + "cpus": "2", # 2 CPU cores + "disk": "10g" # 10GB disk + }, + "timeout": 300, # 5 min timeout +} +``` + +**Why limit resources?** +- Prevent one user from consuming all resources +- Fair usage across users +- Protect server from runaway processes +- Cost control + +### Cleanup & Garbage Collection + +**Container lifecycle**: +- Containers created on first use +- Kept alive between requests (warm) +- Cleaned up after inactivity timeout +- Force cleanup on server shutdown + +**Storage management**: +- Old workspaces deleted automatically +- Disk usage monitored +- Alerts when approaching limits + +## Security Considerations + +### Multi-Tenant Isolation + +**Container isolation**: +- Each user gets separate container +- Containers can't 
communicate +- Network isolation (optional) +- File system isolation + +**API isolation**: +- API keys mapped to users +- Users can only access their workspaces +- Server validates all permissions + +### Input Validation + +**Server validates**: +- API request schemas +- Command injection attempts +- Path traversal attempts +- File size limits + +**Defense in depth**: +- API validation +- Container validation +- Docker security features +- OS-level security + +### Network Security + +**Best practices**: +- HTTPS only (TLS certificates) +- Firewall rules (only port 443/8000) +- Rate limiting +- DDoS protection + +**Container networking**: +```python +# Disable network for workspace +WORKSPACE_CONFIG = { + "network_mode": "none" # No network access +} + +# Or allow specific hosts +WORKSPACE_CONFIG = { + "allowed_hosts": ["api.example.com"] +} +``` + +## Monitoring & Observability + +### Health Checks + +```bash +# Simple health check +curl https://agent-server.example.com/health + +# Response +{ + "status": "healthy", + "docker": "connected", + "workspaces": 15, + "uptime": 86400 +} +``` + +### Metrics + +**Prometheus metrics**: +- Request count and latency +- Active workspaces +- Container resource usage +- Error rates + +**Logging**: +- Structured JSON logs +- Per-request tracing +- Workspace events +- Error tracking + +### Alerting + +**Alert on**: +- Server down +- High error rate +- Resource exhaustion +- Container failures + +## Client SDK + +Python SDK for interacting with Agent Server: + +```python +from openhands.client import AgentServerClient + +client = AgentServerClient( + url="https://agent-server.example.com", + api_key="your-api-key" +) + +# Create conversation +conversation = client.create_conversation() + +# Send message +response = client.send_message( + conversation_id=conversation.id, + message="Hello, agent!" 
)

# Stream responses
for event in client.stream_conversation(conversation.id):
    if event.type == "message":
        print(event.content)
```

**Client handles**:
- Authentication
- Request/response serialization
- Error handling
- Streaming
- Retries

## Cost Considerations

### Server Costs

**Compute**: CPU and memory for containers
- Each active workspace = 1 container
- Typically 1-2 GB RAM per workspace
- 0.5-1 CPU core per workspace

**Storage**: Workspace files and conversation state
- ~1-10 GB per workspace (depends on usage)
- Conversation history in database

**Network**: API requests and responses
- Minimal (mostly text)
- Streaming adds bandwidth

### Cost Optimization

**1. Idle timeout**: Shut down containers after inactivity
```python
WORKSPACE_CONFIG = {
    "idle_timeout": 3600  # 1 hour
}
```

**2. Resource limits**: Don't over-provision
```python
WORKSPACE_CONFIG = {
    "resource_limits": {
        "memory": "1g",  # Smaller limit
        "cpus": "0.5"    # Fractional CPU
    }
}
```

**3. Shared resources**: Use a single server for multiple low-traffic apps

**4. 
Auto-scaling**: Scale servers based on demand + +## When to Use Agent Server + +### Use Agent Server When: + +✅ **Multi-user system**: Web app with many users +✅ **Remote clients**: Mobile app, web frontend +✅ **Centralized management**: Need to monitor all agents +✅ **Workspace isolation**: Users shouldn't interfere +✅ **SaaS product**: Building agent-as-a-service +✅ **Scaling**: Need to handle concurrent users + +**Examples**: +- Chatbot platforms +- Code assistant web apps +- Agent marketplaces +- Enterprise agent deployments + +### Use Standalone SDK When: + +✅ **Single-user**: Personal tool or script +✅ **Local execution**: Running on your machine +✅ **Full control**: Need programmatic access +✅ **Simpler deployment**: No server management +✅ **Lower latency**: No network overhead + +**Examples**: +- CLI tools +- Automation scripts +- Local development +- Desktop applications + +### Hybrid Approach + +Use SDK locally but RemoteAPIWorkspace for execution: +- Agent logic in your Python code +- Execution happens on remote server +- Best of both worlds + +## Building Custom Agent Server + +The server is extensible for custom needs: + +**Custom authentication**: +```python +from openhands.agent_server import AgentServer + +class CustomAgentServer(AgentServer): + async def authenticate(self, request): + # Custom auth logic + return await oauth_verify(request) +``` + +**Custom workspace configuration**: +```python +server = AgentServer( + workspace_factory=lambda user: DockerWorkspace( + image=f"custom-image-{user.tier}", + resource_limits=user.resource_limits + ) +) +``` + +**Custom middleware**: +```python +@server.middleware +async def logging_middleware(request, call_next): + # Custom logging + response = await call_next(request) + return response +``` + +## Next Steps + +### For Usage Examples + +- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup +- [API 
Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API +- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options + +### For Related Architecture + +- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details +- [SDK Architecture](/sdk/arch/sdk) - Core framework +- [Architecture Overview](/sdk/arch/overview) - System design + +### For Implementation Details + +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples + + +# Condenser +Source: https://docs.openhands.dev/sdk/arch/condenser + +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). + +**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +## Core Responsibilities + +The Condenser system has four primary responsibilities: + +1. **History Compression** - Reduce event lists to fit within context windows +2. **Threshold Detection** - Determine when condensation should trigger +3. **Summary Generation** - Create meaningful summaries via LLM or heuristics +4. **View Management** - Transform event history into LLM-ready views + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["CondenserBase
Abstract base"] + end + + subgraph Implementations["Concrete Implementations"] + NoOp["NoOpCondenser
No compression"] + LLM["LLMSummarizingCondenser
LLM-based"] + Pipeline["PipelineCondenser
Multi-stage"] + end + + subgraph Process["Condensation Process"] + View["View
Event history"] + Check["should_condense()?"] + Condense["get_condensation()"] + Result["View | Condensation"] + end + + subgraph Output["Condensation Output"] + CondEvent["Condensation Event
Summary metadata"] + NewView["Condensed View
Reduced tokens"] + end + + Base --> NoOp + Base --> LLM + Base --> Pipeline + + View --> Check + Check -->|Yes| Condense + Check -->|No| Result + Condense --> CondEvent + CondEvent --> NewView + NewView --> Result + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class LLM,Pipeline secondary + class Check,Condense tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | +| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | +| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | +| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | +| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | +| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | Represents history for LLM | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | + +## Condenser Types + +### 
NoOpCondenser + +Pass-through condenser that performs no compression: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["View"] + NoOp["NoOpCondenser"] + Same["Same View"] + + View --> NoOp --> Same + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +### LLMSummarizingCondenser + +Uses an LLM to generate summaries of conversation history: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + View["Long View
120+ events"] + Check["Threshold
exceeded?"] + Summarize["LLM Summarization"] + Summary["Summary Text"] + Metadata["Condensation Event"] + AddToHistory["Add to History"] + NextStep["Next Step: View.from_events()"] + NewView["Condensed View"] + + View --> Check + Check -->|Yes| Summarize + Summarize --> Summary + Summary --> Metadata + Metadata --> AddToHistory + AddToHistory --> NextStep + NextStep --> NewView + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Process:** +1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) +2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) +3. **LLM Call:** Generate summary of middle events using dedicated LLM +4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` +5. **Add to History:** Agent adds `Condensation` to event log and returns early +6. 
**Next Step:** `View.from_events()` filters forgotten events and inserts summary + +**Configuration:** +- **`max_size`:** Event count threshold before condensation triggers (default: 120) +- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) +- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) + +### PipelineCondenser + +Chains multiple condensers in sequence: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["Original View"] + C1["Condenser 1"] + C2["Condenser 2"] + C3["Condenser 3"] + Final["Final View"] + + View --> C1 --> C2 --> C3 --> Final + + style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) + +## Condensation Flow + +### Trigger Mechanisms + +Condensers can be triggered in two ways: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Automatic["Automatic Trigger"] + Agent1["Agent Step"] + Build1["View.from_events()"] + Check1["condenser.condense(view)"] + Trigger1["should_condense()?"] + end + + Agent1 --> Build1 --> Check1 --> Trigger1 + + style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Automatic Trigger:** +- **When:** Threshold exceeded (e.g., event count > `max_size`) +- **Who:** Agent calls `condenser.condense()` each step +- **Purpose:** Proactively keep context within limits + + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Manual["Manual Trigger"] + Error["LLM Context Error"] + Request["CondensationRequest Event"] + NextStep["Next Agent Step"] + Trigger2["condense() detects request"] + end + + Error --> Request --> NextStep --> Trigger2 + + style 
Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` +**Manual Trigger:** +- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) +- **Who:** Agent (on LLM context window error) or application code +- **Purpose:** Force compression when context limit exceeded + +### Condensation Workflow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent calls condense(view)"] + + Decision{"should_condense?"} + + ReturnView["Return View
Agent proceeds"] + + Extract["Select Events to Keep/Forget"] + Generate["LLM Generates Summary"] + Create["Create Condensation Event"] + ReturnCond["Return Condensation"] + AddHistory["Agent adds to history"] + NextStep["Next Step: View.from_events()"] + FilterEvents["Filter forgotten events"] + InsertSummary["Insert summary at offset"] + NewView["New condensed view"] + + Start --> Decision + Decision -->|No| ReturnView + Decision -->|Yes| Extract + Extract --> Generate + Generate --> Create + Create --> ReturnCond + ReturnCond --> AddHistory + AddHistory --> NextStep + NextStep --> FilterEvents + FilterEvents --> InsertSummary + InsertSummary --> NewView + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Key Steps:** + +1. **Threshold Check:** `should_condense()` determines if condensation needed +2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) +3. **Summary Generation:** LLM creates compressed representation of forgotten events +4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary +5. **Return to Agent:** Condenser returns `Condensation` (not `View`) +6. **History Update:** Agent adds `Condensation` to event log and exits step +7. **Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary + +## View and Condensation + +### View Structure + +A `View` represents the conversation history as it will be sent to the LLM: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Full Event List
+ Condensation events"] + FromEvents["View.from_events()"] + Filter["Filter forgotten events"] + Insert["Insert summary"] + View["View
LLMConvertibleEvents"] + Convert["events_to_messages()"] + LLM["LLM Input"] + + Events --> FromEvents + FromEvents --> Filter + Filter --> Insert + Insert --> View + View --> Convert + Convert --> LLM + + style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**View Components:** +- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) +- **`unhandled_condensation_request`:** Flag for pending manual condensation +- **`condensations`:** List of all Condensation events processed +- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics + +### Condensation Event + +When condensation occurs, a `Condensation` event is created: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Old["Middle Events
~60 events"] + Summary["Summary Text
LLM-generated"] + Event["Condensation Event
forgotten_event_ids"] + Applied["View.from_events()"] + New["New View
~60 events + summary"] + + Old -.->|Summarized| Summary + Summary --> Event + Event --> Applied + Applied --> New + + style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Condensation Fields:** +- **`forgotten_event_ids`:** List of event IDs to filter out +- **`summary`:** Compressed text representation of forgotten events +- **`summary_offset`:** Index where summary event should be inserted +- Inherits from `Event`: `id`, `timestamp`, `source` + +## Rolling Window Pattern + +`RollingCondenser` implements a common pattern for threshold-based condensation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + View["Current View
120+ events"] + Check["Count Events"] + + Compare{"Count >
max_size?"} + + Keep["Keep All Events"] + + Split["Split Events"] + Head["Head
First 4 events"] + Middle["Middle
~56 events"] + Tail["Tail
~56 events"] + Summarize["LLM Summarizes Middle"] + Result["Head + Summary + Tail
~60 events total"] + + View --> Check + Check --> Compare + + Compare -->|Under| Keep + Compare -->|Over| Split + + Split --> Head + Split --> Middle + Split --> Tail + + Middle --> Summarize + Head --> Result + Summarize --> Result + Tail --> Result + + style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Rolling Window Strategy:** +1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts +2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context +3. **Summarize Middle:** Compress events between head and tail into summary +4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) + +## Component Relationships + +### How Condenser Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Condenser["Condenser"] + State["Conversation State"] + Events["Event Log"] + + Agent -->|"View.from_events()"| State + State -->|View| Agent + Agent -->|"condense(view)"| Condenser + Condenser -->|"View | Condensation"| Agent + Agent -->|Adds Condensation| Events + + style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → State**: Calls `View.from_events()` to get current view +- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered +- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) +- **Agent → Events**: Adds `Condensation` event to log when returned + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning +- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management +- 
**[Events](/sdk/arch/events)** - Condensation event type and append-only log +- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers + + +# Conversation +Source: https://docs.openhands.dev/sdk/arch/conversation + +The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. + +**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) + +## Core Responsibilities + +The Conversation system has four primary responsibilities: + +1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents +2. **State Orchestration** - Maintain conversation history, events, and execution status +3. **Workspace Coordination** - Bridge agent operations with execution environments +4. **Runtime Services** - Provide persistence, monitoring, security, and visualization + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart LR + User["User Code"] + + subgraph Factory[" "] + Entry["Conversation()"] + end + + subgraph Implementations[" "] + Local["LocalConversation
Direct execution"] + Remote["RemoteConversation
Via agent-server API"] + end + + subgraph Core[" "] + State["ConversationState
• agent
workspace • stats • ..."] + EventLog["ConversationState.events
 Event storage"]
    end

    User --> Entry
    Entry -.->|LocalWorkspace| Local
    Entry -.->|RemoteWorkspace| Remote

    Local --> State
    Remote --> State

    State --> EventLog

    classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class Entry factory
    class Local,Remote impl
    class State,EventLog core
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type |
| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process |
| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket |
| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization |
| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries |

## Factory Pattern

The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type:

```mermaid
%%{init: {"theme": 
"default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Input["Conversation(agent, workspace)"] + Check{Workspace Type?} + Local["LocalConversation
Agent runs in-process"] + Remote["RemoteConversation
Agent runs via API"] + + Input --> Check + Check -->|str or LocalWorkspace| Local + Check -->|RemoteWorkspace| Remote + + style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Dispatch Logic:** +- **Local:** String paths or `LocalWorkspace` → in-process execution +- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket + +This abstraction enables switching deployment modes without code changes—just swap the workspace type. + +## State Management + +State updates follow a **two-path pattern** depending on the type of change: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["State Update Request"] + Lock["Acquire FIFO Lock"] + Decision{New Event?} + + StateOnly["Update State Fields
stats, status, metadata"] + EventPath["Append to Event Log
messages, actions, observations"] + + Callback["Trigger Callbacks"] + Release["Release Lock"] + + Start --> Lock + Lock --> Decision + Decision -->|No| StateOnly + Decision -->|Yes| EventPath + StateOnly --> Callback + EventPath --> Callback + Callback --> Release + + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Two Update Patterns:** + +1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) +2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur + +**Thread Safety:** +- FIFO Lock ensures ordered, atomic updates +- Callbacks fire after successful commit +- Read operations never block writes + +## Execution Models + +The conversation system supports two execution models with identical APIs: + +### Local vs Remote Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Local["LocalConversation"] + L1["User sends message"] + L2["Agent executes in-process"] + L3["Direct tool calls"] + L4["Events via callbacks"] + L1 --> L2 --> L3 --> L4 + end + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Remote["RemoteConversation"] + R1["User sends message"] + R2["HTTP → Agent Server"] + R3["Isolated container execution"] + R4["WebSocket event stream"] + R1 --> R2 --> R3 --> R4 + end + style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Aspect | LocalConversation | RemoteConversation | +|--------|-------------------|-------------------| +| **Execution** | In-process | Remote container/server | +| **Communication** | Direct function calls | HTTP + WebSocket | +| **State Sync** | Immediate | Network 
serialized | +| **Use Case** | Development, CLI tools | Production, web apps | +| **Isolation** | Process-level | Container-level | + +**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. + +## Auxiliary Services + +The conversation system provides pluggable services that operate independently on the event stream: + +| Service | Purpose | Architecture Pattern | +|---------|---------|---------------------| +| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | +| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | +| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | +| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | + +**Design Principle:** Services read from the event log but never mutate state directly. 
This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point + +## Component Relationships + +### How Conversation Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Conv["Conversation"] + Agent["Agent"] + WS["Workspace"] + Tools["Tools"] + LLM["LLM"] + + Conv -->|Delegates to| Agent + Conv -->|Configures| WS + Agent -.->|Updates| Conv + Agent -->|Uses| Tools + Agent -->|Queries| LLM + + style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: One-way orchestration, agent reports back via state updates +- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation +- **Agent → Conversation**: Indirect via state events + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design +- **[Event System](/sdk/arch/events)** - Event types and flow +- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples + + +# Design Principles +Source: https://docs.openhands.dev/sdk/arch/design + +The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. + +[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. 
The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1. + +## Optional Isolation over Mandatory Sandboxing + + +**V0 Challenge:** +Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other. +Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible. + + +**V1 Principle:** +**Sandboxing should be opt-in, not universal.** +V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model. +When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity. + +## Stateless by Default, One Source of Truth for State + + +**V0 Challenge:** +V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful. + + +**V1 Principle:** +**Keep everything stateless, with exactly one mutable state.** +All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction. 
The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems.

## Clear Boundaries between Agent and Applications


**V0 Challenge:**
The same codebase powered the CLI, web interface, and integrations (e.g., GitHub, GitLab). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
Heavy research dependencies and benchmark integrations further bloated production builds.


**V1 Principle:**
**Maintain strict separation of concerns.**
V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server).
Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.


## Composable Components for Extensibility


**V0 Challenge:**
Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for each entry point. This rigidity limited experimentation and discouraged contributions.


**V1 Principle:**
**Everything should be composable and safe to extend.**
Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing.
Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation.


# Events
Source: https://docs.openhands.dev/sdk/arch/events

The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services.

**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event)

## Core Responsibilities

The Event System has four primary responsibilities:

1. **Type Safety** - Enforce event schemas through Pydantic models
2. **LLM Integration** - Convert events to/from LLM message formats
3. **Append-Only Log** - Maintain immutable event history
4. **Service Integration** - Enable observers to react to event streams

## Architecture

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%%
flowchart TB
    Base["Event
Base class"] + LLMBase["LLMConvertibleEvent
Abstract base"] + + subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] + Message["MessageEvent
User/assistant text"] + Action["ActionEvent
Tool calls"] + System["SystemPromptEvent
Initial system prompt"] + CondSummary["CondensationSummaryEvent
Condenser summary"] + + ObsBase["ObservationBaseEvent
Base for tool responses"] + Observation["ObservationEvent
Tool results"] + UserReject["UserRejectObservation
User rejected action"] + AgentError["AgentErrorEvent
Agent error"] + end + + subgraph Internals["Internal Events
NOT visible to the LLM"] + ConvState["ConversationStateUpdateEvent
State updates"] + CondReq["CondensationRequest
Request compression"] + Cond["Condensation
Compression result"] + Pause["PauseEvent
User pause"] + end + + Base --> LLMBase + Base --> Internals + LLMBase --> LLMTypes + ObsBase --> Observation + ObsBase --> UserReject + ObsBase --> AgentError + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base,LLMBase,Message,Action,SystemPromptEvent primary + class ObsBase,Observation,UserReject,AgentError secondary + class ConvState,CondReq,Cond,Pause tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | +| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | +| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | +| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | +| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses | +| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | +| 
**[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | +| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | +| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | +| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | +| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | +| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | +| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | + +## Event Types + +### LLM-Convertible Events + +Events that participate in agent reasoning and can be converted to LLM messages: + + +| Event Type | Source | Content | LLM Role | +|------------|--------|---------|----------| +| **MessageEvent (user)** | user | Text, images | `user` | +| **MessageEvent (agent)** | agent | Text reasoning, 
skills | `assistant` | +| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | +| **ObservationEvent** | environment | Tool execution result | `tool` | +| **UserRejectObservation** | environment | Rejection reason | `tool` | +| **AgentErrorEvent** | agent | Error details | `tool` | +| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | +| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | + +The event system bridges agent events to LLM messages: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event List"] + Filter["Filter LLMConvertibleEvent"] + Group["Group ActionEvents
by llm_response_id"]
    Convert["Convert to Messages"]
    LLM["LLM Input"]

    Events --> Filter
    Filter --> Group
    Group --> Convert
    Convert --> LLM

    style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px
    style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Special Handling - Parallel Function Calling:**

When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling):
1. Group all ActionEvents by `llm_response_id`
2. Combine into single Message with multiple `tool_calls`
3. Only first event's `thought`, `reasoning_content`, and `thinking_blocks` are included
4. All subsequent events in the batch have empty thought fields

**Example:**
```
ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```


### Internal Events

Events for metadata, control flow, and user actions (not sent to LLM):

| Event Type | Source | Purpose | Key Fields |
|------------|--------|---------|------------|
| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) |
| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded |
| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` |
| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user |

**Source Types:**
- **user**: Event originated from user input
- **agent**: Event generated by agent logic
- **environment**: Event from system/framework/tools

## `source` vs LLM `role`

Events often carry **two different concepts** that are easy to confuse:

- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution.
- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting.

These fields are **intentionally independent**.

Common examples include:

- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`.
- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction.

**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role.

## Component Relationships

### How Events Integrate

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    Events["Event System"]
    Agent["Agent"]
    Conversation["Conversation"]
    Tools["Tools"]
    Services["Auxiliary Services"]

    Agent -->|Reads| Events
    Agent -->|Writes| Events
    Conversation -->|Manages| Events
    Tools -->|Creates| Events
    Events -.->|Stream| Services

    style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Relationship Characteristics:**
- **Agent → Events**: Reads history for context, writes actions/messages
- **Conversation → Events**: Owns and persists event log
- **Tools → Events**: Create ObservationEvents after execution
- **Services → Events**: Read-only observers for monitoring, visualization

## Error Events: Agent vs Conversation

Two distinct error events exist in the SDK, with different purpose and visibility:

- AgentErrorEvent
  - Type: 
ObservationBaseEvent (LLM-convertible) + - Scope: Error for a specific tool call (has tool_name and tool_call_id) + - Source: "agent" + - LLM visibility: Sent as a tool message so the model can react/recover + - Effect: Conversation continues; not a terminal state + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py + +- ConversationErrorEvent + - Type: Event (not LLM-convertible) + - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) + - Source: typically "environment" + - LLM visibility: Not sent to the model + - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events +- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management +- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation +- **[Condenser](/sdk/arch/condenser)** - Event history compression + + +# LLM +Source: https://docs.openhands.dev/sdk/arch/llm + +The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. + +**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) + +## Core Responsibilities + +The LLM system has five primary responsibilities: + +1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers +2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) +3. 
**Configuration Management** - Load from environment, JSON, or programmatic configuration +4. **Telemetry & Cost** - Track usage, latency, and costs across providers +5. **Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% +flowchart TB + subgraph Configuration["Configuration Sources"] + Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] + JSON["JSON Files
config/llm.json"] + Code["Programmatic
LLM(...)"] + end + + subgraph Core["Core LLM"] + Model["LLM Model
Pydantic configuration"] + Pipeline["Request Pipeline
Retry, timeout, telemetry"] + end + + subgraph Backend["LiteLLM Backend"] + Providers["100+ Providers
OpenAI, Anthropic, etc."]
    end

    subgraph Output["Telemetry"]
        Usage["Token Usage"]
        Cost["Cost Tracking"]
        Latency["Latency Metrics"]
    end

    Env --> Model
    JSON --> Model
    Code --> Model

    Model --> Pipeline
    Pipeline --> Providers

    Pipeline --> Usage
    Pipeline --> Cost
    Pipeline --> Latency

    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class Model primary
    class Pipeline secondary
    class Providers tertiary
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings |
| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming |
| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking |
| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers |
| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` |
| **Telemetry** | Usage tracking | Token counts, costs, latency |

## Configuration

See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for the complete list of supported fields.
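To make the field mapping concrete, here is a hedged, stdlib-only sketch of the `LLM_*` environment-variable convention described under Environment Variable Configuration below. The helper `hydrate_llm_kwargs` is a hypothetical illustration, not the SDK's actual `load_from_env()`:

```python
import json


def hydrate_llm_kwargs(environ: dict[str, str]) -> dict:
    """Sketch of the LLM_* convention: LLM_FIELD -> field (lowercased),
    with values auto-cast to int, float, bool, or JSON where possible."""
    kwargs = {}
    for key, raw in environ.items():
        if not key.startswith("LLM_"):
            continue
        field = key[len("LLM_"):].lower()
        for cast in (int, float):
            try:
                kwargs[field] = cast(raw)
                break
            except ValueError:
                pass
        else:
            if raw.lower() in ("true", "false"):
                kwargs[field] = raw.lower() == "true"
            else:
                try:
                    kwargs[field] = json.loads(raw)  # lists/dicts as JSON
                except json.JSONDecodeError:
                    kwargs[field] = raw  # plain string fallback
    return kwargs


print(hydrate_llm_kwargs({"LLM_MODEL": "gpt-5-mini", "LLM_TIMEOUT": "120", "PATH": "/usr/bin"}))
# → {'model': 'gpt-5-mini', 'timeout': 120}
```

In practice you would pass the resulting keyword arguments to `LLM(...)`; the real loader additionally wraps secrets such as `LLM_API_KEY` in `SecretStr`.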
+ +### Programmatic Configuration + +Create LLM instances directly in code: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Code["Python Code"] + LLM["LLM(model=...)"] + Agent["Agent"] + + Code --> LLM + LLM --> Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Example:** +```python +from pydantic import SecretStr +from openhands.sdk import LLM + +llm = LLM( + model="anthropic/claude-sonnet-4.1", + api_key=SecretStr("sk-ant-123"), + temperature=0.1, + timeout=120, +) +``` + +### Environment Variable Configuration + +Load from environment using naming convention: + +**Environment Variable Pattern:** +- **Prefix:** All variables start with `LLM_` +- **Mapping:** `LLM_FIELD` → `field` (lowercased) +- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr + +**Common Variables:** +```bash +export LLM_MODEL="anthropic/claude-sonnet-4.1" +export LLM_API_KEY="sk-ant-123" +export LLM_USAGE_ID="primary" +export LLM_TIMEOUT="120" +export LLM_NUM_RETRIES="5" +``` + +### JSON Configuration + +Serialize and load from JSON files: + +**Example:** +```python +# Save +llm.model_dump_json(exclude_none=True, indent=2) + +# Load +llm = LLM.load_from_json("config/llm.json") +``` + +**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). +If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. + + +## Request Pipeline + +### Completion Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% +flowchart TB + Request["completion() or responses() call"] + Validate["Validate Config"] + + Attempt["LiteLLM Request"] + Success{"Success?"} + + Retry{"Retries
remaining?"} + Wait["Exponential Backoff"] + + Telemetry["Record Telemetry"] + Response["Return Response"] + Error["Raise Error"] + + Request --> Validate + Validate --> Attempt + Attempt --> Success + + Success -->|Yes| Telemetry + Success -->|No| Retry + + Retry -->|Yes| Wait + Retry -->|No| Error + + Wait --> Attempt + Telemetry --> Response + + style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Pipeline Stages:** + +1. **Validation:** Check required fields (model, messages) +2. **Request:** Call LiteLLM with provider-specific formatting +3. **Retry Logic:** Exponential backoff on failures (configurable) +4. **Telemetry:** Record tokens, cost, latency +5. **Response:** Return completion or raise error + +### Responses API Support + +In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. + +#### Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Check{"Model supports
Responses API?"} + + subgraph Standard["Standard Path"] + ChatFormat["Format as
Chat Messages"] + ChatCall["litellm.completion()"] + end + + subgraph ResponsesPath["Responses Path"] + RespFormat["Format as
instructions + input[]"] + RespCall["litellm.responses()"] + end + + ChatResponse["ModelResponse"] + RespResponse["ResponsesAPIResponse"] + + Parse["Parse to Message"] + Return["LLMResponse"] + + Check -->|No| ChatFormat + Check -->|Yes| RespFormat + + ChatFormat --> ChatCall + RespFormat --> RespCall + + ChatCall --> ChatResponse + RespCall --> RespResponse + + ChatResponse --> Parse + RespResponse --> Parse + + Parse --> Return + + style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +#### Supported Models + +Models that automatically use the Responses API path: + +| Pattern | Examples | Documentation | +|---------|----------|---------------| +| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | + +**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). + + +## Provider Integration + +### LiteLLM Abstraction + +Software Agent SDK uses LiteLLM for provider abstraction: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + SDK["Software Agent SDK"] + LiteLLM["LiteLLM"] + + subgraph Providers["100+ Providers"] + OpenAI["OpenAI"] + Anthropic["Anthropic"] + Google["Google"] + Azure["Azure"] + Others["..."] + end + + SDK --> LiteLLM + LiteLLM --> OpenAI + LiteLLM --> Anthropic + LiteLLM --> Google + LiteLLM --> Azure + LiteLLM --> Others + + style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Benefits:** +- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. 
+- **Unified API:** Same interface regardless of provider +- **Format Translation:** Provider-specific request/response formatting +- **Error Handling:** Normalized error codes and messages + +### LLM Providers + +Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. +The pages linked below live under the OpenHands app section but apply +verbatim to SDK applications because both layers wrap the same +`openhands.sdk.llm.LLM` interface. + +| Provider / scenario | Documentation | +| --- | --- | +| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | +| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | +| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | +| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | +| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | +| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | +| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | +| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | +| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | +| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | + +When you follow any of those guides while building with the SDK, create an +`LLM` object using the documented parameters (for example, API keys, base URLs, +or custom headers) and pass it into your agent or registry. The OpenHands UI +surfacing is simply a convenience layer on top of the same configuration model. 
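As a concrete illustration, the parameters from any of those guides translate directly into a JSON file consumable by `LLM.load_from_json()`. This is a hedged sketch: the `base_url` value is a placeholder, and field names should be checked against the `LLM` model source:

```json
{
  "model": "openai/gpt-5-mini",
  "base_url": "https://litellm-proxy.example.com/v1",
  "timeout": 120,
  "num_retries": 5
}
```

Load it with `LLM.load_from_json("config/llm.json")` and supply the API key separately (for example via `LLM_API_KEY`), since secrets are redacted in serialized JSON.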
+ + +## Telemetry and Cost Tracking + +### Telemetry Collection + +LLM requests automatically collect metrics: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Request["LLM Request"] + + subgraph Metrics + Tokens["Token Counts
Input/Output"] + Cost["Cost
USD"] + Latency["Latency
ms"] + end + + Events["Event Log"] + + Request --> Tokens + Request --> Cost + Request --> Latency + + Tokens --> Events + Cost --> Events + Latency --> Events + + style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Tracked Metrics:** +- **Token Usage:** Input tokens, output tokens, total +- **Cost:** Per-request cost using configured rates +- **Latency:** Request duration in milliseconds +- **Errors:** Failure types and retry counts + +### Cost Configuration + +Configure per-token costs for custom models: + +```python +llm = LLM( + model="custom/my-model", + input_cost_per_token=0.00001, # $0.01 per 1K tokens + output_cost_per_token=0.00003, # $0.03 per 1K tokens +) +``` + +**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) + +**Custom Costs:** Override for: +- Internal models +- Custom pricing agreements +- Cost estimation for budgeting + +## Component Relationships + +### How LLM Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + LLM["LLM"] + Agent["Agent"] + Conversation["Conversation"] + Events["Events"] + Security["Security Analyzer"] + Condenser["Context Condenser"] + + Agent -->|Uses| LLM + LLM -->|Records| Events + Security -.->|Optional| LLM + Condenser -.->|Optional| LLM + Conversation -->|Provides context| Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → LLM**: Agent uses LLM for reasoning and tool calls +- **LLM → Events**: LLM requests/responses recorded as events +- **Security → LLM**: Optional security analyzer can use separate LLM +- **Condenser → LLM**: Optional context condenser can use separate LLM +- 
**Configuration**: LLM configured independently, passed to agent +- **Telemetry**: LLM metrics flow through event system to UI/logging + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions +- **[Events](/sdk/arch/events)** - LLM request/response event types +- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis +- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration + + +# MCP Integration +Source: https://docs.openhands.dev/sdk/arch/mcp + +The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. + +**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +## Core Responsibilities + +The MCP Integration system has four primary responsibilities: + +1. **MCP Client Management** - Connect to and communicate with MCP servers +2. **Tool Discovery** - Enumerate available tools from MCP servers +3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions +4. **Execution Bridge** - Execute MCP tool calls from agent actions + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Client["MCP Client"] + Sync["MCPClient
Sync/Async bridge"] + Async["AsyncMCPClient
FastMCP base"] + end + + subgraph Bridge["Tool Bridge"] + Def["MCPToolDefinition
Schema conversion"] + Exec["MCPToolExecutor
Execution handler"] + end + + subgraph Integration["Agent Integration"] + Action["MCPToolAction
Dynamic model"] + Obs["MCPToolObservation
Result wrapper"] + end + + subgraph External["External"] + Server["MCP Server
stdio/HTTP"] + Tools["External Tools"] + end + + Sync --> Async + Async --> Server + + Server --> Def + Def --> Exec + + Exec --> Action + Action --> Server + Server --> Obs + + Server -.->|Spawns| Tools + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Sync,Async primary + class Def,Exec secondary + class Action,Obs tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | + +## MCP Client + +### Sync/Async Bridge + +The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Sync["Sync Code
Agent execution"] + Bridge["call_async_from_sync()"] + Executor["AsyncExecutor
Background loop"] + Async["Async MCP Call"] + Server["MCP Server"] + Result["Result"] + + Sync --> Bridge + Bridge --> Executor + Executor --> Async + Async --> Server + Server --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Bridge Pattern:** +- **Problem:** MCP protocol is async, but agent tools run synchronously +- **Solution:** Background event loop that executes async code from sync contexts +- **Benefit:** Agents use MCP tools without async/await in tool definitions + +**Client Features:** +- **Lifecycle Management:** `__enter__`/`__exit__` for context manager +- **Timeout Support:** Configurable timeouts for MCP operations +- **Error Handling:** Wraps MCP errors in observations +- **Connection Pooling:** Reuses connections across tool calls + +### MCP Server Configuration + +MCP servers are configured using the FastMCP format: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } +} +``` + +**Configuration Fields:** +- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) +- **args:** Arguments to pass to command +- **env:** Environment variables (optional) + +## Tool Discovery and Conversion + +### Discovery Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Config"] + Spawn["Spawn Server"] + List["List Tools"] + + subgraph Convert["Convert Each Tool"] + Schema["MCP Schema"] + Action["Generate Action Model"] + Def["Create ToolDefinition"] + end + + Register["Register in ToolRegistry"] + + Config --> Spawn + Spawn --> List + List --> Schema + + Schema --> Action + Action --> Def + Def --> Register + + style Spawn 
fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Discovery Steps:** + +1. **Spawn Server:** Launch MCP server via stdio +2. **List Tools:** Call `tools/list` MCP endpoint +3. **Parse Schemas:** Extract tool names, descriptions, parameters +4. **Generate Models:** Dynamically create Pydantic models for actions +5. **Create Definitions:** Wrap in `ToolDefinition` objects +6. **Register:** Add to agent's tool registry + +### Schema Conversion + +MCP tool schemas are converted to SDK tool definitions: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP Tool Schema
JSON Schema"] + Parse["Parse Parameters"] + Model["Dynamic Pydantic Model
MCPToolAction"] + Def["ToolDefinition
SDK format"]

    MCP --> Parse
    Parse --> Model
    Model --> Def

    style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Conversion Rules:**

| MCP Schema | SDK Action Model |
|------------|------------------|
| **name** | Class name (PascalCase) |
| **description** | Docstring |
| **inputSchema** | Pydantic fields |
| **required** | Field(required=True) |
| **type** | Python type hints |

**Example:**

```python
# MCP Schema
{
    "name": "fetch_url",
    "description": "Fetch content from URL",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "timeout": {"type": "number"}
        },
        "required": ["url"]
    }
}

# Generated Action Model
class FetchUrl(MCPToolAction):
    """Fetch content from URL"""
    url: str
    timeout: float | None = None
```

## Tool Execution

### Execution Flow

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
flowchart TB
    Agent["Agent generates action"]
    Action["MCPToolAction"]
    Executor["MCPToolExecutor"]

    Convert["Convert to MCP format"]
    Call["MCP call_tool"]
    Server["MCP Server"]

    Result["MCP Result"]
    Obs["MCPToolObservation"]
    Return["Return to Agent"]

    Agent --> Action
    Action --> Executor
    Executor --> Convert
    Convert --> Call
    Call --> Server
    Server --> Result
    Result --> Obs
    Obs --> Return

    style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Execution Steps:**

1. **Action Creation:** LLM generates tool call, parsed into `MCPToolAction`
2. **Executor Lookup:** Find `MCPToolExecutor` for tool name
3. **Format Conversion:** Convert action fields to MCP arguments
4. **MCP Call:** Execute `call_tool` via MCP client
5. **Result Parsing:** Parse MCP result (text, images, resources)
6. 
**Observation Creation:** Wrap in `MCPToolObservation` +7. **Error Handling:** Catch exceptions, return error observations + +### MCPToolExecutor + +Executors bridge SDK actions to MCP calls: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Executor["MCPToolExecutor"] + Client["MCP Client"] + Name["tool_name"] + + Executor -->|Uses| Client + Executor -->|Knows| Name + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Executor Responsibilities:** +- **Client Management:** Hold reference to MCP client +- **Tool Identification:** Know which MCP tool to call +- **Argument Conversion:** Transform action fields to MCP format +- **Result Handling:** Parse MCP responses +- **Error Recovery:** Handle connection errors, timeouts, server failures + +## MCP Tool Lifecycle + +### From Configuration to Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Load["Load MCP Config"] + Start["Start Conversation"] + Spawn["Spawn MCP Servers"] + Discover["Discover Tools"] + Register["Register Tools"] + + Ready["Agent Ready"] + + Step["Agent Step"] + LLM["LLM Tool Call"] + Execute["Execute MCP Tool"] + Result["Return Observation"] + + End["End Conversation"] + Cleanup["Close MCP Clients"] + + Load --> Start + Start --> Spawn + Spawn --> Discover + Discover --> Register + Register --> Ready + + Ready --> Step + Step --> LLM + LLM --> Execute + Execute --> Result + Result --> Step + + Step --> End + End --> Cleanup + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Lifecycle Phases:** + +| Phase | Operations | Components | +|-------|-----------|------------| +| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | +| 
**Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | +| **Execution** | Handle tool calls | Agent, MCPToolAction | +| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | + +## MCP Annotations + +MCP tools can include metadata hints for agents: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Tool["MCP Tool"] + + subgraph Annotations + ReadOnly["readOnlyHint"] + Destructive["destructiveHint"] + Progress["progressEnabled"] + end + + Security["Security Analysis"] + + Tool --> ReadOnly + Tool --> Destructive + Tool --> Progress + + ReadOnly --> Security + Destructive --> Security + + style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Annotation Types:** + +| Annotation | Meaning | Use Case | +|------------|---------|----------| +| **readOnlyHint** | Tool doesn't modify state | Lower security risk | +| **destructiveHint** | Tool modifies/deletes data | Require confirmation | +| **progressEnabled** | Tool reports progress | Show progress UI | + +These annotations feed into the security analyzer for risk assessment. 
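The annotation-driven flow above can be sketched as a small confirmation policy. This is an illustrative sketch only; the helper and field names here are hypothetical, not the SDK's actual API:

```python
from dataclasses import dataclass


@dataclass
class ToolAnnotations:
    """Hypothetical container for MCP annotation hints."""
    read_only_hint: bool = False
    destructive_hint: bool = False


def requires_confirmation(annotations: ToolAnnotations) -> bool:
    """Decide whether a tool call should pause for user approval."""
    if annotations.read_only_hint:
        return False  # read-only tools are low risk
    if annotations.destructive_hint:
        return True   # destructive tools always need approval
    return True       # unknown risk: err on the side of asking


# A "delete_file" tool flagged as destructive requires confirmation
print(requires_confirmation(ToolAnnotations(destructive_hint=True)))  # True
```

A real analyzer would combine these hints with other signals (command contents, workspace paths, user policy) rather than relying on annotations alone.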
+ +## Component Relationships + +### How MCP Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP System"] + Skills["Skills"] + Tools["Tool Registry"] + Agent["Agent"] + Security["Security"] + + Skills -->|Configures| MCP + MCP -->|Registers| Tools + Agent -->|Uses| Tools + MCP -->|Provides hints| Security + + style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → MCP**: Repository skills can embed MCP configurations +- **MCP → Tools**: MCP tools registered alongside native tools +- **Agent → Tools**: Agents use MCP tools like any other tool +- **MCP → Security**: Annotations inform security risk assessment +- **Transparent Integration**: Agent doesn't distinguish MCP from native tools + +## Design Rationale + +**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. + +**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. + +**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. + +**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. + +**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. 
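The dynamic model generation described above can be illustrated with the standard library's `make_dataclass`, used here as a dependency-free stand-in for Pydantic's `create_model`. The schema and type map below are simplified examples, not the SDK's actual converter:

```python
from dataclasses import make_dataclass, field

# A simplified MCP inputSchema (JSON Schema subset)
schema = {
    "properties": {
        "url": {"type": "string"},
        "timeout": {"type": "number"},
    },
    "required": ["url"],
}

TYPE_MAP = {"string": str, "number": float, "integer": int, "boolean": bool}

field_specs = []
for name, spec in schema["properties"].items():
    py_type = TYPE_MAP[spec["type"]]
    if name in schema["required"]:
        field_specs.append((name, py_type))                      # required field
    else:
        field_specs.append((name, py_type, field(default=None)))  # optional field

# Build the action model class at runtime from the schema
FetchUrl = make_dataclass("FetchUrl", field_specs)

action = FetchUrl(url="https://example.com")
print(action.timeout)  # None
```

The same loop shape works for any tool schema a server advertises, which is why new MCP servers need no SDK code changes.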

**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in the conversation lifecycle ensures resources are properly managed without manual bookkeeping.

## See Also

- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with the tool framework
- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills
- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment
- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications
- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library


# Overview
Source: https://docs.openhands.dev/sdk/arch/overview

The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents, from local experiments to full production systems. Its design emphasizes **statelessness**, **composability**, and **clear boundaries** between research and deployment.

Check [this document](/sdk/arch/design) for the core design principles that guided its architecture.

## Relationship with OpenHands Applications

The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces (web app, CLI, and cloud) that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns.

- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies.
- **Interfaces reuse SDK objects.** The OpenHands GUI and CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs.
- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. 
+ +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% +graph TB + subgraph Interfaces["OpenHands Interfaces"] + UI[OpenHands GUI
React frontend] + CLI[OpenHands CLI
Command-line interface] + Custom[Your Custom Client
Automations & workflows] + end + + SDK[Software Agent SDK
openhands.sdk + tools + workspace] + + subgraph External["External Services"] + LLM[LLM Providers
OpenAI, Anthropic, etc.] + Runtime[Runtime Services
Docker, Remote API, etc.] + end + + UI --> SDK + CLI --> SDK + Custom --> SDK + + SDK --> LLM + SDK --> Runtime + + classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class UI,CLI,Custom interface + class SDK sdk + class LLM,Runtime external +``` + + +## Four-Package Architecture + +The agent-sdk is organized into four distinct Python packages: + +| Package | What It Does | When You Need It | +|---------|-------------|------------------| +| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | +| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | +| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | +| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | + +### Two Deployment Modes + +The SDK supports two deployment architectures depending on your needs: + +#### Mode 1: Local Development + +**Installation:** Just install `openhands-sdk` + `openhands-tools` + +```bash +pip install openhands-sdk openhands-tools +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools + + SDK -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 +``` + +- `LocalWorkspace` included in SDK (no extra install) +- Everything runs in one process +- Perfect for prototyping and simple use cases +- Quick setup, no Docker required + +#### Mode 2: Production / Sandboxed + +**Installation:** Install all 4 packages + +```bash +pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% +flowchart LR + + WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk + + subgraph WS[" "] + direction LR + Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws + Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws + end + + Server["openhands.agent_server
FastAPI + WebSocket"]:::server + Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · …"]:::tools + + WSBase -.->|extended by| Docker + WSBase -.->|extended by| Remote + Docker -->|spawns container with| Server + Remote -->|connects via HTTP to| Server + Server -->|runs| Agent + Agent -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 + classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 + classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + + style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none +``` + +- `RemoteWorkspace` auto-spawns agent-server in containers +- Sandboxed execution for security +- Multi-user deployments +- Distributed systems (e.g., Kubernetes) support + + +**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). + + +### SDK Package (`openhands.sdk`) + +**Purpose:** Core components and base classes for OpenHands agent. + +**Key Components:** +- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop +- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle +- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry +- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration +- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) 
+- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) +- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation +- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management +- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution + +**Design:** Stateless, immutable components with type-safe Pydantic models. + +**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. + +**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) + +### Tools Package (`openhands.tools`) + + + +**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. + + +**Purpose:** Pre-built tools following consistent patterns. + +**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. + + +For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. + + + +### Workspace Package (`openhands.workspace`) + +**Purpose:** Workspace implementations extending SDK base classes. + +**Key Components:** Docker Workspace, Remote API Workspace, and more. + +**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. + +**Use Cases:** Sandboxed execution, multi-user deployments, production environments. + + +For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). + + +### Agent Server Package (`openhands.agent_server`) + +**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. 
+ +**Features:** +- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode +- Service management with isolated per-user sessions +- API key authentication and health checking + +**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). + +**Use Cases:** Multi-user web apps, SaaS products, distributed systems. + + +For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). + + +## How Components Work Together + +### Basic Execution Flow (Local) + +When you send a message to an agent, here's what happens: + +```mermaid +sequenceDiagram + participant You + participant Conversation + participant Agent + participant LLM + participant Tool + + You->>Conversation: "Create hello.txt" + Conversation->>Agent: Process message + Agent->>LLM: What should I do? + LLM-->>Agent: Use BashTool("touch hello.txt") + Agent->>Tool: Execute action + Note over Tool: Runs in same environment
as Agent (local/container/remote) + Tool-->>Agent: Observation + Agent->>LLM: Got result, continue? + LLM-->>Agent: Done + Agent-->>Conversation: Update state + Conversation-->>You: "File created!" +``` + +**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. + +### Deployment Flexibility + +The same agent code runs in different environments by swapping workspace configuration: + +```mermaid +graph TB + subgraph "Your Code (Unchanged)" + Code["Agent + Tools + LLM"] + end + + subgraph "Deployment Options" + Local["Local
Direct execution"] + Docker["Docker
Containerized"] + Remote["Remote
Multi-user server"] + end + + Code -->|LocalWorkspace| Local + Code -->|DockerWorkspace| Docker + Code -->|RemoteAPIWorkspace| Remote + + style Code fill:#e1f5fe + style Local fill:#e8f5e8 + style Docker fill:#e8f5e8 + style Remote fill:#e8f5e8 +``` + +## Next Steps + +### Get Started +- [Getting Started](/sdk/getting-started) – Build your first agent +- [Hello World](/sdk/guides/hello-world) – Minimal example + +### Explore Components + +**SDK Package:** +- [Agent](/sdk/arch/agent) – Core reasoning-action loop +- [Conversation](/sdk/arch/conversation) – State management and lifecycle +- [LLM](/sdk/arch/llm) – Language model integration +- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern +- [Events](/sdk/arch/events) – Typed event framework +- [Workspace](/sdk/arch/workspace) – Base workspace architecture + +**Tools Package:** +- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details + +**Workspace Package:** +- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details + +**Agent Server:** +- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details + +### Deploy +- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service +- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server + +### Source Code +- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework +- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools +- 
[`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples + + +# SDK Package +Source: https://docs.openhands.dev/sdk/arch/sdk + +The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. + +**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) + +## Purpose + +The SDK package handles: +- **Agent reasoning loop**: How agents process messages and make decisions +- **State management**: Conversation lifecycle and persistence +- **LLM integration**: Provider-agnostic language model access +- **Tool system**: Typed actions and observations +- **Workspace abstraction**: Where code executes +- **Extensibility**: Skills, condensers, MCP, security + +## Core Components + +```mermaid +graph TB + Conv[Conversation
Lifecycle Manager] --> Agent[Agent
Reasoning Loop] + + Agent --> LLM[LLM
Language Model] + Agent --> Tools[Tool System
Capabilities] + Agent --> Micro[Skills
Behavior Modules] + Agent --> Cond[Condenser
Memory Manager] + + Tools --> Workspace[Workspace
Execution] + + Conv --> Events[Events
Communication] + Tools --> MCP[MCP
External Tools] + Workspace --> Security[Security
Validation] + + style Conv fill:#e1f5fe + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fff3e0 + style Workspace fill:#fce4ec +``` + +### 1. Conversation - State & Lifecycle + +**What it does**: Manages the entire conversation lifecycle and state. + +**Key responsibilities**: +- Maintains conversation state (immutable) +- Handles message flow between user and agent +- Manages turn-taking and async execution +- Persists and restores conversation state +- Emits events for monitoring + +**Design decisions**: +- **Immutable state**: Each operation returns a new Conversation instance +- **Serializable**: Can be saved to disk or database and restored +- **Async-first**: Built for streaming and concurrent execution + +**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. + +**Example use cases**: +- Saving conversation to database after each turn +- Implementing undo/redo functionality +- Building multi-session chatbots +- Time-travel debugging + +**Learn more**: +- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) +- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) +- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) + +--- + +### 2. Agent - The Reasoning Loop + +**What it does**: The core reasoning engine that processes messages and decides what to do. 
+ +**Key responsibilities**: +- Receives messages and current state +- Consults LLM to reason about next action +- Validates and executes tool calls +- Processes observations and loops until completion +- Integrates with skills for specialized behavior + +**Design decisions**: +- **Stateless**: Agent doesn't hold state, operates on Conversation +- **Extensible**: Behavior can be modified via skills +- **Provider-agnostic**: Works with any LLM through unified interface + +**The reasoning loop**: +1. Receive message from Conversation +2. Add message to context +3. Consult LLM with full conversation history +4. If LLM returns tool call → validate and execute tool +5. If tool returns observation → add to context, go to step 3 +6. If LLM returns response → done, return to user + +**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. + +**Example use cases**: +- Planning agents that break tasks into steps +- Code review agents with specific checks +- Agents with domain-specific reasoning patterns + +**Learn more**: +- Guide: [Custom Agents](/sdk/guides/agent-custom) +- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) +- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) + +--- + +### 3. LLM - Language Model Integration + +**What it does**: Provides a provider-agnostic interface to language models. + +**Key responsibilities**: +- Abstracts different LLM providers (OpenAI, Anthropic, etc.) 
+- Handles message formatting and conversion +- Manages streaming responses +- Supports tool calling and reasoning modes +- Handles retries and error recovery + +**Design decisions**: +- **Provider-agnostic**: Same API works with any provider +- **Streaming-first**: Built for real-time responses +- **Type-safe**: Pydantic models for all messages +- **Extensible**: Easy to add new providers + +**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: +- Cost optimization (switch to cheaper models) +- Testing with different models +- Avoiding vendor lock-in +- Supporting customer choice + +**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. + +**Example use cases**: +- Routing requests to different models based on complexity +- Implementing custom caching strategies +- Adding observability hooks + +**Learn more**: +- Guide: [LLM Registry](/sdk/guides/llm-registry) +- Guide: [LLM Routing](/sdk/guides/llm-routing) +- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) +- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) + +--- + +### 4. Tool System - Typed Capabilities + +**What it does**: Defines what agents can do through a typed action/observation pattern. + +**Key responsibilities**: +- Defines tool schemas (inputs and outputs) +- Validates actions before execution +- Executes tools and returns typed observations +- Generates JSON schemas for LLM tool calling +- Registers tools with the agent + +**Design decisions**: +- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs +- **Schema generation**: Pydantic models auto-generate JSON schemas +- **Executor pattern**: Separation of tool definition and execution +- **Composable**: Tools can call other tools + +**The three components**: +1. 
**Action**: Input schema (what the tool accepts) +2. **Observation**: Output schema (what the tool returns) +3. **ToolExecutor**: Logic that transforms Action → Observation + +**Why this pattern?** +- Type safety catches errors early +- LLMs get accurate schemas for tool calling +- Tools are testable in isolation +- Easy to compose tools + +**When to customize**: When you need domain-specific capabilities not covered by built-in tools. + +**Example use cases**: +- Database query tools +- API integration tools +- Custom file format parsers +- Domain-specific calculators + +**Learn more**: +- Guide: [Custom Tools](/sdk/guides/custom-tools) +- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) + +--- + +### 5. Workspace - Execution Abstraction + +**What it does**: Abstracts *where* code executes (local, Docker, remote). + +**Key responsibilities**: +- Provides unified interface for code execution +- Handles file operations across environments +- Manages working directories +- Supports different isolation levels + +**Design decisions**: +- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package +- **Environment-agnostic**: Code works the same locally or remotely +- **Lazy initialization**: Workspace setup happens on first use + +**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. + +**When to use directly**: Rarely - usually configured when creating an agent. Use advanced workspaces for production. + +**Learn more**: +- Architecture: [Workspace Architecture](/sdk/arch/workspace) +- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) +- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +--- + +### 6. 
Events - Component Communication + +**What it does**: Enables observability and debugging through event emissions. + +**Key responsibilities**: +- Defines event types (messages, actions, observations, errors) +- Emitted by Conversation, Agent, Tools +- Enables logging, debugging, and monitoring +- Supports custom event handlers + +**Design decisions**: +- **Immutable**: Events are snapshots, not mutable objects +- **Serializable**: Can be logged, stored, replayed +- **Type-safe**: Pydantic models for all events + +**Why events?** They provide a timeline of what happened during agent execution. Essential for: +- Debugging agent behavior +- Understanding decision-making +- Building observability dashboards +- Implementing custom logging + +**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. + +**Learn more**: +- Guide: [Metrics and Observability](/sdk/guides/metrics) +- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + +--- + +### 7. Condenser - Memory Management + +**What it does**: Compresses conversation history when it gets too long. + +**Key responsibilities**: +- Monitors conversation length +- Summarizes older messages +- Preserves important context +- Keeps conversation within token limits + +**Design decisions**: +- **Pluggable**: Different condensing strategies +- **Automatic**: Triggered when context gets large +- **Preserves semantics**: Important information retained + +**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. + +**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. 
+ +**Example strategies**: +- Summarize old messages +- Keep only last N turns +- Preserve task-related messages + +**Learn more**: +- Guide: [Context Condenser](/sdk/guides/context-condenser) +- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +--- + +### 8. MCP - Model Context Protocol + +**What it does**: Integrates external tool servers via Model Context Protocol. + +**Key responsibilities**: +- Connects to MCP-compatible tool servers +- Translates MCP tools to SDK tool format +- Manages server lifecycle +- Handles server communication + +**Design decisions**: +- **Standard protocol**: Uses MCP specification +- **Transparent integration**: MCP tools look like regular tools to agents +- **Process management**: Handles server startup/shutdown + +**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. + +**When to use**: When you need tools that: +- Already have MCP servers (fetch, filesystem, etc.) +- Are too complex to rewrite as SDK tools +- Need to run in separate processes +- Are provided by third parties + +**Learn more**: +- Guide: [MCP Integration](/sdk/guides/mcp) +- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) +- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +--- + +### 9. Skills (formerly Microagents) - Behavior Modules + +**What it does**: Specialized modules that modify agent behavior for specific tasks. 
+ +**Key responsibilities**: +- Provide domain-specific instructions +- Modify system prompts +- Guide agent decision-making +- Compose to create specialized agents + +**Design decisions**: +- **Composable**: Multiple skills can work together +- **Declarative**: Defined as configuration, not code +- **Reusable**: Share skills across agents + +**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. + +**Example skills**: +- GitHub operations (issue creation, PRs) +- Code review guidelines +- Documentation style enforcement +- Project-specific conventions + +**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. + +**Learn more**: +- Guide: [Agent Skills & Context](/sdk/guides/skill) +- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +--- + +### 10. Security - Validation & Sandboxing + +**What it does**: Validates inputs and enforces security constraints. + +**Key responsibilities**: +- Input validation +- Command sanitization +- Path traversal prevention +- Resource limits + +**Design decisions**: +- **Defense in depth**: Multiple validation layers +- **Fail-safe**: Rejects suspicious inputs by default +- **Configurable**: Adjust security levels as needed + +**Why needed?** Agents execute arbitrary code and file operations. Security prevents: +- Malicious prompts escaping sandboxes +- Path traversal attacks +- Resource exhaustion +- Unintended system access + +**When to customize**: When you need domain-specific validation rules or want to adjust security policies. 
+ +**Learn more**: +- Guide: [Security and Secrets](/sdk/guides/security) +- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) + +--- + +## How Components Work Together + +### Example: User asks agent to create a file + +``` +1. User → Conversation: "Create a file called hello.txt with 'Hello World'" + +2. Conversation → Agent: New message event + +3. Agent → LLM: Full conversation history + available tools + +4. LLM → Agent: Tool call for FileEditorTool.create() + +5. Agent → Tool System: Validate FileEditorAction + +6. Tool System → Tool Executor: Execute action + +7. Tool Executor → Workspace: Create file (local/docker/remote) + +8. Workspace → Tool Executor: Success + +9. Tool Executor → Tool System: FileEditorObservation (success=true) + +10. Tool System → Agent: Observation + +11. Agent → LLM: Updated history with observation + +12. LLM → Agent: "File created successfully" + +13. Agent → Conversation: Done, final response + +14. Conversation → User: "File created successfully" +``` + +Throughout this flow: +- **Events** are emitted for observability +- **Condenser** may trigger if history gets long +- **Skills** influence LLM's decision-making +- **Security** validates file paths and operations +- **MCP** could provide additional tools if configured + +## Design Patterns + +### Immutability + +All core objects are immutable. Operations return new instances: + +```python +conversation = Conversation(...) +new_conversation = conversation.add_message(message) +# conversation is unchanged, new_conversation has the message +``` + +**Why?** Makes debugging easier, enables time-travel, ensures serializability. + +### Composition Over Inheritance + +Agents are composed from: +- LLM provider +- Tool list +- Skill list +- Condenser strategy +- Security policy + +You don't subclass Agent - you configure it. + +**Why?** More flexible, easier to test, enables runtime configuration. 
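That composition style can be sketched with a plain dataclass. The class and field names below are illustrative, not the SDK's exact signatures:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentConfig:
    """Illustrative composition: an agent is defined by what it's given."""
    llm: str                                      # e.g. a provider/model identifier
    tools: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


# Two differently-behaving agents, zero subclassing:
coder = AgentConfig(llm="anthropic/claude", tools=["bash", "file_editor"])
reviewer = AgentConfig(llm="openai/gpt", tools=["grep"], skills=["code-review"])

print(coder.tools)  # ['bash', 'file_editor']
```

Because each configuration is an immutable value, two agents can share parts (the same LLM, overlapping tool lists) without interfering with each other.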
+ 
### Type Safety

Everything uses Pydantic models:
- Messages, actions, observations are typed
- Validation happens automatically
- Schemas generate from types

**Why?** Catches errors early, provides IDE support, self-documenting.

## Next Steps

### For Usage Examples

- [Getting Started](/sdk/getting-started) - Build your first agent
- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities
- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers
- [Conversation Management](/sdk/guides/convo-persistence) - State handling

### For Related Architecture

- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations
- [Workspace Architecture](/sdk/arch/workspace) - Execution environments
- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution

### For Implementation Details

- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code
- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code
- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code
- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples


# Security
Source: https://docs.openhands.dev/sdk/arch/security

The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

## Core Responsibilities

The Security system has four primary responsibilities:

1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions
2. 
**Confirmation Policy** - Determine when user approval is required based on risk +3. **Action Validation** - Enforce security policies before execution +4. **Audit Trail** - Record security decisions in event history + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["SecurityAnalyzerBase
Abstract analyzer"] + end + + subgraph Implementations["Concrete Analyzers"] + LLM["LLMSecurityAnalyzer
Inline risk prediction"] + NoOp["NoOpSecurityAnalyzer
No analysis"] + end + + subgraph Risk["Risk Levels"] + Low["LOW
Safe operations"] + Medium["MEDIUM
Moderate risk"] + High["HIGH
Dangerous ops"] + Unknown["UNKNOWN
Unanalyzed"] + end + + subgraph Policy["Confirmation Policy"] + Check["should_require_confirmation()"] + Mode["Confirmation Mode"] + Decision["Require / Allow"] + end + + Base --> LLM + Base --> NoOp + + Implementations --> Low + Implementations --> Medium + Implementations --> High + Implementations --> Unknown + + Low --> Check + Medium --> Check + High --> Check + Unknown --> Check + + Check --> Mode + Mode --> Decision + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + + class Base primary + class LLM secondary + class High danger + class Check tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | +| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | +| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | +| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | +| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | + +## Risk Levels + +Security analyzers return one of four risk levels: + +```mermaid +%%{init: {"theme": "default", "flowchart": 
{"nodeSpacing": 30}} }%% +flowchart TB + Action["ActionEvent"] + Analyze["Security Analyzer"] + + subgraph Levels["Risk Levels"] + Low["LOW
Read-only, safe"] + Medium["MEDIUM
Modify files"] + High["HIGH
Delete, execute"] + Unknown["UNKNOWN
Not analyzed"] + end + + Action --> Analyze + Analyze --> Low + Analyze --> Medium + Analyze --> High + Analyze --> Unknown + + style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px + style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px + style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px +``` + +### Risk Level Definitions + +| Level | Characteristics | Examples | +|-------|----------------|----------| +| **LOW** | Read-only, no state changes | File reading, directory listing, search | +| **MEDIUM** | Modifies user data | File editing, creating files, API calls | +| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | +| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | + +## Security Analyzers + +### LLMSecurityAnalyzer + +Leverages the LLM's inline risk assessment during action generation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Schema["Tool Schema
+ security_risk param"] + LLM["LLM generates action
with security_risk"] + ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] + Extract["Extract security_risk
from arguments"] + ActionEvent["ActionEvent
with security_risk set"] + Analyzer["LLMSecurityAnalyzer
returns security_risk"] + + Schema --> LLM + LLM --> ToolCall + ToolCall --> Extract + Extract --> ActionEvent + ActionEvent --> Analyzer + + style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Analysis Process:** + +1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema +2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments +3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments +4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` +5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level +6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step + +**Example Tool Call:** +```json +{ + "name": "execute_bash", + "arguments": { + "command": "rm -rf /tmp/cache", + "security_risk": "HIGH" + } +} +``` + +The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. 
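The extraction step can be sketched as follows. Field names mirror the example tool call above; this is an illustrative sketch, not the SDK's actual parsing code:

```python
import json

# A tool call as the LLM might emit it, with security_risk inline.
tool_call = {
    "name": "execute_bash",
    "arguments": json.dumps(
        {"command": "rm -rf /tmp/cache", "security_risk": "HIGH"}
    ),
}

args = json.loads(tool_call["arguments"])
# Pop the risk out so only the tool's real parameters reach validation.
risk = args.pop("security_risk", "UNKNOWN")

print(risk)  # HIGH
print(args)  # {'command': 'rm -rf /tmp/cache'}
```

Because the risk travels inside the normal tool-call arguments, the analyzer only has to read a value that already exists, which is why no extra LLM round trip is needed.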
+ +**Configuration:** +- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent +- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools +- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation + +### NoOpSecurityAnalyzer + +Passthrough analyzer that skips analysis: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Action["ActionEvent"] + NoOp["NoOpSecurityAnalyzer"] + Unknown["SecurityRisk.UNKNOWN"] + + Action --> NoOp --> Unknown + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Use Case:** Development, trusted environments, or when confirmation mode handles all actions + +## Confirmation Policy + +The confirmation policy determines when user approval is required. There are three policy implementations: + +**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) + +### Policy Types + +| Policy | Behavior | Use Case | +|--------|----------|----------| +| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | +| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | +| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | + +### ConfirmRisky (Default Policy) + +The most flexible policy with configurable thresholds: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 
40}} }%% +flowchart TB + Risk["SecurityRisk"] + CheckUnknown{"Risk ==
UNKNOWN?"} + UseConfirmUnknown{"confirm_unknown
setting?"} + CheckThreshold{"risk.is_riskier
(threshold)?"} + + Confirm["Require Confirmation"] + Allow["Allow Execution"] + + Risk --> CheckUnknown + CheckUnknown -->|Yes| UseConfirmUnknown + CheckUnknown -->|No| CheckThreshold + + UseConfirmUnknown -->|True| Confirm + UseConfirmUnknown -->|False| Allow + + CheckThreshold -->|Yes| Confirm + CheckThreshold -->|No| Allow + + style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px +``` + +**Configuration:** +- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required + - Cannot be set to `UNKNOWN` + - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` +- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation + +### Confirmation Rules by Policy + +#### ConfirmRisky with threshold=HIGH (Default) + +| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | +|------------|----------------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | ✅ Allow | ✅ Allow | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=MEDIUM + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=LOW + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 
Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +**Key Rules:** +- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` +- **UNKNOWN handling** is configurable via `confirm_unknown` flag +- **Threshold cannot be UNKNOWN** - validated at policy creation time + + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Security["Security Analyzer"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + MCP["MCP Tools"] + + Agent -->|Validates actions| Security + Security -->|Checks| Tools + Security -->|Uses hints| MCP + Conversation -->|Pauses for confirmation| Agent + + style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → Security**: Validates actions before execution +- **Security → Tools**: Examines tool characteristics (annotations) +- **Security → MCP**: Uses MCP hints for risk assessment +- **Conversation → Agent**: Pauses for user confirmation when required +- **Optional Component**: Security analyzer can be disabled for trusted environments + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers +- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints +- **[Security Guide](/sdk/guides/security)** - Configuring security policies + + +# Skill +Source: https://docs.openhands.dev/sdk/arch/skill + +The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. 
+ +**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +## Core Responsibilities + +The Skill system has four primary responsibilities: + +1. **Context Injection** - Add specialized prompts to agent context based on triggers +2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) +3. **MCP Integration** - Load MCP tools associated with repository skills +4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Types["Skill Types"] + Repo["Repository Skill
trigger: None"] + Knowledge["Knowledge Skill
trigger: KeywordTrigger"] + Task["Task Skill
trigger: TaskTrigger"] + end + + subgraph Triggers["Trigger Evaluation"] + Always["Always Active
Repository guidelines"] + Keyword["Keyword Match
String matching on user messages"] + TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] + end + + subgraph Content["Skill Content"] + Markdown["Markdown with Frontmatter"] + MCPTools["MCP Tools Config
Repo skills only"] + Inputs["Input Metadata
Task skills only"] + end + + subgraph Integration["Agent Integration"] + Context["Agent Context"] + Prompt["System Prompt"] + end + + Repo --> Always + Knowledge --> Keyword + Task --> TaskMatch + + Always --> Markdown + Keyword --> Markdown + TaskMatch --> Markdown + + Repo -.->|Optional| MCPTools + Task -.->|Requires| Inputs + + Markdown --> Context + MCPTools --> Context + Context --> Prompt + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Repo,Knowledge,Task primary + class Always,Keyword,TaskMatch secondary + class Context tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | +| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | +| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | +| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | +| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | + +## Skill Types + +### Repository Skills + +Always-active, repository-specific guidelines. + +**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. 
+ +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + File["AGENTS.md"] + Parse["Parse Frontmatter"] + Skill["Skill(trigger=None)"] + Context["Always in Context"] + + File --> Parse + Parse --> Skill + Skill --> Context + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `None` (always active) +- **Purpose:** Project conventions, coding standards, architecture rules +- **MCP Tools:** Can include MCP tool configuration +- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) + +**Example Files (permanent context):** +- `AGENTS.md` - General agent instructions +- `GEMINI.md` - Gemini-specific instructions +- `CLAUDE.md` - Claude-specific instructions + +**Other supported formats:** +- `.cursorrules` - Cursor IDE guidelines +- `agents.md` / `agent.md` - General agent instructions + +### Knowledge Skills + +Keyword-triggered skills for specialized domains: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Check["Check Keywords"] + Match{"Match?"} + Activate["Activate Skill"] + Skip["Skip Skill"] + Context["Add to Context"] + + User --> Check + Check --> Match + Match -->|Yes| Activate + Match -->|No| Skip + Activate --> Context + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `KeywordTrigger` with regex patterns +- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") +- **Activation:** Keywords detected in user messages +- **Location:** System or user-defined knowledge base + +**Trigger Example:** +```yaml +--- +name: kubernetes +trigger: + type: keyword + keywords: ["kubernetes", "k8s", "kubectl"] +--- +``` + +### Task Skills + 
+Keyword-triggered skills with structured inputs for guided workflows: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Match{"Keyword
Match?"} + Inputs["Collect User Inputs"] + Template["Apply Template"] + Context["Add to Context"] + Skip["Skip Skill"] + + User --> Match + Match -->|Yes| Inputs + Match -->|No| Skip + Inputs --> Template + Template --> Context + + style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) +- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) +- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) +- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) +- **Location:** System-defined or custom task templates + +**Trigger Example:** +```yaml +--- +name: bug_fix +triggers: ["/bug_fix", "fix bug", "bug report"] +inputs: + - name: bug_description + description: "Describe the bug" + required: true +--- +``` + +**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. + +## Trigger Evaluation + +Skills are evaluated at different points in the agent lifecycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent Step Start"] + + Repo["Check Repository Skills
trigger: None"] + AddRepo["Always Add to Context"] + + Message["Check User Message"] + Keyword["Match Keyword Triggers"] + AddKeyword["Add Matched Skills"] + + TaskType["Check Task Type"] + TaskMatch["Match Task Triggers"] + AddTask["Add Task Skill"] + + Build["Build Agent Context"] + + Start --> Repo + Repo --> AddRepo + + Start --> Message + Message --> Keyword + Keyword --> AddKeyword + + Start --> TaskType + TaskType --> TaskMatch + TaskMatch --> AddTask + + AddRepo --> Build + AddKeyword --> Build + AddTask --> Build + + style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Evaluation Rules:** + +| Trigger Type | Evaluation Point | Activation Condition | +|--------------|------------------|----------------------| +| **None** | Every step | Always active | +| **KeywordTrigger** | On user message | Keyword/string match in message | +| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) | + +**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters. 
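The matching rule shared by both trigger types can be sketched like this. Case-insensitive substring matching is an assumption made for illustration; the SDK's exact matching rules may differ:

```python
def keyword_match(keywords: list[str], user_message: str) -> bool:
    """Return True if any trigger keyword appears in the message."""
    text = user_message.lower()
    return any(keyword.lower() in text for keyword in keywords)

knowledge_keywords = ["kubernetes", "k8s", "kubectl"]    # KeywordTrigger
task_triggers = ["/bug_fix", "fix bug", "bug report"]    # TaskTrigger (same logic)

print(keyword_match(knowledge_keywords, "How do I debug a k8s deployment?"))  # True
print(keyword_match(task_triggers, "Please fix bug #42 in the parser"))       # True
print(keyword_match(knowledge_keywords, "Write a sorting function"))          # False
```

When a task trigger fires, the skill additionally collects its declared inputs before its template is added to context; a knowledge skill's content is injected directly.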
+ +## MCP Tool Integration + +Repository skills can include MCP tool configurations: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skill["Repository Skill"] + MCPConfig["mcp_tools Config"] + Client["MCP Client"] + Tools["Tool Registry"] + + Skill -->|Contains| MCPConfig + MCPConfig -->|Spawns| Client + Client -->|Registers| Tools + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**MCP Configuration Format:** + +Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): + +```yaml +--- +name: repo_skill +mcp_tools: + mcpServers: + filesystem: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] +--- +``` + +**Workflow:** +1. **Load Skill:** Parse markdown file with frontmatter +2. **Extract MCP Config:** Read `mcp_tools` field +3. **Spawn MCP Servers:** Create MCP clients for each server +4. **Register Tools:** Add MCP tools to agent's tool registry +5. **Inject Context:** Add skill content to agent prompt + +## Skill File Format + +Skills are defined in markdown files with YAML frontmatter: + +```markdown +--- +name: skill_name +trigger: + type: keyword + keywords: ["pattern1", "pattern2"] +--- + +# Skill Content + +This is the instruction text that will be added to the agent's context. 
+``` + +**Frontmatter Fields:** + +| Field | Required | Description | +|-------|----------|-------------| +| **name** | Yes | Unique skill identifier | +| **trigger** | Yes* | Activation trigger (`null` for always active) | +| **mcp_tools** | No | MCP server configuration (repo skills only) | +| **inputs** | No | User input metadata (task skills only) | + +*Repository skills use `trigger: null` (or omit trigger field) + +## Component Relationships + +### How Skills Integrate + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skills["Skill System"] + Context["Agent Context"] + Agent["Agent"] + MCP["MCP Client"] + + Skills -->|Injects content| Context + Skills -.->|Spawns tools| MCP + Context -->|System prompt| Agent + MCP -->|Tool| Agent + + style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → Agent Context**: Active skills contribute their content to system prompt +- **Skills → MCP**: Repository skills can spawn MCP servers and register tools +- **Context → Agent**: Combined skill content becomes part of agent's instructions +- **Skills Lifecycle**: Loaded at conversation start, evaluated each step + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context +- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management +- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications + + +# Tool System & MCP +Source: https://docs.openhands.dev/sdk/arch/tool-system + +The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. 
+ +**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) + +## Core Responsibilities + +The Tool System has four primary responsibilities: + +1. **Type Safety** - Enforce action/observation schemas via Pydantic models +2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas +3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs +4. **Tool Registry** - Discover and resolve tools by name or pattern + +## Tool System + +### Architecture Overview + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Definition["Tool Definition"] + Action["Action
Input schema"] + Observation["Observation
Output schema"] + Executor["Executor
Business logic"] + end + + subgraph Framework["Tool Framework"] + Base["ToolBase
Abstract base"] + Impl["Tool Implementation
Concrete tool"] + Registry["Tool Registry
Spec → Tool"] + end + + Agent["Agent"] + LLM["LLM"] + ToolSpec["Tool Spec
name + params"] + + Base -.->|Extends| Impl + + ToolSpec -->|resolve_tool| Registry + Registry -->|Create instances| Impl + Impl -->|Available in| Agent + Impl -->|Generate schema| LLM + LLM -->|Generate tool call| Agent + Agent -->|Parse & validate| Action + Agent -->|Execute via Tool.\_\_call\_\_| Executor + Executor -->|Return| Observation + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Action,Observation,Executor secondary + class Registry tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | +| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | +| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | +| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | +| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | +| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | +| **[`Tool` 
(spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | +| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | + +### Action-Observation Pattern + +The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Agent["Agent Layer"] + ToolCall["MessageToolCall
from LLM"] + ParseJSON["Parse JSON
arguments"] + CreateAction["tool.action_from_arguments()
Pydantic validation"] + WrapAction["ActionEvent
wraps Action"] + WrapObs["ObservationEvent
wraps Observation"] + Error["AgentErrorEvent"] + end + + subgraph ToolSystem["Tool System"] + ActionType["Action
Pydantic model"] + ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] + Execute["ToolExecutor
business logic"] + ObsType["Observation
Pydantic model"]
    end

    ToolCall --> ParseJSON
    ParseJSON -->|Valid JSON| CreateAction
    ParseJSON -->|Invalid JSON| Error
    CreateAction -->|Valid| ActionType
    CreateAction -->|Invalid| Error
    ActionType --> WrapAction
    ActionType --> ToolCall2
    ToolCall2 --> Execute
    Execute --> ObsType
    ObsType --> WrapObs

    style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px
    style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px
```

**Tool System Boundary:**
- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance
- **Output**: `Observation` instance with structured result
- **No knowledge of**: Events, LLM messages, conversation state

### Tool Definition

Tools are defined using two patterns depending on complexity:

#### Pattern 1: Direct Instantiation (Simple Tools)

For stateless tools that don't need runtime configuration (e.g., `finish`, `think`):

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%%
flowchart LR
    Action["Define Action<br>with visualize"]
    Obs["Define Observation<br>with to_llm_content"]
    Exec["Define Executor<br>stateless logic"]
    Tool["ToolDefinition(...,<br>executor=Executor())"]

    Action --> Tool
    Obs --> Tool
    Exec --> Tool

    style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
```

**Components:**
1. **Action** - Pydantic model with `visualize` property for display
2. **Observation** - Pydantic model with `to_llm_content` property for LLM
3. **ToolExecutor** - Stateless executor with `__call__(action) → observation`
4. **ToolDefinition** - Direct instantiation with executor instance

#### Pattern 2: Subclass with Factory (Stateful Tools)

For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`):

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%%
flowchart LR
    Action["Define Action<br>with visualize"]
    Obs["Define Observation<br>with to_llm_content"]
    Exec["Define Executor<br>with \_\_init\_\_ and state"]
    Subclass["class MyTool(ToolDefinition)<br>with create() method"]
    Instance["Return [MyTool(...,<br>executor=instance)]"]

    Action --> Subclass
    Obs --> Subclass
    Exec --> Subclass
    Subclass --> Instance

    style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Components:**
1. **Action/Observation** - Same as Pattern 1
2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup
3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method
4. **Factory Method** - Returns sequence of configured tool instances

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart TB
    subgraph Pattern1["Pattern 1: Direct Instantiation"]
        P1A["Define Action/Observation<br>with visualize/to_llm_content"]
        P1E["Define ToolExecutor<br>with \_\_call\_\_()"]
        P1T["ToolDefinition(...,<br>executor=Executor())"]
    end

    subgraph Pattern2["Pattern 2: Subclass with Factory"]
        P2A["Define Action/Observation<br>with visualize/to_llm_content"]
        P2E["Define Stateful ToolExecutor<br>with \_\_init\_\_() and \_\_call\_\_()"]
        P2C["class MyTool(ToolDefinition)<br>@classmethod create()"]
        P2I["Return [MyTool(...,<br>executor=instance)]"]
    end

    P1A --> P1E
    P1E --> P1T

    P2A --> P2E
    P2E --> P2C
    P2C --> P2I

    style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Key Design Elements:**

| Component | Purpose | Requirements |
|-----------|---------|--------------|
| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text |
| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list |
| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method |
| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) |

**When to Use Each Pattern:**

| Pattern | Use Case | Examples |
|---------|----------|----------|
| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities |
| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` |

### Tool Annotations

Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs:

| Field | Meaning | Examples |
|-------|---------|----------|
| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) |
| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) |
| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) |
| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) |

**Key Behaviors:**
- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with `readOnlyHint=False`
name + params"]

    subgraph Registry["Tool Registry"]
        Resolver["Resolver<br>name → factory"]
        Factory["Factory<br>create(params)"]
    end

    Instance["Tool Instance<br>with executor"]
    Agent["Agent"]

    ToolSpec -->|"resolve_tool(spec)"| Resolver
    Resolver -->|Lookup factory| Factory
    Factory -->|"create(**params)"| Instance
    Instance -->|Used by| Agent

    style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Resolution Workflow:**

1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`)
2. **Resolver Lookup** - Registry finds the registered resolver for the tool name
3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state
4. **Instance Creation** - Tool instance(s) are created with configured executors
5. **Agent Usage** - Instances are added to the agent's tools_map for execution

**Registration Types:**

| Type | Registration | Resolver Behavior |
|------|-------------|-------------------|
| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) |
| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` |
| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` |

### File Organization

Tools follow a consistent file structure for maintainability:

```
openhands-tools/openhands/tools/my_tool/
├── __init__.py        # Export MyTool
├── definition.py      # Action, Observation, MyTool(ToolDefinition)
├── impl.py            # MyExecutor(ToolExecutor)
└── [other modules]    # Tool-specific utilities
```

**File Responsibilities:**

| File | Contains | Purpose |
|------|----------|---------|
| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method |
| `impl.py` | ToolExecutor implementation | Business logic, state management, execution |
stdio/HTTP"]
        ExtTools["External Tools"]
    end

    subgraph Bridge["MCP Integration Layer"]
        MCPClient["MCPClient<br>Sync/Async bridge"]
        Convert["Schema Conversion<br>MCP → MCPToolDefinition"]
        MCPExec["MCPToolExecutor<br>Bridges to MCP calls"]
    end

    subgraph Agent["Agent System"]
        ToolsMap["tools_map<br>str -> ToolDefinition"]
        AgentLogic["Agent Execution"]
    end

    Server -.->|Spawns| ExtTools
    MCPClient --> Server
    Server --> Convert
    Convert -->|create_mcp_tools| MCPExec
    MCPExec -->|Added during<br>agent.initialize| ToolsMap
    ToolsMap --> AgentLogic
    AgentLogic -->|Tool call| MCPExec
    MCPExec --> MCPClient

    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class MCPClient primary
    class Convert,MCPExec secondary
    class Server,ExtTools external
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | MCP server connection | Extends FastMCP with sync/async bridge |
| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Tool wrapper | Wraps MCP tools as SDK `ToolDefinition` with dynamic validation |
| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP tool calls via MCPClient |
| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Generic action wrapper | Simple `dict[str, Any]` wrapper for MCP tool arguments |
| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results as observations with content blocks |
| **[`_create_mcp_action_type()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Dynamic schema | Runtime Pydantic model generated from MCP `inputSchema` for validation |

### Sync/Async Bridge

The MCP protocol is asynchronous, but SDK tools execute synchronously. The bridge pattern in [client.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py) solves this:
command + args"]
    Spawn["Spawn Server Process<br>MCPClient"]
    List["List Available Tools<br>client.list_tools()"]

    subgraph Convert["For Each MCP Tool"]
        Store["Store MCP metadata<br>name, description, inputSchema"]
        CreateExec["Create MCPToolExecutor<br>bound to tool + client"]
        Def["Create MCPToolDefinition<br>generic MCPToolAction type"]
    end

    Register["Add to Agent's tools_map<br>bypasses ToolRegistry"]
    Ready["Tools Available<br>Dynamic models created on-demand"]
Configuration follows [FastMCP config format](https://gofastmcp.com/clients/client#configuration-format): + +```python +from openhands.sdk import Agent + +agent = Agent( + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } + } +) +``` + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Sources["Tool Sources"] + Native["Native Tools"] + MCP["MCP Tools"] + end + + Registry["Tool Registry
resolve_tool"]
    ToolsMap["Agent.tools_map<br>Merged tool dict"]
+ +**Source:** [`openhands/sdk/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +## Core Responsibilities + +The Workspace system has four primary responsibilities: + +1. **Execution Abstraction** - Unified interface for command execution across environments +2. **File Operations** - Upload, download, and manipulate files in workspace +3. **Resource Management** - Context manager protocol for setup/teardown +4. **Environment Isolation** - Separate agent execution from host system + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 60}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["BaseWorkspace
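The unified interface described above can be illustrated with a pared-down sketch in plain Python. The names (`BaseWorkspace`, `LocalWorkspace`, `CommandResult`) mirror the SDK's components, but the signatures here are simplified for illustration and are not the real API:

```python
# Illustrative sketch only -- NOT the SDK's actual classes or signatures.
import subprocess
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int   # 0 = success
    timeout: bool    # whether the command timed out
    duration: float  # execution time in seconds


class BaseWorkspace(ABC):
    """Abstract interface: every workspace executes commands the same way."""

    @abstractmethod
    def execute_command(self, command: str, timeout: float = 30.0) -> CommandResult:
        ...


class LocalWorkspace(BaseWorkspace):
    """Direct subprocess execution on the host."""

    def execute_command(self, command: str, timeout: float = 30.0) -> CommandResult:
        start = time.monotonic()
        try:
            proc = subprocess.run(
                command, shell=True, capture_output=True, text=True, timeout=timeout
            )
            return CommandResult(
                proc.stdout, proc.stderr, proc.returncode,
                False, time.monotonic() - start,
            )
        except subprocess.TimeoutExpired:
            return CommandResult("", "", -1, True, time.monotonic() - start)


result = LocalWorkspace().execute_command("echo hello")
assert result.exit_code == 0 and result.stdout.strip() == "hello"
```

A remote workspace would satisfy the same `execute_command` contract by issuing an HTTP request to an agent-server instead of spawning a subprocess, which is what lets tools stay agnostic about where they run.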
Abstract base class"]
    end

    subgraph Implementations["Concrete Implementations"]
        Local["LocalWorkspace<br>Direct subprocess"]
        Remote["RemoteWorkspace<br>HTTP API calls"]
    end

    subgraph Operations["Core Operations"]
        Command["execute_command()"]
        Upload["file_upload()"]
        Download["file_download()"]
        Context["__enter__ / __exit__"]
    end

    subgraph Targets["Execution Targets"]
        Process["Local Process"]
        Container["Docker Container"]
        Server["Remote Server"]
    end

    Base --> Local
    Base --> Remote

    Base -.->|Defines| Operations

    Local --> Process
    Remote --> Container
    Remote --> Server

    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class Base primary
    class Local,Remote secondary
    class Command,Upload tertiary
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`BaseWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)** | Abstract interface | Defines execution and file operation contracts |
| **[`LocalWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/local.py)** | Local execution | Subprocess-based command execution |
| **[`RemoteWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/remote/base.py)** | Remote execution | HTTP API-based execution via agent-server |
| **[`CommandResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | Execution output | Structured result with stdout, stderr, exit_code |
| **[`FileOperationResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | File op outcome | Success status and metadata |

## Workspace Types

### Local vs Remote Execution

| Aspect | LocalWorkspace | RemoteWorkspace |
|--------|----------------|-----------------|
| **Execution** | Direct subprocess | HTTP → agent-server |
execute_command()"]

    Decision{"Workspace<br>type?"}

    LocalExec["subprocess.run()<br>Direct execution"]
    RemoteExec["POST /command<br>HTTP API"]

    Result["CommandResult<br>stdout, stderr, exit_code"]
SDK base class"]

    Docker["DockerWorkspace<br>Auto-spawn containers"]
    API["RemoteAPIWorkspace<br>Connect to existing server"]

    Base -.->|Extended by| Docker
    Base -.->|Extended by| API

    Docker -->|Creates| Container["Docker Container<br>with agent-server"]
API +- **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution + + +# FAQ +Source: https://docs.openhands.dev/sdk/faq + +## How do I use AWS Bedrock with the SDK? + +**Yes, the OpenHands SDK supports AWS Bedrock through LiteLLM.** + +Since LiteLLM requires `boto3` for Bedrock requests, you need to install it alongside the SDK. + + + +### Step 1: Install boto3 + +Install the SDK with boto3: + +```bash +# Using pip +pip install openhands-sdk boto3 + +# Using uv +uv pip install openhands-sdk boto3 + +# Or when installing as a CLI tool +uv tool install openhands --with boto3 +``` + +### Step 2: Configure Authentication + +You have two authentication options: + +**Option A: API Key Authentication (Recommended)** + +Use the `AWS_BEARER_TOKEN_BEDROCK` environment variable: + +```bash +export AWS_BEARER_TOKEN_BEDROCK="your-bedrock-api-key" +``` + +**Option B: AWS Credentials** + +Use traditional AWS credentials: + +```bash +export AWS_ACCESS_KEY_ID="your-access-key" +export AWS_SECRET_ACCESS_KEY="your-secret-key" +export AWS_REGION_NAME="us-west-2" +``` + +### Step 3: Configure the Model + +Use the `bedrock/` prefix for your model name: + +```python +from openhands.sdk import LLM, Agent + +llm = LLM( + model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", + # api_key is read from AWS_BEARER_TOKEN_BEDROCK automatically +) +``` + +For cross-region inference profiles, include the region prefix: + +```python +llm = LLM( + model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0", # US region + # or + model="bedrock/apac.anthropic.claude-sonnet-4-20250514-v1:0", # APAC region +) +``` + + + +For more details on Bedrock configuration options, see the [LiteLLM Bedrock documentation](https://docs.litellm.ai/docs/providers/bedrock). + +## Does the agent SDK support parallel tool calling? 

**Yes, the OpenHands SDK supports parallel tool calling by default.**

The SDK automatically handles parallel tool calls when the underlying LLM (like Claude or GPT-4) returns multiple tool calls in a single response. This allows agents to execute multiple independent actions before the next LLM call.

When the LLM generates multiple tool calls in parallel, the SDK groups them using a shared `llm_response_id`:

```python
ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
# Combined into: Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```

Multiple `ActionEvent`s with the same `llm_response_id` are grouped together and combined into a single LLM message with multiple `tool_calls`. Only the first event's thought/reasoning is included. The implementation lives in a few places:

- [Events Architecture](/sdk/arch/events#event-types) - detailed explanation of how parallel function calling works
- [`prepare_llm_messages` in utils.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/utils.py) - groups ActionEvents by `llm_response_id` when converting events to LLM messages
- [Agent step method](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py#L200-L300) - where actions are created with a shared `llm_response_id`
- [`ActionEvent` class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py) - includes the `llm_response_id` field

For more details, see the **[Events Architecture](/sdk/arch/events)** for a deep dive into the event system and parallel function calling, the **[Tool System](/sdk/arch/tool-system)** for understanding how tools work with the agent, and the **[Agent Architecture](/sdk/arch/agent)** for how agents process and execute actions.
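The grouping rule can be sketched in plain Python. This is an illustration using dictionaries rather than the SDK's `ActionEvent`/`Message` classes — the real logic lives in `prepare_llm_messages`:

```python
# Illustrative only: plain-dict stand-ins for ActionEvent and Message.
# Events sharing an llm_response_id collapse into one assistant message;
# only the first event's thought becomes the message content.

def group_action_events(events: list[dict]) -> list[dict]:
    messages: list[dict] = []
    by_response_id: dict[str, dict] = {}
    for event in events:
        message = by_response_id.get(event["llm_response_id"])
        if message is None:
            # First event of this LLM response: keep its thought as content
            message = {
                "role": "assistant",
                "content": event["thought"],
                "tool_calls": [],
            }
            by_response_id[event["llm_response_id"]] = message
            messages.append(message)
        # Every event contributes its tool call to the shared message
        message["tool_calls"].append(event["tool_call"])
    return messages


events = [
    {"llm_response_id": "abc123", "thought": "Let me check...", "tool_call": "tool1"},
    {"llm_response_id": "abc123", "thought": "", "tool_call": "tool2"},
]
assert group_action_events(events) == [
    {"role": "assistant", "content": "Let me check...", "tool_calls": ["tool1", "tool2"]}
]
```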
**FileEditorTool**: When viewing image files (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`), they're automatically converted to base64 and sent to the LLM +- **BrowserUseTool**: Screenshots are captured and sent as base64 images +- **MCP Tools**: Image content from MCP tool results is automatically converted to base64 data URLs + +### Disabling Vision + +To disable vision for cost reduction (even on vision-capable models): + +```python +llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + usage_id="my-agent", + disable_vision=True, # Images will be filtered out +) +``` + + + +For a complete example, see the [image input example](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) in the SDK repository. + +## How do I handle MessageEvent in one-off tasks? + +**The SDK provides utilities to automatically respond to agent messages when running tasks end-to-end.** + +When running one-off tasks, some models may send a `MessageEvent` (proposing an action or asking for confirmation) instead of directly using tools. This causes `conversation.run()` to return, even though the agent hasn't finished the task. + + + +When an agent sends a message (via `MessageEvent`) instead of using the `finish` tool, the conversation ends because it's waiting for user input. In automated pipelines, there's no human to respond, so the task appears incomplete. + +**Key event types:** +- `ActionEvent`: Agent uses a tool (terminal, file editor, etc.) +- `MessageEvent`: Agent sends a text message (waiting for user response) +- `FinishAction`: Agent explicitly signals task completion + +The solution is to automatically send a "fake user response" when the agent sends a message, prompting it to continue. 
+ + + + + +The [`run_conversation_with_fake_user_response`](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) function wraps your conversation and automatically handles agent messages: + +```python +from openhands.sdk.conversation.state import ConversationExecutionStatus +from openhands.sdk.event import ActionEvent, MessageEvent +from openhands.sdk.tool.builtins.finish import FinishAction + +def run_conversation_with_fake_user_response(conversation, max_responses: int = 10): + """Run conversation, auto-responding to agent messages until finish or limit.""" + for _ in range(max_responses): + conversation.run() + if conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + break + events = list(conversation.state.events) + # Check if agent used finish tool + if any(isinstance(e, ActionEvent) and isinstance(e.action, FinishAction) for e in reversed(events)): + break + # Check if agent sent a message (needs response) + if not any(isinstance(e, MessageEvent) and e.source == "agent" for e in reversed(events)): + break + # Send continuation prompt + conversation.send_message( + "Please continue. Use the finish tool when done. DO NOT ask for human help." + ) +``` + + + + + +```python +from openhands.sdk import Agent, Conversation, LLM +from openhands.workspace import DockerWorkspace +from openhands.tools.preset.default import get_default_tools + +llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key="...") +agent = Agent(llm=llm, tools=get_default_tools()) +workspace = DockerWorkspace() +conversation = Conversation(agent=agent, workspace=workspace, max_iteration_per_run=100) + +conversation.send_message("Fix the bug in src/utils.py") +run_conversation_with_fake_user_response(conversation, max_responses=10) +# Results available in conversation.state.events +``` + + + + +**Pro tip:** Add a hint to your task prompt: +> "If you're 100% done with the task, use the finish action. 
> Otherwise, keep going until you're finished."
Example:
```bash
export LLM_MODEL="openhands/claude-sonnet-4-5-20250929"
uv run python examples/01_standalone_sdk/01_hello_world.py
```

[Learn more →](/openhands/usage/llms/openhands-llms)

If you have a ChatGPT Plus or Pro subscription, you can use `LLM.subscription_login()` to authenticate with your ChatGPT account and access Codex models without consuming API credits.

```python
from openhands.sdk import LLM

llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex")
```

[Learn more →](/sdk/guides/llm-subscriptions)

> Tip: Model name prefixes depend on your provider
>
> - If you bring your own provider key (Anthropic/OpenAI/etc.), use that provider's model name, e.g. `anthropic/claude-sonnet-4-5-20250929`. OpenHands supports [dozens of models](https://docs.openhands.dev/sdk/arch/llm#llm-providers), so choose whichever model you want to try.
> - If you use OpenHands Cloud, use `openhands/`-prefixed models, e.g. `openhands/claude-sonnet-4-5-20250929`
>
> Many examples in the docs read the model from the `LLM_MODEL` environment variable.
You can set it like: +> +> ```bash +> export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" # for OpenHands Provider +> ``` + +**Set Your API Key:** + +```bash +export LLM_API_KEY=your-api-key-here +``` + +### Step 2: Install the SDK + + + + ```bash + pip install openhands-sdk # Core SDK (openhands.sdk) + pip install openhands-tools # Built-in tools (openhands.tools) + # Optional: required for sandboxed workspaces in Docker or remote servers + pip install openhands-workspace # Workspace backends (openhands.workspace) + pip install openhands-agent-server # Remote agent server (openhands.agent_server) + ``` + + + + ```bash + # Clone the repository + git clone https://github.com/OpenHands/software-agent-sdk.git + cd software-agent-sdk + + # Install dependencies and setup development environment + make build + ``` + + + + +### Step 3: Run Your First Agent + +Here's a complete example that creates an agent and asks it to perform a simple task: + +```python icon="python" expandable examples/01_standalone_sdk/01_hello_world.py +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) + +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` + +Run the example: + +```bash +# Using a direct provider key (Anthropic/OpenAI/etc.) 
+uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +```bash +# Using OpenHands Cloud +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +You should see the agent understand your request, explore the project, and create a file with facts about it. + +## Core Concepts + +**Agent**: An AI-powered entity that can reason, plan, and execute actions using tools. + +**Tools**: Capabilities like executing bash commands, editing files, or browsing the web. + +**Workspace**: The execution environment where agents operate (local, Docker, or remote). + +**Conversation**: Manages the interaction lifecycle between you and the agent. + +## Basic Workflow + +1. **Configure LLM**: Choose model and provide API key +2. **Create Agent**: Use preset or custom configuration +3. **Add Tools**: Enable capabilities (bash, file editing, etc.) +4. **Start Conversation**: Create conversation context +5. **Send Message**: Provide task description +6. **Run Agent**: Agent executes until task completes or stops +7. 
**Get Result**: Review agent's output and actions + + +## Try More Examples + +The repository includes 24+ examples demonstrating various capabilities: + +```bash +# Simple hello world +uv run python examples/01_standalone_sdk/01_hello_world.py + +# Custom tools +uv run python examples/01_standalone_sdk/02_custom_tools.py + +# With skills +uv run python examples/01_standalone_sdk/03_activate_microagent.py + +# See all examples +ls examples/01_standalone_sdk/ +``` + + +## Next Steps + +### Explore Documentation + +- **[SDK Architecture](/sdk/arch/sdk)** - Deep dive into components +- **[Tool System](/sdk/arch/tool-system)** - Available tools +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environments +- **[LLM Configuration](/sdk/arch/llm)** - Deep dive into language model configuration + +### Build Custom Solutions + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools to expand agent capabilities +- **[MCP Integration](/sdk/guides/mcp)** - Connect to external tools via Model Context Protocol +- **[Docker Workspaces](/sdk/guides/agent-server/docker-sandbox)** - Sandbox agent execution in containers + +### Get Help + +- **[Slack Community](https://openhands.dev/joinslack)** - Ask questions and share projects +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features +- **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples + + +# Browser Use +Source: https://docs.openhands.dev/sdk/guides/agent-browser-use + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. 
Built on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, and extracting content - all through natural language instructions.

## How It Works

The [ready-to-run example](#ready-to-run-example) demonstrates combining multiple tools to create a capable web research agent:

1. **BrowserToolSet**: Provides automated browser control for web interaction
2. **FileEditorTool**: Allows the agent to read and write files if needed
3. **BashTool**: Enables command-line operations for additional functionality

The agent uses these tools to:
- Navigate to specified URLs
- Interact with web page elements (clicking, scrolling, etc.)
- Extract and analyze content from web pages
- Summarize information from multiple sources

In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points.

## Customization

For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. This gives you fine-grained control over which browser capabilities are exposed to the agent.
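Conceptually, tool registration is a name-to-factory registry from which an agent's tool list selects a subset. Here is a toy, stdlib-only sketch of that selection pattern — the tool names below are hypothetical stand-ins, not the SDK's real browser tool identifiers (consult the BrowserToolSet definition linked above for those):

```python
# Toy illustration of the register-then-select pattern used for tools.
# Names like "browser_goto" are illustrative only, not SDK identifiers.
from typing import Callable

TOOL_REGISTRY: dict[str, Callable[[], str]] = {}

def register_tool(name: str, factory: Callable[[], str]) -> None:
    """Register a tool factory under a unique name."""
    TOOL_REGISTRY[name] = factory

# Register individual capabilities instead of a whole tool set.
register_tool("browser_goto", lambda: "navigate to a URL")
register_tool("browser_extract", lambda: "extract page content")
register_tool("browser_click", lambda: "click an element")

def build_toolset(names: list[str]) -> dict[str, str]:
    """Expose only the selected subset of registered tools to an agent."""
    return {n: TOOL_REGISTRY[n]() for n in names}

# Fine-grained control: expose read-only browsing, withhold clicking.
tools = build_toolset(["browser_goto", "browser_extract"])
print(sorted(tools))  # ['browser_extract', 'browser_goto']
```

The same idea underlies exposing only read-only browsing to an agent while withholding interaction tools.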
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) + + +```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=BrowserToolSet.name), +] + +# If you need fine-grained browser control, you can manually register individual browser +# tools by creating a BrowserToolExecutor and providing factories that return customized +# Tool instances before constructing the Agent. + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services + + +# Creating Custom Agent +Source: https://docs.openhands.dev/sdk/guides/agent-custom + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This guide demonstrates how to create custom agents tailored for specific use cases. Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows. + + +This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) + + + +The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. + +```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py +#!/usr/bin/env python3 +""" +Planning Agent Workflow Example + +This example demonstrates a two-stage workflow: +1. Planning Agent: Analyzes the task and creates a detailed implementation plan +2. Execution Agent: Implements the plan with full editing capabilities + +The task: Create a Python web scraper that extracts article titles and URLs +from a news website, handles rate limiting, and saves results to JSON. 
+""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.llm import content_to_str +from openhands.tools.preset.default import get_default_agent +from openhands.tools.preset.planning import get_planning_agent + + +def get_event_content(event): + """Extract content from an event.""" + if hasattr(event, "llm_message"): + return "".join(content_to_str(event.llm_message.content)) + return str(event) + + +"""Run the planning agent workflow example.""" + +# Create a temporary workspace +workspace_dir = Path(tempfile.mkdtemp()) +print(f"Working in: {workspace_dir}") + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="agent", +) + +# Task description +task = """ +Create a Python web scraper with the following requirements: +- Scrape article titles and URLs from a news website +- Handle HTTP errors gracefully with retry logic +- Save results to a JSON file with timestamp +- Use requests and BeautifulSoup for scraping + +Do NOT ask for any clarifying questions. Directly create your implementation plan. 
+""" + +print("=" * 80) +print("PHASE 1: PLANNING") +print("=" * 80) + +# Create Planning Agent with read-only tools +planning_agent = get_planning_agent(llm=llm) + +# Create conversation for planning +planning_conversation = Conversation( + agent=planning_agent, + workspace=str(workspace_dir), +) + +# Run planning phase +print("Planning Agent is analyzing the task and creating implementation plan...") +planning_conversation.send_message( + f"Please analyze this web scraping task and create a detailed " + f"implementation plan:\n\n{task}" +) +planning_conversation.run() + +print("\n" + "=" * 80) +print("PLANNING COMPLETE") +print("=" * 80) +print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") + +print("\n" + "=" * 80) +print("PHASE 2: EXECUTION") +print("=" * 80) + +# Create Execution Agent with full editing capabilities +execution_agent = get_default_agent(llm=llm, cli_mode=True) + +# Create conversation for execution +execution_conversation = Conversation( + agent=execution_agent, + workspace=str(workspace_dir), +) + +# Prepare execution prompt with reference to the plan file +execution_prompt = f""" +Please implement the web scraping project according to the implementation plan. + +The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md + +Please read the plan from PLAN.md and implement all components according to it. + +Create all necessary files, implement the functionality, and ensure everything +works together properly. 
+""" + +print("Execution Agent is implementing the plan...") +execution_conversation.send_message(execution_prompt) +execution_conversation.run() + +# Get the last message from the conversation +execution_result = execution_conversation.state.events[-1] + +print("\n" + "=" * 80) +print("EXECUTION RESULT:") +print("=" * 80) +print(get_event_content(execution_result)) + +print("\n" + "=" * 80) +print("WORKFLOW COMPLETE") +print("=" * 80) +print(f"Project files created in: {workspace_dir}") + +# List created files +print("\nCreated files:") +for file_path in workspace_dir.rglob("*"): + if file_path.is_file(): + print(f" - {file_path.relative_to(workspace_dir)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Anatomy of a Custom Agent + +The planning agent demonstrates the two key components for creating specialized agent: + +### 1. Custom Tool Selection + +Choose tools that match your agent's specific role. Here's how the planning agent defines its tools: + +```python icon="python" + +def register_planning_tools() -> None: + """Register the planning agent tools.""" + from openhands.tools.glob import GlobTool + from openhands.tools.grep import GrepTool + from openhands.tools.planning_file_editor import PlanningFileEditorTool + + register_tool("GlobTool", GlobTool) + logger.debug("Tool: GlobTool registered.") + register_tool("GrepTool", GrepTool) + logger.debug("Tool: GrepTool registered.") + register_tool("PlanningFileEditorTool", PlanningFileEditorTool) + logger.debug("Tool: PlanningFileEditorTool registered.") + + +def get_planning_tools() -> list[Tool]: + """Get the planning agent tool specifications. + + Returns: + List of tools optimized for planning and analysis tasks, including + file viewing and PLAN.md editing capabilities for advanced + code discovery and navigation. 
+ """ + register_planning_tools() + + return [ + Tool(name="GlobTool"), + Tool(name="GrepTool"), + Tool(name="PlanningFileEditorTool"), + ] +``` + +The planning agent uses: +- **GlobTool**: For discovering files and directories matching patterns +- **GrepTool**: For searching specific content across files +- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only + +This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions. + +### 2. System Prompt Customization + +Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with injected plan structure that enforces: +1. **Objective**: Clear goal statement +2. **Context Summary**: Relevant system components and constraints +3. **Approach Overview**: High-level strategy and rationale +4. **Implementation Steps**: Detailed step-by-step execution plan +5. **Testing and Validation**: Verification methods and success criteria + +### Complete Implementation Reference + +For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py). + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case +- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management +- **[MCP Integration](/sdk/guides/mcp)** - Add MCP + + +# Sub-Agent Delegation +Source: https://docs.openhands.dev/sdk/guides/agent-delegation + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Agent delegation allows a main agent to spawn multiple sub-agents and delegate tasks to them for parallel processing. 
Each sub-agent runs independently with its own conversation context and returns results that the main agent can consolidate and process further.

This pattern is useful when:
- Breaking down complex problems into independent subtasks
- Processing multiple related tasks in parallel
- Separating concerns between different specialized sub-agents
- Improving throughput for parallelizable work

## How It Works

The delegation system consists of two main operations:

### 1. Spawning Sub-Agents

Before delegating work, the agent must first spawn sub-agents with meaningful identifiers:

```python icon="python" wrap
# Agent uses the delegate tool to spawn sub-agents
{
    "command": "spawn",
    "ids": ["lodging", "activities"]
}
```

Each spawned sub-agent:
- Gets a unique identifier that the agent specifies (e.g., "lodging", "activities")
- Inherits the same LLM configuration as the parent agent
- Operates in the same workspace as the main agent
- Maintains its own independent conversation context

### 2. Delegating Tasks

Once sub-agents are spawned, the agent can delegate tasks to them:

```python icon="python" wrap
# Agent uses the delegate tool to assign tasks
{
    "command": "delegate",
    "tasks": {
        "lodging": "Find the best budget-friendly areas to stay in London",
        "activities": "List top 5 must-see attractions and hidden gems in London"
    }
}
```

The delegate operation:
- Runs all sub-agent tasks in parallel using threads
- Blocks until all sub-agents complete their work
- Returns a single consolidated observation with all results
- Handles errors gracefully and reports them per sub-agent

## Setting Up the DelegateTool

### Register the Tool

```python icon="python" wrap
from openhands.sdk.tool import register_tool
from openhands.tools.delegate import DelegateTool

register_tool("DelegateTool", DelegateTool)
```

### Add to Agent Tools

```python icon="python" wrap
from openhands.sdk import Tool
from openhands.tools.preset.default import get_default_tools

tools = get_default_tools(enable_browser=False)
tools.append(Tool(name="DelegateTool"))

agent = Agent(llm=llm, tools=tools)
```

### Configure Maximum Sub-Agents (Optional)

You can limit the maximum number of concurrent sub-agents:

```python icon="python" wrap
from openhands.tools.delegate import DelegateTool

class CustomDelegateTool(DelegateTool):
    @classmethod
    def create(cls, conv_state, max_children: int = 3):
        # Only allow up to 3 sub-agents
        return super().create(conv_state, max_children=max_children)

register_tool("DelegateTool", CustomDelegateTool)
```

## Tool Commands

### spawn

Initialize sub-agents with meaningful identifiers.

**Parameters:**
- `command`: `"spawn"`
- `ids`: List of string identifiers (e.g., `["research", "implementation", "testing"]`)

**Returns:**
A message indicating the sub-agents were successfully spawned.
+ +**Example:** +```python icon="python" wrap +{ + "command": "spawn", + "ids": ["research", "implementation", "testing"] +} +``` + +### delegate + +Send tasks to specific sub-agents and wait for results. + +**Parameters:** +- `command`: `"delegate"` +- `tasks`: Dictionary mapping sub-agent IDs to task descriptions + +**Returns:** +A consolidated message containing all results from the sub-agents. + +**Example:** +```python icon="python" wrap +{ + "command": "delegate", + "tasks": { + "research": "Find best practices for async code", + "implementation": "Refactor the MyClass class", + "testing": "Write unit tests for the refactored code" + } +} +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/25_agent_delegation.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_agent_delegation.py) + + +```python icon="python" expandable examples/01_standalone_sdk/25_agent_delegation.py +""" +Agent Delegation Example + +This example demonstrates the agent delegation feature where a main agent +delegates tasks to sub-agents for parallel processing. +Each sub-agent runs independently and returns its results to the main agent, +which then merges both analyses into a single consolidated report. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Tool, + get_logger, +) +from openhands.sdk.context import Skill +from openhands.sdk.tool import register_tool +from openhands.tools.delegate import ( + DelegateTool, + DelegationVisualizer, + register_agent, +) +from openhands.tools.preset.default import get_default_tools + + +ONLY_RUN_SIMPLE_DELEGATION = False + +logger = get_logger(__name__) + +# Configure LLM and agent +# You can get an API key from https://app.all-hands.dev/settings/api-keys +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=os.environ.get("LLM_BASE_URL", None), + usage_id="agent", +) + +cwd = os.getcwd() + +register_tool("DelegateTool", DelegateTool) +tools = get_default_tools(enable_browser=False) +tools.append(Tool(name="DelegateTool")) + +main_agent = Agent( + llm=llm, + tools=tools, +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +task_message = ( + "Forget about coding. Let's switch to travel planning. " + "Let's plan a trip to London. I have two issues I need to solve: " + "Lodging: what are the best areas to stay at while keeping budget in mind? " + "Activities: what are the top 5 must-see attractions and hidden gems? " + "Please use the delegation tools to handle these two tasks in parallel. " + "Make sure the sub-agents use their own knowledge " + "and dont rely on internet access. " + "They should keep it short. After getting the results, merge both analyses " + "into a single consolidated report.\n\n" +) +conversation.send_message(task_message) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() + +# Report cost for simple delegation example +cost_1 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (simple delegation): {cost_1}") + +print("Simple delegation example done!", "\n" * 20) + + +# -------- Agent Delegation Second Part: User-Defined Agent Types -------- + +if ONLY_RUN_SIMPLE_DELEGATION: + exit(0) + + +def create_lodging_planner(llm: LLM) -> Agent: + """Create a lodging planner focused on London stays.""" + skills = [ + Skill( + name="lodging_planning", + content=( + "You specialize in finding great places to stay in London. 
" + "Provide 3-4 hotel recommendations with neighborhoods, quick " + "pros/cons, " + "and notes on transit convenience. Keep options varied by budget." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Focus only on London lodging recommendations.", + ), + ) + + +def create_activities_planner(llm: LLM) -> Agent: + """Create an activities planner focused on London itineraries.""" + skills = [ + Skill( + name="activities_planning", + content=( + "You design concise London itineraries. Suggest 2-3 daily " + "highlights, grouped by proximity to minimize travel time. " + "Include food/coffee stops " + "and note required tickets/reservations." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Plan practical, time-efficient days in London.", + ), + ) + + +# Register user-defined agent types (default agent type is always available) +register_agent( + name="lodging_planner", + factory_func=create_lodging_planner, + description="Finds London lodging options with transit-friendly picks.", +) +register_agent( + name="activities_planner", + factory_func=create_activities_planner, + description="Creates time-efficient London activity itineraries.", +) + +# Make the delegation tool available to the main agent +register_tool("DelegateTool", DelegateTool) + +main_agent = Agent( + llm=llm, + tools=[Tool(name="DelegateTool")], +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +task_message = ( + "Plan a 3-day London trip. " + "1) Spawn two sub-agents: lodging_planner (hotel options) and " + "activities_planner (itinerary). " + "2) Ask lodging_planner for 3-4 central London hotel recommendations with " + "neighborhoods, quick pros/cons, and transit notes by budget. 
" + "3) Ask activities_planner for a concise 3-day itinerary with nearby stops, " + " food/coffee suggestions, and any ticket/reservation notes. " + "4) Share both sub-agent results and propose a combined plan." +) + +print("=" * 100) +print("Demonstrating London trip delegation (lodging + activities)...") +print("=" * 100) + +conversation.send_message(task_message) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() + +# Report cost for user-defined agent types example +cost_2 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (user-defined agents): {cost_2}") + +print("All done!") + +# Full example cost report for CI workflow +print(f"EXAMPLE_COST: {cost_1 + cost_2}") +``` + + + + +# Interactive Terminal +Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results. + + +## How It Works + +```python icon="python" focus={4-7} +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [ + Tool( + name="BashTool", + params={"no_change_timeout_seconds": 3}, + ) +] +``` + + +The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent. + +In the example above, the agent should: +1. Enters Python's interactive mode by running `python3` +2. Executes Python code to get the current time +3. 
Exits the Python interpreter + +The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session. Review the [BashTool](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/definition.py) and [terminal source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/terminal/terminal_session.py) to better understand how the interactive session is configured and managed. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + params={"no_change_timeout_seconds": 3}, + ) +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Enter python interactive mode by directly running `python3`, then tell me " + "the current time, and exit python interactive mode." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases + + +# API-based Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to a [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. 
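`APIRemoteWorkspace` is used as a context manager: entering the `with` block provisions the remote sandbox, and exiting tears it down even if the agent errors out. A minimal stdlib stand-in for that lifecycle (the class below is illustrative only, not the real workspace):

```python
# Minimal stand-in showing the provision/teardown lifecycle that a
# remote workspace context manager follows. Not the real APIRemoteWorkspace.
class FakeRemoteWorkspace:
    def __init__(self) -> None:
        self.events: list[str] = []

    def __enter__(self) -> "FakeRemoteWorkspace":
        self.events.append("provisioned")  # e.g. start a sandbox via the runtime API
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.events.append("torn down")  # always runs, even on errors
        return False  # don't swallow exceptions

events_log = []
try:
    with FakeRemoteWorkspace() as ws:
        events_log = ws.events
        raise RuntimeError("agent crashed")  # simulate a mid-run failure
except RuntimeError:
    pass

print(events_log)  # ['provisioned', 'torn down']
```

Teardown on exit is what guarantees remote resources are released even when a run fails partway through.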
+ +## Key Concepts + +### APIRemoteWorkspace + +The `APIRemoteWorkspace` connects to a hosted runtime API service: + +```python icon="python" +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) as workspace: +``` + +This workspace type: +- Connects to a remote runtime API service +- Automatically provisions sandboxed environments +- Manages container lifecycle through the API +- Handles all infrastructure concerns + +### Runtime API Authentication + +The example requires a runtime API key for authentication: + +```python icon="python" +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) +``` + +This key authenticates your requests to the hosted runtime service. + +### Pre-built Image Selection + +You can specify which pre-built agent server image to use: + +```python icon="python" focus={4} +APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) +``` + +The runtime API will pull and run the specified image in a sandboxed environment. + +### Workspace Testing + +Just like with `DockerWorkspace`, you can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the remote runtime and ensures the environment is ready. 
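The same pre-flight idea — run a trivial command and require a clean exit code before handing the environment to the agent — can be sketched locally with the stdlib, with `subprocess.run` standing in for the remote `execute_command`:

```python
# Local sketch of the "verify the environment before use" check.
# subprocess.run plays the role of the workspace's execute_command here.
import subprocess
import sys

def preflight_ok(cmd: list[str]) -> bool:
    """Return True when a sanity-check command exits cleanly."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# A trivially true check, analogous to `echo ... && pwd` in the example.
ready = preflight_ok([sys.executable, "-c", "print('hello from sandbox')"])
print(ready)  # True
```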
+ +### Automatic RemoteConversation + +The conversation uses WebSocket communication with the remote server: + +```python icon="python" focus={1, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True +) +assert isinstance(conversation, RemoteConversation) +``` + +All agent execution happens on the remote runtime infrastructure. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +This example shows how to connect to a hosted runtime API for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +"""Example: APIRemoteWorkspace with Dynamic Build. + +This example demonstrates building an agent-server image on-the-fly from the SDK +codebase and launching it in a remote sandboxed environment via Runtime API. + +Usage: + uv run examples/24_remote_convo_with_api_sandboxed_server.py + +Requirements: + - LLM_API_KEY: API key for LLM access + - RUNTIME_API_KEY: API key for runtime API access +""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import APIRemoteWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) + + +# If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency +# Otherwise, use the latest image from main +server_image_sha = os.getenv("GITHUB_SHA") or "main" +server_image = f"ghcr.io/openhands/agent-server:{server_image_sha[:7]}-python-amd64" +logger.info(f"Using server image: {server_image}") + +with APIRemoteWorkspace( + runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), + runtime_api_key=runtime_api_key, + server_image=server_image, + image_pull_policy="Always", +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() +``` + +You can run the example code as-is. 
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+# If using the OpenHands LLM proxy, set its base URL:
+export LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev"
+export RUNTIME_API_KEY="your-runtime-api-key"
+# Set the runtime API URL for the remote sandbox
+export RUNTIME_API_URL="https://runtime.eval.all-hands.dev"
+cd agent-sdk
+uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py
+```
+
+## Next Steps
+
+- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)**
+- **[Local Agent Server](/sdk/guides/agent-server/local-server)**
+- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details
+- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture
+
+
+# Apptainer Sandbox
+Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox
+
+> A ready-to-run example is available [here](#basic-apptainer-sandbox-example)!
+
+The Apptainer sandboxed agent server demonstrates how to run agents in isolated Apptainer containers using ApptainerWorkspace.
+
+Apptainer (formerly Singularity) is a container runtime designed for HPC environments that doesn't require root access, making it ideal for shared computing environments, university clusters, and systems where Docker is not available. 
+ +## When to Use Apptainer + +Use Apptainer instead of Docker when: +- Running on HPC clusters or shared computing environments +- Root access is not available +- Docker daemon cannot be installed +- Working in academic or research computing environments +- Security policies restrict Docker usage + +## Prerequisites + +Before running this example, ensure you have: +- Apptainer installed ([Installation Guide](https://apptainer.org/docs/user/main/quick_start.html)) +- LLM API key set in environment + +## Basic Apptainer Sandbox Example + + +This example is available on GitHub: [examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py) + + +This example shows how to create an `ApptainerWorkspace` that automatically manages Apptainer containers for agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import ApptainerWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# 2) Create an Apptainer-based remote workspace that will set up and manage +# the Apptainer container automatically. Use `ApptainerWorkspace` with a +# pre-built agent server image. +# Apptainer (formerly Singularity) doesn't require root access, making it +# ideal for HPC and shared computing environments. +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with ApptainerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' 
&& pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Report cost (must be before conversation.close()) + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Configuration Options + +The `ApptainerWorkspace` supports several configuration options: + +### Option 1: Pre-built Image (Recommended) + +Use a pre-built agent server image for fastest startup: + +```python icon="python" focus={2} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, +) as workspace: + # Your code here +``` + +### Option 2: Build from Base Image + +Build from a base image when you need custom dependencies: + +```python icon="python" focus={2} +with ApptainerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) as workspace: + # Your code here +``` + + +Building from a base image requires internet access and may take several minutes on first run. The built image is cached for subsequent runs. 
+ + +### Option 3: Use Existing SIF File + +If you have a pre-built Apptainer SIF file: + +```python icon="python" focus={2} +with ApptainerWorkspace( + sif_file="/path/to/your/agent-server.sif", + host_port=8010, +) as workspace: + # Your code here +``` + +## Key Features + +### Rootless Container Execution + +Apptainer runs completely without root privileges: +- No daemon process required +- User namespace isolation +- Compatible with most HPC security policies + +### Image Caching + +Apptainer automatically caches container images: +- First run builds/pulls the image +- Subsequent runs reuse cached SIF files +- Cache location: `~/.cache/apptainer/` + +### Port Mapping + +The workspace exposes ports for agent services: +```python icon="python" focus={1, 3} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, # Maps to container port 8010 +) as workspace: + # Access agent server at http://localhost:8010 +``` + +## Differences from Docker + +While the API is similar to DockerWorkspace, there are some differences: + +| Feature | Docker | Apptainer | +|---------|--------|-----------| +| Root access required | Yes (daemon) | No | +| Installation | Requires Docker Engine | Single binary | +| Image format | OCI/Docker | SIF | +| Build speed | Fast (layers) | Slower (monolithic) | +| HPC compatibility | Limited | Excellent | +| Networking | Bridge/overlay | Host networking | + +## Troubleshooting + +### Apptainer Not Found + +If you see `apptainer: command not found`: +1. Install Apptainer following the [official guide](https://apptainer.org/docs/user/main/quick_start.html) +2. Ensure it's in your PATH: `which apptainer` + +### Permission Errors + +Apptainer should work without root. 
If you see permission errors: +- Check that your user has access to `/tmp` +- Verify Apptainer is properly installed: `apptainer version` +- Ensure the cache directory is writable: `ls -la ~/.cache/apptainer/` + +## Next Steps + +- **[Docker Sandbox](/sdk/guides/agent-server/docker-sandbox)** - Alternative container runtime +- **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing +- **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution + + +# OpenHands Cloud Workspace +Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `OpenHandsCloudWorkspace` demonstrates how to use the [OpenHands Cloud](https://app.all-hands.dev) to provision and manage sandboxed environments for agent execution. This provides a seamless experience with automatic sandbox provisioning, monitoring, and secure execution without managing your own infrastructure. + +## Key Concepts + +### OpenHandsCloudWorkspace + +The `OpenHandsCloudWorkspace` connects to OpenHands Cloud to provision sandboxes: + +```python icon="python" focus={1-2} +with OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, +) as workspace: +``` + +This workspace type: +- Connects to OpenHands Cloud API +- Automatically provisions sandboxed environments +- Manages sandbox lifecycle (create, poll status, delete) +- Handles all infrastructure concerns + +### Getting Your API Key + +To use OpenHands Cloud, you need an API key: + +1. Go to [app.all-hands.dev](https://app.all-hands.dev) +2. Sign in to your account +3. Navigate to Settings → API Keys +4. Create a new API key + +Store this key securely and use it as the `OPENHANDS_CLOUD_API_KEY` environment variable. 
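Since the key is read from the environment at startup, a small fail-fast helper (ours, not an SDK API) makes a missing or empty key obvious immediately rather than partway through a run:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing with a clear hint.

    Sketch helper (not part of the SDK).
    """
    value = os.getenv(name)
    if not value:
        raise KeyError(
            f"{name} is not set; create an API key under Settings -> API Keys "
            "and export it before running"
        )
    return value


# Usage:
# cloud_api_key = require_env("OPENHANDS_CLOUD_API_KEY")
```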
+ + +### Configuration Options + +The `OpenHandsCloudWorkspace` supports several configuration options: + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `cloud_api_url` | `str` | Required | OpenHands Cloud API URL | +| `cloud_api_key` | `str` | Required | API key for authentication | +| `sandbox_spec_id` | `str \| None` | `None` | Custom sandbox specification ID | +| `init_timeout` | `float` | `300.0` | Timeout for sandbox initialization (seconds) | +| `api_timeout` | `float` | `60.0` | Timeout for API requests (seconds) | +| `keep_alive` | `bool` | `False` | Keep sandbox running after cleanup | + +### Keep Alive Mode + +By default, the sandbox is deleted when the workspace is closed. To keep it running: + +```python icon="python" focus={4} +workspace = OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, + keep_alive=True, +) +``` + +This is useful for debugging or when you want to inspect the sandbox state after execution. + +### Workspace Testing + +You can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the cloud sandbox and ensures the environment is ready. 
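Cloud sandboxes can take a short while to become reachable (see `init_timeout` above), so it can be worth retrying the smoke test briefly before giving up. A sketch (not an SDK API) that assumes only the `execute_command` interface shown above:

```python
import time


def wait_until_ready(workspace, attempts: int = 5, delay: float = 2.0) -> bool:
    """Retry a trivial command until the sandbox responds, or give up.

    Sketch helper (not an SDK API); assumes ``workspace.execute_command``
    returns an object with an ``exit_code`` attribute, as shown above.
    """
    for _ in range(attempts):
        try:
            if workspace.execute_command("true").exit_code == 0:
                return True
        except Exception:
            pass  # tolerate transient connection errors while the sandbox boots
        time.sleep(delay)
    return False
```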
+
+## Comparison with Other Workspace Types
+
+| Feature | OpenHandsCloudWorkspace | APIRemoteWorkspace | DockerWorkspace |
+|---------|------------------------|-------------------|-----------------|
+| Infrastructure | OpenHands Cloud | Runtime API | Local Docker |
+| Authentication | API Key | API Key | None |
+| Setup Required | None | Runtime API access | Docker installed |
+| Custom Images | Via sandbox specs | Direct image specification | Direct image specification |
+| Best For | Production use | Custom runtime environments | Local development |
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/02_remote_agent_server/07_convo_with_cloud_workspace.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/07_convo_with_cloud_workspace.py)
+
+
+This example shows how to connect to OpenHands Cloud for fully managed agent execution:
+
+```python icon="python" expandable examples/02_remote_agent_server/07_convo_with_cloud_workspace.py
+"""Example: OpenHandsCloudWorkspace for OpenHands Cloud API.
+
+This example demonstrates using OpenHandsCloudWorkspace to provision a sandbox
+via OpenHands Cloud (app.all-hands.dev) and run an agent conversation.
+
+Usage:
+    uv run examples/02_remote_agent_server/07_convo_with_cloud_workspace.py
+
+Requirements:
+    - LLM_API_KEY: API key for direct LLM provider access (e.g., Anthropic API key)
+    - OPENHANDS_CLOUD_API_KEY: API key for OpenHands Cloud access
+
+Note:
+    The LLM configuration is sent to the cloud sandbox, so you need an API key
+    that works directly with the LLM provider (not a local proxy). If using
+    Anthropic, set LLM_API_KEY to your Anthropic API key. 
+""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import OpenHandsCloudWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +# Note: Don't use a local proxy URL here - the cloud sandbox needs direct access +# to the LLM provider. Use None for base_url to let LiteLLM use the default +# provider endpoint, or specify the provider's direct URL. +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL") or None, + api_key=SecretStr(api_key), +) + +cloud_api_key = os.getenv("OPENHANDS_CLOUD_API_KEY") +if not cloud_api_key: + logger.error("OPENHANDS_CLOUD_API_KEY required") + exit(1) + +cloud_api_url = os.getenv("OPENHANDS_CLOUD_API_URL", "https://app.all-hands.dev") +logger.info(f"Using OpenHands Cloud API: {cloud_api_url}") + +with OpenHandsCloudWorkspace( + cloud_api_url=cloud_api_url, + cloud_api_key=cloud_api_key, +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() + + logger.info("✅ Conversation completed successfully.") + logger.info(f"Total {len(received_events)} events received during conversation.") +``` + + +```bash Running the Example +export LLM_API_KEY="your-llm-api-key" +export OPENHANDS_CLOUD_API_KEY="your-cloud-api-key" +# Optional: specify a custom sandbox spec +# export OPENHANDS_SANDBOX_SPEC_ID="your-sandbox-spec-id" +cd agent-sdk +uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +``` + +## Next Steps + +- **[API-based Sandbox](/sdk/guides/agent-server/api-sandbox)** - Connect to Runtime API service +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run locally with Docker +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details + + +# Custom Tools with Remote Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +When using a [remote agent server](/sdk/guides/agent-server/overview), custom tools must be available in the server's Python environment. This guide shows how to build a custom base image with your tools and use `DockerDevWorkspace` to automatically build the agent server on top of it. + + +For standalone custom tools (without remote agent server), see the [Custom Tools guide](/sdk/guides/custom-tools). + + +## How It Works + +1. **Define custom tool** with `register_tool()` at module level +2. **Create Dockerfile** that copies tools and sets `PYTHONPATH` +3. **Build custom base image** with your tools +4. **Use `DockerDevWorkspace`** with `base_image` parameter - it builds the agent server on top +5. 
**Import tool module** in client before creating conversation +6. **Server imports modules** dynamically, triggering registration + +## Key Files + +### Custom Tool (`custom_tools/log_data.py`) + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py +"""Log Data Tool - Example custom tool for logging structured data to JSON. + +This tool demonstrates how to create a custom tool that logs structured data +to a local JSON file during agent execution. The data can be retrieved and +verified after the agent completes. +""" + +import json +from collections.abc import Sequence +from datetime import UTC, datetime +from enum import Enum +from pathlib import Path +from typing import Any + +from pydantic import Field + +from openhands.sdk import ( + Action, + ImageContent, + Observation, + TextContent, + ToolDefinition, +) +from openhands.sdk.tool import ToolExecutor, register_tool + + +# --- Enums and Models --- + + +class LogLevel(str, Enum): + """Log level for entries.""" + + DEBUG = "debug" + INFO = "info" + WARNING = "warning" + ERROR = "error" + + +class LogDataAction(Action): + """Action to log structured data to a JSON file.""" + + message: str = Field(description="The log message") + level: LogLevel = Field( + default=LogLevel.INFO, + description="Log level (debug, info, warning, error)", + ) + data: dict[str, Any] = Field( + default_factory=dict, + description="Additional structured data to include in the log entry", + ) + + +class LogDataObservation(Observation): + """Observation returned after logging data.""" + + success: bool = Field(description="Whether the data was successfully logged") + log_file: str = Field(description="Path to the log file") + entry_count: int = Field(description="Total number of entries in the log file") + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + """Convert observation to LLM content.""" + if self.success: + return [ + TextContent( + text=( + 
f"✅ Data logged successfully to {self.log_file}\n" + f"Total entries: {self.entry_count}" + ) + ) + ] + return [TextContent(text="❌ Failed to log data")] + + +# --- Executor --- + +# Default log file path +DEFAULT_LOG_FILE = "/tmp/agent_data.json" + + +class LogDataExecutor(ToolExecutor[LogDataAction, LogDataObservation]): + """Executor that logs structured data to a JSON file.""" + + def __init__(self, log_file: str = DEFAULT_LOG_FILE): + """Initialize the log data executor. + + Args: + log_file: Path to the JSON log file + """ + self.log_file = Path(log_file) + + def __call__( + self, + action: LogDataAction, + conversation=None, # noqa: ARG002 + ) -> LogDataObservation: + """Execute the log data action. + + Args: + action: The log data action + conversation: Optional conversation context (not used) + + Returns: + LogDataObservation with the result + """ + # Load existing entries or start fresh + entries: list[dict[str, Any]] = [] + if self.log_file.exists(): + try: + with open(self.log_file) as f: + entries = json.load(f) + except (json.JSONDecodeError, OSError): + entries = [] + + # Create new entry with timestamp + entry = { + "timestamp": datetime.now(UTC).isoformat(), + "level": action.level.value, + "message": action.message, + "data": action.data, + } + entries.append(entry) + + # Write back to file + self.log_file.parent.mkdir(parents=True, exist_ok=True) + with open(self.log_file, "w") as f: + json.dump(entries, f, indent=2) + + return LogDataObservation( + success=True, + log_file=str(self.log_file), + entry_count=len(entries), + ) + + +# --- Tool Definition --- + +_LOG_DATA_DESCRIPTION = """Log structured data to a JSON file. + +Use this tool to record information, findings, or events during your work. +Each log entry includes a timestamp and can contain arbitrary structured data. 
+ +Parameters: +* message: A descriptive message for the log entry +* level: Log level - one of 'debug', 'info', 'warning', 'error' (default: info) +* data: Optional dictionary of additional structured data to include + +Example usage: +- Log a finding: message="Found potential issue", level="warning", data={"file": "app.py", "line": 42} +- Log progress: message="Completed analysis", level="info", data={"files_checked": 10} +""" # noqa: E501 + + +class LogDataTool(ToolDefinition[LogDataAction, LogDataObservation]): + """Tool for logging structured data to a JSON file.""" + + @classmethod + def create(cls, conv_state, **params) -> Sequence[ToolDefinition]: # noqa: ARG003 + """Create LogDataTool instance. + + Args: + conv_state: Conversation state (not used in this example) + **params: Additional parameters: + - log_file: Path to the JSON log file (default: /tmp/agent_data.json) + + Returns: + A sequence containing a single LogDataTool instance + """ + log_file = params.get("log_file", DEFAULT_LOG_FILE) + executor = LogDataExecutor(log_file=log_file) + + return [ + cls( + description=_LOG_DATA_DESCRIPTION, + action_type=LogDataAction, + observation_type=LogDataObservation, + executor=executor, + ) + ] + + +# Auto-register the tool when this module is imported +# This is what enables dynamic tool registration in the remote agent server +register_tool("LogDataTool", LogDataTool) +``` + +### Dockerfile + +```dockerfile icon="docker" +FROM nikolaik/python-nodejs:python3.12-nodejs22 + +COPY custom_tools /app/custom_tools +ENV PYTHONPATH="/app:${PYTHONPATH}" +``` + +## Troubleshooting + +| Issue | Solution | +|-------|----------| +| Tool not found | Ensure `register_tool()` is called at module level, import tool before creating conversation | +| Import errors on server | Check `PYTHONPATH` in Dockerfile, verify all dependencies installed | +| Build failures | Verify file paths in `COPY` commands, ensure Python 3.12+ | + + +**Binary Mode Limitation**: Custom tools only work 
with **source mode** deployments. When using `DockerDevWorkspace`, set `target="source"` (the default). See [GitHub issue #1531](https://github.com/OpenHands/software-agent-sdk/issues/1531) for details. + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/06_custom_tool/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/02_remote_agent_server/06_custom_tool) + + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tool_example.py +"""Example: Using custom tools with remote agent server. + +This example demonstrates how to use custom tools with a remote agent server +by building a custom base image that includes the tool implementation. + +Prerequisites: + 1. Build the custom base image first: + cd examples/02_remote_agent_server/05_custom_tool + ./build_custom_image.sh + + 2. Set LLM_API_KEY environment variable + +The workflow is: +1. Define a custom tool (LogDataTool for logging structured data to JSON) +2. Create a simple Dockerfile that copies the tool into the base image +3. Build the custom base image +4. Use DockerDevWorkspace with base_image pointing to the custom image +5. DockerDevWorkspace builds the agent server on top of the custom base image +6. The server dynamically registers tools when the client creates a conversation +7. The agent can use the custom tool during execution +8. 
Verify the logged data by reading the JSON file from the workspace + +This pattern is useful for: +- Collecting structured data during agent runs (logs, metrics, events) +- Implementing custom integrations with external systems +- Adding domain-specific operations to the agent +""" + +import os +import platform +import subprocess +import sys +import time +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + Tool, + get_logger, +) +from openhands.workspace import DockerDevWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# Get the directory containing this script +example_dir = Path(__file__).parent.absolute() + +# Custom base image tag (contains custom tools, agent server built on top) +CUSTOM_BASE_IMAGE_TAG = "custom-base-image:latest" + +# 2) Check if custom base image exists, build if not +logger.info(f"🔍 Checking for custom base image: {CUSTOM_BASE_IMAGE_TAG}") +result = subprocess.run( + ["docker", "images", "-q", CUSTOM_BASE_IMAGE_TAG], + capture_output=True, + text=True, + check=False, +) + +if not result.stdout.strip(): + logger.info("⚠️ Custom base image not found. 
Building...") + logger.info("📦 Building custom base image with custom tools...") + build_script = example_dir / "build_custom_image.sh" + try: + subprocess.run( + [str(build_script), CUSTOM_BASE_IMAGE_TAG], + cwd=str(example_dir), + check=True, + ) + logger.info("✅ Custom base image built successfully!") + except subprocess.CalledProcessError as e: + logger.error(f"❌ Failed to build custom base image: {e}") + logger.error("Please run ./build_custom_image.sh manually and fix any errors.") + sys.exit(1) +else: + logger.info(f"✅ Custom base image found: {CUSTOM_BASE_IMAGE_TAG}") + +# 3) Create a DockerDevWorkspace with the custom base image +# DockerDevWorkspace will build the agent server on top of this base image +logger.info("🚀 Building and starting agent server with custom tools...") +logger.info("📦 This may take a few minutes on first run...") + +with DockerDevWorkspace( + base_image=CUSTOM_BASE_IMAGE_TAG, + host_port=8011, + platform=detect_platform(), + target="source", # NOTE: "binary" target does not work with custom tools +) as workspace: + logger.info("✅ Custom agent server started!") + + # 4) Import custom tools to register them in the client's registry + # This allows the client to send the module qualname to the server + # The server will then import the same module and execute the tool + import custom_tools.log_data # noqa: F401 + + # 5) Create agent with custom tools + # Note: We specify the tool here, but it's actually executed on the server + # Get default tools and add our custom tool + from openhands.sdk import Agent + from openhands.tools.preset.default import get_default_condenser, get_default_tools + + tools = get_default_tools(enable_browser=False) + # Add our custom tool! 
+ tools.append(Tool(name="LogDataTool")) + + agent = Agent( + llm=llm, + tools=tools, + system_prompt_kwargs={"cli_mode": True}, + condenser=get_default_condenser( + llm=llm.model_copy(update={"usage_id": "condenser"}) + ), + ) + + # 6) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 7) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Custom agent server ready!' && python --version" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + # 8) Create conversation with the custom agent + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending task to analyze files and log findings...") + conversation.send_message( + "Please analyze the Python files in the current directory. " + "Use the LogDataTool to log your findings as you work. " + "For example:\n" + "- Log when you start analyzing a file (level: info)\n" + "- Log any interesting patterns you find (level: info)\n" + "- Log any potential issues (level: warning)\n" + "- Include relevant data like file names, line numbers, etc.\n\n" + "Make at least 3 log entries using the LogDataTool." 
+ ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ Task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + # 9) Read the logged data from the JSON file using file_download API + logger.info("\n📊 Logged Data Summary:") + logger.info("=" * 80) + + # Download the log file from the workspace using the file download API + import json + import tempfile + + with tempfile.NamedTemporaryFile( + mode="w", suffix=".json", delete=False + ) as tmp_file: + local_path = tmp_file.name + + download_result = workspace.file_download( + source_path="/tmp/agent_data.json", + destination_path=local_path, + ) + + if download_result.success: + try: + with open(local_path) as f: + log_entries = json.load(f) + logger.info(f"Found {len(log_entries)} log entries:\n") + for i, entry in enumerate(log_entries, 1): + logger.info(f"Entry {i}:") + logger.info(f" Timestamp: {entry.get('timestamp', 'N/A')}") + logger.info(f" Level: {entry.get('level', 'N/A')}") + logger.info(f" Message: {entry.get('message', 'N/A')}") + if entry.get("data"): + logger.info(f" Data: {json.dumps(entry['data'], indent=4)}") + logger.info("") + except json.JSONDecodeError: + logger.info("Log file exists but couldn't parse JSON") + with open(local_path) as f: + logger.info(f"Raw content: {f.read()}") + finally: + # Clean up the temporary file + Path(local_path).unlink(missing_ok=True) + else: + logger.info("No log file found (agent may not have used the tool)") + if download_result.error: + logger.debug(f"Download error: {download_result.error}") + + logger.info("=" * 80) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + finally: + logger.info("\n🧹 Cleaning up 
conversation...")
+        conversation.close()
+
+logger.info("\n✅ Example completed successfully!")
+logger.info("\nThis example demonstrated how to:")
+logger.info("1. Create a custom tool that logs structured data to JSON")
+logger.info("2. Build a simple base image with the custom tool")
+logger.info("3. Use DockerDevWorkspace with base_image to build agent server on top")
+logger.info("4. Enable dynamic tool registration on the server")
+logger.info("5. Use the custom tool during agent execution")
+logger.info("6. Read the logged data back from the workspace")
+```
+
+```bash Running the Example
+# Build the custom base image first
+cd examples/02_remote_agent_server/06_custom_tool
+./build_custom_image.sh
+
+# Run the example
+export LLM_API_KEY="your-api-key"
+uv run python custom_tool_example.py
+```
+
+
+## Next Steps
+
+- **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server
+- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers
+
+
+# Docker Sandbox
+Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+The Docker-sandboxed agent server demonstrates how to run agents in isolated Docker containers using `DockerWorkspace`.
+
+This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely.
+
+Use `DockerWorkspace` with a pre-built agent server image for the fastest startup. When you need to build your own image from a base image, switch to `DockerDevWorkspace`. 
+
+The Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server.
+
+## 1) Basic Docker Sandbox
+
+> A ready-to-run example is available [here](#ready-to-run-example-docker-sandbox)!
+
+### Key Concepts
+
+#### DockerWorkspace Context Manager
+
+The `DockerWorkspace` uses a context manager to automatically handle container lifecycle:
+
+```python icon="python"
+with DockerWorkspace(
+    # use pre-built image for faster startup (recommended)
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+    platform=detect_platform(),
+) as workspace:
+    # Container is running here
+    # Work with the workspace
+    pass
+# Container is automatically stopped and cleaned up here
+```
+
+The workspace automatically:
+- Pulls or builds the Docker image
+- Starts the container with an agent server
+- Waits for the server to be ready
+- Cleans up the container when done
+
+#### Platform Detection
+
+The example includes platform detection to ensure the correct Docker image is built and used:
+
+```python icon="python"
+def detect_platform():
+    """Detects the correct Docker platform string."""
+    machine = platform.machine().lower()
+    if "arm" in machine or "aarch64" in machine:
+        return "linux/arm64"
+    return "linux/amd64"
+```
+
+This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon).
+
+
+#### Testing the Workspace
+
+Before creating a conversation, the example tests the workspace connection:
+
+```python icon="python"
+result = workspace.execute_command(
+    "echo 'Hello from sandboxed environment!' 
&& pwd"
+)
+logger.info(
+    f"Command '{result.command}' completed "
+    f"with exit code {result.exit_code}"
+)
+logger.info(f"Output: {result.stdout}")
+```
+
+This verifies the workspace is properly initialized and can execute commands.
+
+#### Automatic RemoteConversation
+
+When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation:
+
+```python icon="python" focus={1, 3, 7}
+conversation = Conversation(
+    agent=agent,
+    workspace=workspace,
+    callbacks=[event_callback],
+    visualize=True,
+)
+assert isinstance(conversation, RemoteConversation)
+```
+
+The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming.
+
+
+#### DockerWorkspace vs DockerDevWorkspace
+
+Use `DockerWorkspace` when you can rely on the official pre-built images for the agent server. Switch to `DockerDevWorkspace` when you need to build or customize the image on-demand (slower startup, requires the SDK source tree and Docker build support).
+
+```python icon="python"
+# ✅ Fast: Use pre-built image (recommended)
+DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+)
+
+# 🛠️ Custom: Build on the fly (requires SDK tooling)
+DockerDevWorkspace(
+    base_image="nikolaik/python-nodejs:python3.12-nodejs22",
+    host_port=8010,
+    target="source",
+)
+```
+
+### Ready-to-run Example Docker Sandbox
+
+This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py)
+
+
+This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution:
+
+```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
+import os
+import platform
+import time
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    
Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# 2) Create a Docker-based remote workspace that will set up and manage +# the Docker container automatically. Use `DockerWorkspace` with a pre-built +# image or `DockerDevWorkspace` to automatically build the image on-demand. 
+# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.")
+        conversation.run()
+        logger.info("✅ Second task completed!")
+
+        cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost
+        print(f"EXAMPLE_COST: {cost}")
+    finally:
+        print("\n🧹 Cleaning up conversation...")
+        conversation.close()
+```
+
+
+---
+
+## 2) VS Code in Docker Sandbox
+
+> A ready-to-run example is available [here](#ready-to-run-example-vs-code)!
+
+VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with.
+
+### Key Concepts
+
+#### VS Code-Enabled DockerWorkspace
+
+The workspace is configured with extra ports for VS Code access:
+
+```python icon="python" focus={1, 5}
+with DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=18010,
+    platform="linux/arm64",  # or "linux/amd64" depending on your architecture
+    extra_ports=True,  # Expose extra ports for VSCode and VNC
+) as workspace:
+    """Extra ports allows you to access VSCode at localhost:18011"""
+```
+
+The `extra_ports=True` setting exposes:
+- Port `host_port+1`: VS Code Web interface
+- Port `host_port+2`: VNC viewer for visual access
+
+If you need to customize the agent-server image, swap in `DockerDevWorkspace` with the same parameters and provide `base_image`/`target` to build on demand.
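These port offsets mean every companion URL follows directly from the chosen `host_port`. A minimal sketch (the `service_urls` helper and the `18010` value are illustrative, not part of the SDK):

```python
# Derive companion service URLs from the agent server's host port.
# With extra_ports=True, VS Code Web is exposed on host_port + 1 and the
# VNC viewer on host_port + 2.
def service_urls(host_port: int) -> dict[str, str]:
    return {
        "agent_server": f"http://localhost:{host_port}",
        "vscode": f"http://localhost:{host_port + 1}/",
        "vnc": f"http://localhost:{host_port + 2}/vnc.html?autoconnect=1&resize=remote",
    }


print(service_urls(18010)["vscode"])  # http://localhost:18011/
```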
+ +#### VS Code URL Generation + +The example retrieves the VS Code URL with authentication token: + +```python icon="python" +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` + +This generates a properly authenticated URL with the workspace directory pre-opened. + +#### VS Code URL Format + +```text +http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` +where: +- `vscode_port`: Usually host_port + 1 (e.g., 8011) +- `token`: Authentication token for security +- `workspace_dir`: Workspace directory to open + +### Ready-to-run Example VS Code + + +This example is available on GitHub: [examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py) + + + +```python icon="python" expandable examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py +import os +import platform +import time + +import httpx +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +# Create a Docker-based remote workspace with extra ports for VSCode access +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=18010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" + + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending 
first message...") + conversation.send_message("Create a simple Python script that prints Hello World") + conversation.run() + + # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" + + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +--- + +## 3) Browser in Docker Sandbox +> A ready-to-run example is available [here](#ready-to-run-example-browser)! + +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. 
+
+### Key Concepts
+
+#### Browser-Enabled DockerWorkspace
+
+The workspace is configured with extra ports for browser access:
+
+```python icon="python" focus={1-5}
+with DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+    platform=detect_platform(),
+    extra_ports=True,  # Expose extra ports for VSCode and VNC
+) as workspace:
+    """Extra ports allows you to check localhost:8012 for VNC"""
+```
+
+The `extra_ports=True` setting exposes additional ports for:
+- Port `host_port+1`: VS Code Web interface
+- Port `host_port+2`: VNC viewer for browser visualization
+
+If you need to pre-build a custom browser image, replace `DockerWorkspace` with `DockerDevWorkspace` and provide `base_image`/`target` to build before launch.
+
+
+#### Enabling Browser Tools
+
+Browser tools are enabled by setting `cli_mode=False`:
+
+```python icon="python" focus={2, 4}
+# Create agent with browser tools enabled
+agent = get_default_agent(
+    llm=llm,
+    cli_mode=False,  # CLI mode = False will enable browser tools
+)
+```
+
+When `cli_mode=False`, the agent gains access to browser automation tools for web interaction.
+
+When VNC is available and `extra_ports=True`, the browser is opened in the VNC desktop, so you can watch the agent's work in the browser in real time.
+
+
+#### VNC Access
+
+The VNC interface provides real-time visual access to the browser:
+
+```text
+http://localhost:8012/vnc.html?autoconnect=1&resize=remote
+```
+
+- `autoconnect=1`: Automatically connect to VNC server
+- `resize=remote`: Automatically adjust resolution
+
+---
+
+### Ready-to-run Example Browser
+
+
+This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py)
+
+
+This example shows how to configure `DockerWorkspace` with browser capabilities and VNC access:
+
+```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py
+import os
+import platform
+import time
+
+from pydantic import SecretStr
+
+from openhands.sdk import LLM, Conversation, get_logger
+from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation
+from openhands.tools.preset.default import get_default_agent
+from openhands.workspace import DockerWorkspace
+
+
+logger = get_logger(__name__)
+
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+
+llm = LLM(
+    usage_id="agent",
+    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
+    base_url=os.getenv("LLM_BASE_URL"),
+    api_key=SecretStr(api_key),
+)
+
+
+def detect_platform():
+    """Detects the correct Docker platform string."""
+    machine = platform.machine().lower()
+    if "arm" in machine or "aarch64" in machine:
+        return "linux/arm64"
+    return "linux/amd64"
+
+
+def get_server_image():
+    """Get the server image tag, using PR-specific image in CI."""
+    platform_str = detect_platform()
+    arch = "arm64" if "arm64" in platform_str else "amd64"
+    # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# Create a Docker-based remote workspace with extra ports for browser access. +# Use `DockerWorkspace` with a pre-built image or `DockerDevWorkspace` to +# automatically build the image on-demand. +# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=8011, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" + + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" 
+ ) + conversation.run() + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + if os.getenv("CI"): + logger.info( + "CI environment detected; skipping interactive prompt and closing workspace." # noqa: E501 + ) + else: + # Wait for user confirm to exit when running locally + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +## Next Steps + +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + + +# Local Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using `RemoteConversation`. This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. 
+ +## Key Concepts + +### Managed API Server + +The ready-to-run example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: + +```python icon="python" focus={1, 2, 4, 5} +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __enter__(self): + """Start the API server subprocess.""" + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) +``` + +The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. + +### Remote Workspace + +When connecting to a remote server, you need to provide a `Workspace` that connects to that server: + +```python icon="python" +workspace = Workspace(host=server.base_url) +result = workspace.execute_command("pwd") +``` + +When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). +The `Workspace` object communicates with the remote server's API to execute commands and manage files. + +### RemoteConversation + +When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): + +```python icon="python" focus={1, 3, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +`RemoteConversation` handles communication with the remote agent server over WebSocket for real-time event streaming. 
+ +### Event Callbacks + +Callbacks receive events in real-time as they happen on the remote server: + +```python icon="python" +def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() +``` + +This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. + +### Conversation State + +The conversation state provides access to all events and status: + +```python icon="python" +# Count total events using state.events +total_events = len(conversation.state.events) +logger.info(f"📈 Total events in conversation: {total_events}") + +# Get recent events (last 5) using state.events +all_events = conversation.state.events +recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +``` + +This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. 
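Because `conversation.state.events` behaves like a plain Python list of event objects, a small monitoring helper needs only a few lines. As a sketch (the `summarize_events` function is hypothetical, not an SDK API):

```python
from collections import Counter


def summarize_events(events) -> dict[str, int]:
    """Count events by class name, e.g. {'MessageEvent': 3, 'ActionEvent': 7}."""
    return dict(Counter(type(e).__name__ for e in events))


# With a live conversation you would call:
#     summarize_events(conversation.state.events)
```

Pairing a helper like this with the event callback above gives you a running tally of what the agent is doing.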
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + + +This example shows how to programmatically start a local agent server and interact with it through a `RemoteConversation`: + +```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py +import os +import subprocess +import sys +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger +from openhands.sdk.event import ConversationStateUpdateEvent +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def _stream_output(stream, prefix, target_stream): + """Stream output from subprocess to target stream with prefix.""" + try: + for line in iter(stream.readline, ""): + if line: + target_stream.write(f"[{prefix}] {line}") + target_stream.flush() + except Exception as e: + print(f"Error streaming {prefix}: {e}", file=sys.stderr) + finally: + stream.close() + + +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __init__(self, port: int = 8000, host: str = "127.0.0.1"): + self.port: int = port + self.host: str = host + self.process: subprocess.Popen[str] | None = None + self.base_url: str = f"http://{host}:{port}" + self.stdout_thread: threading.Thread | None = None + self.stderr_thread: threading.Thread | None = None + + def __enter__(self): + """Start the API server subprocess.""" + print(f"Starting OpenHands API server on {self.base_url}...") + + # Start the server process + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + 
text=True, + env={"LOG_JSON": "true", **os.environ}, + ) + + # Start threads to stream stdout and stderr + assert self.process is not None + assert self.process.stdout is not None + assert self.process.stderr is not None + self.stdout_thread = threading.Thread( + target=_stream_output, + args=(self.process.stdout, "SERVER", sys.stdout), + daemon=True, + ) + self.stderr_thread = threading.Thread( + target=_stream_output, + args=(self.process.stderr, "SERVER", sys.stderr), + daemon=True, + ) + + self.stdout_thread.start() + self.stderr_thread.start() + + # Wait for server to be ready + max_retries = 30 + for i in range(max_retries): + try: + import httpx + + response = httpx.get(f"{self.base_url}/health", timeout=1.0) + if response.status_code == 200: + print(f"API server is ready at {self.base_url}") + return self + except Exception: + pass + + assert self.process is not None + if self.process.poll() is not None: + # Process has terminated + raise RuntimeError( + "Server process terminated unexpectedly. " + "Check the server logs above for details." + ) + + time.sleep(1) + + raise RuntimeError(f"Server failed to start after {max_retries} seconds") + + def __exit__(self, exc_type, exc_val, exc_tb): + """Stop the API server subprocess.""" + if self.process: + print("Stopping API server...") + self.process.terminate() + try: + self.process.wait(timeout=5) + except subprocess.TimeoutExpired: + print("Force killing API server...") + self.process.kill() + self.process.wait() + + # Wait for streaming threads to finish (they're daemon threads, + # so they'll stop automatically) + # But give them a moment to flush any remaining output + time.sleep(0.5) + print("API server stopped.") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +title_gen_llm = LLM( + usage_id="title-gen-llm", + model=os.getenv("LLM_MODEL", "openhands/gpt-5-mini-2025-08-07"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +# Use managed API server +with ManagedAPIServer(port=8001) as server: + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, # Disable browser tools for simplicity + ) + + # Define callbacks to test the WebSocket functionality + received_events = [] + event_tracker = {"last_event_time": time.time()} + + def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() + + # Create RemoteConversation with callbacks + # NOTE: Workspace is required for RemoteConversation + workspace = Workspace(host=server.base_url) + result = workspace.execute_command("pwd") + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + # Send first message and run + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." 
+ ) + + # Generate title using a specific LLM + title = conversation.generate_title(max_length=60, llm=title_gen_llm) + logger.info(f"Generated conversation title: {title}") + + logger.info("🚀 Running conversation...") + conversation.run() + + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to stop coming (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - event_tracker["last_event_time"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Demonstrate state.events functionality + logger.info("\n" + "=" * 50) + logger.info("📊 Demonstrating State Events API") + logger.info("=" * 50) + + # Count total events using state.events + total_events = len(conversation.state.events) + logger.info(f"📈 Total events in conversation: {total_events}") + + # Get recent events (last 5) using state.events + logger.info("\n🔍 Getting last 5 events using state.events...") + all_events = conversation.state.events + recent_events = all_events[-5:] if len(all_events) >= 5 else all_events + + for i, event in enumerate(recent_events, 1): + event_type = type(event).__name__ + timestamp = getattr(event, "timestamp", "Unknown") + logger.info(f" {i}. 
{event_type} at {timestamp}") + + # Let's see what the actual event types are + logger.info("\n🔍 Event types found:") + event_types = set() + for event in recent_events: + event_type = type(event).__name__ + event_types.add(event_type) + for event_type in sorted(event_types): + logger.info(f" - {event_type}") + + # Print all ConversationStateUpdateEvent + logger.info("\n🗂️ ConversationStateUpdateEvent events:") + for event in conversation.state.events: + if isinstance(event, ConversationStateUpdateEvent): + logger.info(f" - {event}") + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + finally: + # Clean up + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + + +# Overview +Source: https://docs.openhands.dev/sdk/guides/agent-server/overview + +Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. 
+ + +For example, switching from a local workspace to a Docker‑based remote agent server: + +```python icon="python" lines +# Local → Docker +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import DockerWorkspace # [!code ++] +with DockerWorkspace( # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + +Use `DockerWorkspace` with the pre-built agent server image for the fastest startup. When you need to build from a custom base image, switch to [`DockerDevWorkspace`](/sdk/guides/agent-server/docker-sandbox). + +Or switching to an API‑based remote workspace (via [OpenHands Runtime API](https://runtime.all-hands.dev/)): + +```python icon="python" lines +# Local → Remote API +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import APIRemoteWorkspace # [!code ++] +with APIRemoteWorkspace( # [!code ++] + runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++] + runtime_api_key="YOUR_API_KEY", # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + + +## What is a Remote Agent Server? 
+
+A Remote Agent Server is an HTTP/WebSocket server that:
+- **Packages the Software Agent SDK into containers** and deploys it on your own infrastructure (Kubernetes, VMs, on-prem, or cloud)
+- **Runs agents** on dedicated infrastructure
+- **Manages workspaces** (Docker containers or remote sandboxes)
+- **Streams events** to clients via WebSocket
+- **Handles command and file operations** (execute command, upload, download); see the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for details
+- **Provides isolation** between different agent executions
+
+Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client.
+
+{/*
+Same interfaces as local:
+[BaseConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py),
+[ConversationStateProtocol](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py),
+[EventsListBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/events_list_base.py). Server-backed impl:
+[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py).
+ */}
+
+
+## Architecture Overview
+
+Remote Agent Servers follow a simple three-part architecture:
+
+```mermaid
+graph TD
+    Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server]
+    Server --> Workspace[Workspace]
+
+    subgraph Workspace Types
+        Workspace --> Local[Local Folder]
+        Workspace --> Docker[Docker Container]
+        Workspace --> API[Remote Sandbox via API]
+    end
+
+    Local --> Files[File System]
+    Docker --> Container[Isolated Runtime]
+    API --> Cloud[Cloud Infrastructure]
+
+    style Client fill:#e1f5fe
+    style Server fill:#fff3e0
+    style Workspace fill:#e8f5e8
+```
+
+1. 
**Client (Python SDK)** — Your application creates and controls conversations using the SDK. +2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution. +3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs. + +The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to. + +## How Remote Conversations Work + +Each step in the diagram maps directly to how the SDK and server interact: + +### 1. Workspace Connection → *(Client → Server)* + +When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: + +```python icon="python" +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) +``` + +This turns the local `Conversation` into a **[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. + + +### 2. Server Initialization → *(Server → Workspace)* + +Once the workspace starts: +- It launches the agent server process. +- Waits for it to be ready. +- Shares the server URL with the SDK client. + +You don’t need to manage this manually—the workspace context handles startup and teardown automatically. + +### 3. Event Streaming → *(Bidirectional WebSocket)* + +The client and agent server maintain a live WebSocket connection for streaming events: + +```python icon="python" +def on_event(event): + print(f"Received: {type(event).__name__}") + +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[on_event], +) +``` + +This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. 
+ +### 4. Workspace Supports File and Command Operations → *(Server ↔ Workspace)* + +Workspace supports file and command operations via the agent server API ([base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior: + +```python icon="python" +workspace.file_upload(local_path, remote_path) +workspace.file_download(remote_path, local_path) +result = workspace.execute_command("ls -la") +print(result.stdout) +``` + +These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic. + +### Summary + +The architecture makes remote execution seamless: +- Your **client code** stays the same. +- The **agent server** manages execution and streaming. +- The **workspace** provides secure, isolated runtime environments. + +Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed. + +## Next Steps + +Explore different deployment options: + +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Run agent server in the same process +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run agent server in isolated Docker containers +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted agent server via API + +For architectural details: +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment + + +# Stuck Detector +Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. 
By analyzing the conversation history after the last user message, it detects five types of stuck patterns: + +1. **Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) +2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) +3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) +4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) +5. **Context Window Errors**: Repeated context window errors that indicate memory management issues + +When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. + + + For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). + + + +## How It Works + +In the [ready-to-run example](#ready-to-run-example), the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` +command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: + +1. The conversation proceeds normally until the agent starts repeating actions +2. After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck +3. The conversation can then handle this gracefully, either by stopping execution or taking corrective action + +The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the +stuck status at any point using `conversation.stuck_detector.is_stuck()`. 
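
The core idea behind the first pattern can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in (events reduced to `(action, observation)` string tuples, and the name `is_repeating_cycle` is invented for illustration), not the SDK's actual `StuckDetector` logic:

```python
# Simplified illustration of the "repeating action-observation cycle" check
# (4+ identical pairs). This is NOT the SDK's implementation -- see the
# linked StuckDetector source for the real pattern matching.

def is_repeating_cycle(history: list[tuple[str, str]], threshold: int = 4) -> bool:
    """Return True if the last `threshold` (action, observation) pairs
    since the most recent user message are all identical."""
    if len(history) < threshold:
        return False
    tail = history[-threshold:]
    return all(pair == tail[0] for pair in tail)


events = [
    ("ls", "file_a.txt file_b.txt"),
    ("ls", "file_a.txt file_b.txt"),
    ("ls", "file_a.txt file_b.txt"),
    ("ls", "file_a.txt file_b.txt"),
]
print(is_repeating_cycle(events))  # True: same action/observation 4 times
```

The real detector performs this comparison on semantic event content (tool name, action arguments, thought) rather than raw strings, as described in the Pattern Detection section.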
+ +## Pattern Detection + +The stuck detector compares events based on their semantic content rather than object identity. For example: +- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) +- **Observations** are compared by their observation content and tool name +- **Errors** are compared by their error messages +- **Messages** are compared by their content and source + +This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) + +llm_messages = [] + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) + +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) + +# Run the conversation - stuck detection happens automatically +conversation.run() + +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control +- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK + + +# Theory of Mind (TOM) Agent +Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +Tom (Theory of Mind) Agent provides advanced user understanding capabilities that help your agent interpret vague instructions and adapt to user preferences over time. 
Built on research in user mental modeling, Tom agents can: + +- Understand unclear or ambiguous user requests +- Provide personalized guidance based on user modeling +- Build long-term user preference profiles +- Adapt responses based on conversation history + +This is particularly useful when: +- User instructions are vague or incomplete +- You need to infer user intent from minimal context +- Building personalized experiences across multiple conversations +- Understanding user preferences and working patterns + +## Research Foundation + +Tom agent is based on the TOM-SWE research paper on user mental modeling for software engineering agents: + +```bibtex Citation +@misc{zhou2025tomsweusermentalmodeling, + title={TOM-SWE: User Mental Modeling For Software Engineering Agents}, + author={Xuhui Zhou and Valerie Chen and Zora Zhiruo Wang and Graham Neubig and Maarten Sap and Xingyao Wang}, + year={2025}, + eprint={2510.21903}, + archivePrefix={arXiv}, + primaryClass={cs.SE}, + url={https://arxiv.org/abs/2510.21903}, +} +``` + + +Paper: [TOM-SWE on arXiv](https://arxiv.org/abs/2510.21903) + + +## Quick Start + + +This example is available on GitHub: [examples/01_standalone_sdk/30_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/30_tom_agent.py) + + +```python icon="python" expandable examples/01_standalone_sdk/30_tom_agent.py +"""Example demonstrating Tom agent with Theory of Mind capabilities. + +This example shows how to set up an agent with Tom tools for getting +personalized guidance based on user modeling. 
Tom tools include: +- TomConsultTool: Get guidance for vague or unclear tasks +- SleeptimeComputeTool: Index conversations for user modeling +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.tool import Tool +from openhands.tools.preset.default import get_default_tools +from openhands.tools.tom_consult import ( + SleeptimeComputeAction, + SleeptimeComputeObservation, + SleeptimeComputeTool, + TomConsultTool, +) + + +# Configure LLM +api_key: str | None = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm: LLM = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), + usage_id="agent", + drop_params=True, +) + +# Build tools list with Tom tools +# Note: Tom tools are automatically registered on import (PR #862) +tools = get_default_tools(enable_browser=False) + +# Configure Tom tools with parameters +tom_params: dict[str, bool | str] = { + "enable_rag": True, # Enable RAG in Tom agent +} + +# Add LLM configuration for Tom tools (uses same LLM as main agent) +tom_params["llm_model"] = llm.model +if llm.api_key: + if isinstance(llm.api_key, SecretStr): + tom_params["api_key"] = llm.api_key.get_secret_value() + else: + tom_params["api_key"] = llm.api_key +if llm.base_url: + tom_params["api_base"] = llm.base_url + +# Add both Tom tools to the agent +tools.append(Tool(name=TomConsultTool.name, params=tom_params)) +tools.append(Tool(name=SleeptimeComputeTool.name, params=tom_params)) + +# Create agent with Tom capabilities +# This agent can consult Tom for personalized guidance +# Note: Tom's user modeling data will be stored in ~/.openhands/ +agent: Agent = Agent(llm=llm, tools=tools) + +# Start conversation +cwd: str = os.getcwd() +PERSISTENCE_DIR = os.path.expanduser("~/.openhands") +CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, 
"conversations") +conversation = Conversation( + agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR +) + +# Optionally run sleeptime compute to index existing conversations +# This builds user preferences and patterns from conversation history +# Using execute_tool allows running tools before conversation.run() +print("\nRunning sleeptime compute to index conversations...") +try: + sleeptime_result = conversation.execute_tool( + "sleeptime_compute", SleeptimeComputeAction() + ) + # Cast to the expected observation type for type-safe access + if isinstance(sleeptime_result, SleeptimeComputeObservation): + print(f"Result: {sleeptime_result.message}") + print(f"Sessions processed: {sleeptime_result.sessions_processed}") + else: + print(f"Result: {sleeptime_result.text}") +except KeyError as e: + print(f"Tool not available: {e}") + +# Send a potentially vague message where Tom consultation might help +conversation.send_message( + "I need to debug some code but I'm not sure where to start. " + + "Can you help me figure out the best approach?" 
+) +conversation.run() + +print("\n" + "=" * 80) +print("Tom agent consultation example completed!") +print("=" * 80) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") + + +# Optional: Index this conversation for Tom's user modeling +# This builds user preferences and patterns from conversation history +# Uncomment the lines below to index the conversation: +# +# conversation.send_message("Please index this conversation using sleeptime_compute") +# conversation.run() +# print("\nConversation indexed for user modeling!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Tom Tools + +### TomConsultTool + +The consultation tool provides personalized guidance when the agent encounters vague or unclear user requests: + +```python icon="python" +# The agent can automatically call this tool when needed +# Example: User says "I need to debug something" +# Tom analyzes the vague request and provides specific guidance +``` + +Key features: +- Analyzes conversation history for context +- Provides personalized suggestions based on user modeling +- Helps disambiguate vague instructions +- Adapts to user communication patterns + +### SleeptimeComputeTool + +The indexing tool processes conversation history to build user preference profiles: + +```python icon="python" +# Index conversations for future personalization +sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute") +if sleeptime_compute_tool: + result = sleeptime_compute_tool.executor( + SleeptimeComputeAction(), conversation + ) +``` + +Key features: +- Processes conversation history into user models +- Stores preferences in `~/.openhands/` directory +- Builds understanding of user patterns over time +- Enables long-term personalization across sessions + +## Configuration + +### RAG Support + +Enable retrieval-augmented generation for enhanced context awareness: + +```python icon="python" +tom_params = { + 
"enable_rag": True, # Enable RAG for better context retrieval +} +``` + +### Custom LLM for Tom + +You can optionally use a different LLM for Tom's internal reasoning: + +```python icon="python" +# Use the same LLM as main agent +tom_params["llm_model"] = llm.model +tom_params["api_key"] = llm.api_key.get_secret_value() + +# Or configure a separate LLM for Tom +tom_llm = LLM(model="gpt-4", api_key=SecretStr("different-key")) +tom_params["llm_model"] = tom_llm.model +tom_params["api_key"] = tom_llm.api_key.get_secret_value() +``` + +## Data Storage + +Tom stores user modeling data persistently in `~/.openhands/`: + + + + + + + + + + + + + + + + + +where +- `user_models/` stores user preference profiles, with each user having their own subdirectory containing `user_model.json` (the current user model). +- `conversations/` contains indexed conversation data + +This persistent storage enables Tom to: +- Remember user preferences across sessions +- Track which conversations have been indexed +- Build long-term understanding of user patterns + +## Use Cases + +### 1. Handling Vague Requests + +When a user provides minimal information: + +```python icon="python" +conversation.send_message("Help me with that bug") +# Tom analyzes history to determine which bug and suggest approach +``` + +### 2. Personalized Recommendations + +Tom adapts suggestions based on past interactions: + +```python icon="python" +# After multiple conversations, Tom learns: +# - User prefers minimal explanations +# - User typically works with Python +# - User values efficiency over verbosity +``` + +### 3. Intent Inference + +Understanding what the user really wants: + +```python icon="python" +conversation.send_message("Make it better") +# Tom infers from context what "it" is and how to improve it +``` + +## Best Practices + +1. **Enable RAG**: For better context awareness, always enable RAG: + ```python icon="python" + tom_params = {"enable_rag": True} + ``` + +2. 
**Index Regularly**: Run sleeptime compute after important conversations to build better user models + +3. **Provide Context**: Even with Tom, providing more context leads to better results + +4. **Monitor Data**: Check `~/.openhands/` periodically to understand what's being learned + +5. **Privacy Considerations**: Be aware that conversation data is stored locally for user modeling + +## Next Steps + +- **[Agent Delegation](/sdk/guides/agent-delegation)** - Combine Tom with sub-agents for complex workflows +- **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively +- **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights + + +# Browser Session Recording +Source: https://docs.openhands.dev/sdk/guides/browser-session-recording + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The browser session recording feature allows you to capture your agent's browser interactions and replay them later using [rrweb](https://github.com/rrweb-io/rrweb). This is useful for debugging, auditing, and understanding how your agent interacts with web pages. + +## How It Works + +The recording feature uses rrweb to capture DOM mutations, mouse movements, scrolling, and other browser events. The recordings are saved as JSON files that can be replayed using rrweb-player or the online viewer. + +The [ready-to-run example](#ready-to-run-example) demonstrates: + +1. **Starting a recording**: Use `browser_start_recording` to begin capturing browser events +2. **Browsing and interacting**: Navigate to websites and perform actions while recording +3. **Stopping the recording**: Use `browser_stop_recording` to stop and save the recording + +The recording files are automatically saved to the persistence directory when the recording is stopped. 
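
To get a feel for what these files contain, here is a minimal, self-contained sketch that tallies event types in a recording. The sample events below are synthetic stand-ins, not real captured data; the numeric type codes follow rrweb's EventType enum (e.g. 2 = full snapshot, 3 = incremental snapshot, 4 = meta):

```python
# Minimal sketch of inspecting a recording: rrweb recordings are JSON lists
# of event dicts, each with an integer "type" and a millisecond "timestamp".
import json
from collections import Counter

sample_recording = json.dumps([
    {"type": 4, "timestamp": 1700000000000},  # meta (page info)
    {"type": 2, "timestamp": 1700000000050},  # full DOM snapshot
    {"type": 3, "timestamp": 1700000000100},  # incremental mutation
    {"type": 3, "timestamp": 1700000000150},  # incremental mutation
])

events = json.loads(sample_recording)
counts = Counter(event.get("type", "unknown") for event in events)
print(dict(counts))  # {4: 1, 2: 1, 3: 2}
```

The ready-to-run example below applies the same idea to the files the agent actually writes under the recording output directory.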
+ +## Replaying Recordings + +After recording a session, you can replay it using: + +- **rrweb-player**: A standalone player component - [GitHub](https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player) +- **Online viewer**: Upload your recording at [rrweb.io/demo](https://www.rrweb.io/) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/38_browser_session_recording.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/38_browser_session_recording.py) + + +```python icon="python" expandable examples/01_standalone_sdk/38_browser_session_recording.py +"""Browser Session Recording Example + +This example demonstrates how to use the browser session recording feature +to capture and save a recording of the agent's browser interactions using rrweb. + +The recording can be replayed later using rrweb-player to visualize the agent's +browsing session. + +The recording will be automatically saved to the persistence directory when +browser_stop_recording is called. You can replay it with: + - rrweb-player: https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player + - Online viewer: https://www.rrweb.io/ +""" + +import json +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.browser_use.definition import BROWSER_RECORDING_OUTPUT_DIR + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - including browser tools with recording capability +cwd = os.getcwd() +tools = [ + Tool(name=BrowserToolSet.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with persistence_dir set to save browser recordings +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir="./.conversations", +) + +# The prompt instructs the agent to: +# 1. Start recording the browser session +# 2. Browse to a website and perform some actions +# 3. Stop recording (auto-saves to file) +PROMPT = """ +Please complete the following task to demonstrate browser session recording: + +1. First, use `browser_start_recording` to begin recording the browser session. + +2. Then navigate to https://docs.openhands.dev/ and: + - Get the page content + - Scroll down the page + - Get the browser state to see interactive elements + +3. Next, navigate to https://docs.openhands.dev/openhands/usage/cli/installation and: + - Get the page content + - Scroll down to see more content + +4. Finally, use `browser_stop_recording` to stop the recording. + Events are automatically saved. 
+""" + +print("=" * 80) +print("Browser Session Recording Example") +print("=" * 80) +print("\nTask: Record an agent's browser session and save it for replay") +print("\nStarting conversation with agent...\n") + +conversation.send_message(PROMPT) +conversation.run() + +print("\n" + "=" * 80) +print("Conversation finished!") +print("=" * 80) + +# Check if the recording files were created +# Recordings are saved in BROWSER_RECORDING_OUTPUT_DIR/recording-{timestamp}/ +if os.path.exists(BROWSER_RECORDING_OUTPUT_DIR): + # Find recording subdirectories (they start with "recording-") + recording_dirs = sorted( + [ + d + for d in os.listdir(BROWSER_RECORDING_OUTPUT_DIR) + if d.startswith("recording-") + and os.path.isdir(os.path.join(BROWSER_RECORDING_OUTPUT_DIR, d)) + ] + ) + + if recording_dirs: + # Process the most recent recording directory + latest_recording = recording_dirs[-1] + recording_path = os.path.join(BROWSER_RECORDING_OUTPUT_DIR, latest_recording) + json_files = sorted( + [f for f in os.listdir(recording_path) if f.endswith(".json")] + ) + + print(f"\n✓ Recording saved to: {recording_path}") + print(f"✓ Number of files: {len(json_files)}") + + # Count total events across all files + total_events = 0 + all_event_types: dict[int | str, int] = {} + total_size = 0 + + for json_file in json_files: + filepath = os.path.join(recording_path, json_file) + file_size = os.path.getsize(filepath) + total_size += file_size + + with open(filepath) as f: + events = json.load(f) + + # Events are stored as a list in each file + if isinstance(events, list): + total_events += len(events) + for event in events: + event_type = event.get("type", "unknown") + all_event_types[event_type] = all_event_types.get(event_type, 0) + 1 + + print(f" - {json_file}: {len(events)} events, {file_size} bytes") + + print(f"✓ Total events: {total_events}") + print(f"✓ Total size: {total_size} bytes") + if all_event_types: + print(f"✓ Event types: {all_event_types}") + + print("\nTo replay this 
recording, you can use:") + print( + " - rrweb-player: " + "https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player" + ) + else: + print(f"\n✗ No recording directories found in: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") +else: + print(f"\n✗ Observations directory not found: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") + +print("\n" + "=" * 100) +print("Conversation finished.") +print(f"Total LLM messages: {len(llm_messages)}") +print("=" * 100) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"Conversation ID: {conversation.id}") +print(f"EXAMPLE_COST: {cost}") +``` + + + + +# Context Condenser +Source: https://docs.openhands.dev/sdk/guides/context-condenser + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## What is a Context Condenser? + +A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. As conversations with AI agents grow longer, the cumulative history leads to: + +- **💰 Increased API Costs**: More tokens in the context means higher costs per API call +- **⏱️ Slower Response Times**: Larger contexts take longer to process +- **📉 Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information + +The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively. + +## Default Implementation: `LLMSummarizingCondenser` + +OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. 
This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit.
+
+### How It Works
+
+When conversation history exceeds a defined threshold, the LLM-based condenser:
+
+1. **Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context
+2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained
+3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise, LLM-generated summaries
+4. **Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction
+
+This approach achieves remarkable efficiency gains:
+- Up to **2x reduction** in per-turn API costs
+- **Consistent response times** even in long sessions
+- **Equivalent or better performance** on software engineering tasks
+
+Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents).
+
+### Extensibility
+
+The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history. You can create custom condensers by extending base classes ([source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)):
+
+- **`RollingCondenser`** - For condensers that apply condensation to rolling history
+- **`CondenserBase`** - For more specialized condensation strategies
+
+This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure.
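
The rolling strategy described above can be sketched in plain Python. This is an illustrative toy, not the SDK's implementation: the real condenser calls an LLM to write the summary, while the sketch below just inserts a placeholder string. The function and variable names here are hypothetical; only the `max_size` and `keep_first` parameters mirror the real API.

```python
def condense(events: list[str], max_size: int = 10, keep_first: int = 2) -> list[str]:
    """Toy rolling condensation: keep the first `keep_first` events and the
    most recent events, replacing everything in between with one summary."""
    if len(events) <= max_size:
        return events  # under the threshold, nothing to condense
    # Reserve slots for the head events plus one summary entry
    keep_recent = max_size - keep_first - 1
    head = events[:keep_first]
    tail = events[-keep_recent:]
    dropped = events[keep_first:-keep_recent]
    # A real condenser would call an LLM here to summarize `dropped`
    summary = f"[summary of {len(dropped)} earlier events]"
    return head + [summary] + tail


history = [f"event-{i}" for i in range(25)]
condensed = condense(history)
print(len(condensed))  # 10
print(condensed[2])  # [summary of 16 earlier events]
```

Because the real condenser produces the summary with an LLM call, it is configured with its own LLM instance (note the separate `usage_id="condenser"` in the configuration shown in the next section).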
+
+
+### Setting Up Condensing
+
+Create a `LLMSummarizingCondenser` to manage the context.
+The condenser automatically truncates conversation history when it exceeds `max_size` and replaces the dropped events with an LLM-generated summary.
+
+This condenser triggers when there are more than `max_size` events in
+the conversation history, and always keeps the first `keep_first` events (system prompts,
+initial user messages) to preserve important context.
+
+```python focus={3-4} icon="python"
+from openhands.sdk.context import LLMSummarizingCondenser
+
+condenser = LLMSummarizingCondenser(
+    llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2
+)
+
+# Agent with condenser
+agent = Agent(llm=llm, tools=tools, condenser=condenser)
+```
+
+### Ready-to-run example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)
+
+
+
+Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information:
+
+```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py
+"""
+To manage context in long-running conversations, the agent can use a context condenser
+that keeps the conversation history within a specified size limit. This example
+demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes
+older parts of the conversation when the history exceeds a defined threshold.
+""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context.condenser import LLMSummarizingCondenser +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Create a condenser to manage the context. The condenser will automatically truncate +# conversation history when it exceeds max_size, and replaces the dropped events with an +# LLM-generated summary. This condenser triggers when there are more than ten events in +# the conversation history, and always keeps the first two events (system prompts, +# initial user messages) to preserve important context. 
+condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +# Send multiple messages to demonstrate condensation +print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") + +conversation.send_message( + "Hello! Can you create a Python file named math_utils.py with functions for " + "basic arithmetic operations (add, subtract, multiply, divide)?" +) +conversation.run() + +conversation.send_message( + "Great! Now add a function to calculate the factorial of a number." +) +conversation.run() + +conversation.send_message("Add a function to check if a number is prime.") +conversation.run() + +conversation.send_message( + "Add a function to calculate the greatest common divisor (GCD) of two numbers." +) +conversation.run() + +conversation.send_message( + "Now create a test file to verify all these functions work correctly." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Finally, clean up by deleting both files.") +conversation.run() + +print("=" * 100) +print("Conversation finished with LLM Summarizing Condenser.") +print(f"Total LLM messages collected: {len(llm_messages)}") +print("\nThe condenser automatically summarized older conversation history") +print("when the conversation exceeded the configured max_size threshold.") +print("This helps manage context length while preserving important information.") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings + + +# Ask Agent Questions +Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use `ask_agent()` to get quick responses from the agent about the current conversation state without +interrupting the main execution flow. + +## Key Features + +The `ask_agent()` method provides several important capabilities: + +#### Context-Aware Responses + +The agent has access to the full conversation history when answering questions: + +```python focus={2-3} icon="python" wrap +# Agent can reference what it has done so far +response = conversation.ask_agent( + "Summarize the activity so far in 1 sentence." 
+) +print(f"Response: {response}") +``` + +#### Non-Intrusive Operation + +Questions don't interrupt the main conversation flow - they're processed separately: + +```python focus={4-6} icon="python" wrap +# Start main conversation +thread = threading.Thread(target=conversation.run) +thread.start() + +# Ask questions without affecting main execution +response = conversation.ask_agent("How's the progress?") +``` + +#### Works During and After Execution + +You can ask questions while the agent is running or after it has completed: + +```python focus={3,7} icon="python" wrap +# During execution +time.sleep(2) # Let agent start working +response1 = conversation.ask_agent("Have you finished running?") + +# After completion +thread.join() +response2 = conversation.ask_agent("What did you accomplish?") +``` + +### Use Cases + +- **Progress Monitoring**: Check on long-running tasks +- **Status Updates**: Get real-time information about agent activities +- **User Interfaces**: Provide sidebar information in chat applications + +## Ready-to-run Example + + + This example is available on GitHub: + [examples/01_standalone_sdk/28_ask_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/28_ask_agent_example.py) + + +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use `ask_agent()` to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. + +```python icon="python" expandable examples/01_standalone_sdk/28_ask_agent_example.py +""" +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use ask_agent() to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. 
+"""
+
+import os
+import threading
+import time
+from datetime import datetime
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+)
+from openhands.sdk.conversation import ConversationVisualizerBase
+from openhands.sdk.event import Event
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.task_tracker import TaskTrackerTool
+from openhands.tools.terminal import TerminalTool
+
+
+# Configure LLM
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Tools
+cwd = os.getcwd()
+tools = [
+    Tool(name=TerminalTool.name),
+    Tool(name=FileEditorTool.name),
+    Tool(name=TaskTrackerTool.name),
+]
+
+
+class MinimalVisualizer(ConversationVisualizerBase):
+    """A minimal visualizer that prints the raw events as they occur."""
+
+    count = 0
+
+    def on_event(self, event: Event) -> None:
+        """Handle events for minimal progress visualization."""
+        print(f"\n\n[EVENT {self.count}] {type(event).__name__}")
+        self.count += 1
+
+
+# Agent
+agent = Agent(llm=llm, tools=tools)
+conversation = Conversation(
+    agent=agent, workspace=cwd, visualizer=MinimalVisualizer, max_iteration_per_run=5
+)
+
+
+def timestamp() -> str:
+    return datetime.now().strftime("%H:%M:%S")
+
+
+print("=== Ask Agent Example ===")
+print("This example demonstrates asking questions during conversation execution")
+
+# Step 1: Build conversation context
+print(f"\n[{timestamp()}] Building conversation context...")
+conversation.send_message("Explore the current directory and describe the architecture")
+
+# Step 2: Start conversation in background thread
+print(f"[{timestamp()}] Starting conversation in background thread...")
+thread = 
threading.Thread(target=conversation.run) +thread.start() + +# Give the agent time to start processing +time.sleep(2) + +# Step 3: Use ask_agent while conversation is running +print(f"\n[{timestamp()}] Using ask_agent while conversation is processing...") + +# Ask context-aware questions +questions_and_responses = [] + +question_1 = "Summarize the activity so far in 1 sentence." +print(f"\n[{timestamp()}] Asking: {question_1}") +response1 = conversation.ask_agent(question_1) +questions_and_responses.append((question_1, response1)) +print(f"Response: {response1}") + +time.sleep(1) + +question_2 = "How's the progress?" +print(f"\n[{timestamp()}] Asking: {question_2}") +response2 = conversation.ask_agent(question_2) +questions_and_responses.append((question_2, response2)) +print(f"Response: {response2}") + +time.sleep(1) + +question_3 = "Have you finished running?" +print(f"\n[{timestamp()}] {question_3}") +response3 = conversation.ask_agent(question_3) +questions_and_responses.append((question_3, response3)) +print(f"Response: {response3}") + +# Step 4: Wait for conversation to complete +print(f"\n[{timestamp()}] Waiting for conversation to complete...") +thread.join() + +# Step 5: Verify conversation state wasn't affected +final_event_count = len(conversation.state.events) +# Step 6: Ask a final question after conversation completion +print(f"\n[{timestamp()}] Asking final question after completion...") +final_response = conversation.ask_agent( + "Can you summarize what you accomplished in this conversation?" +) +print(f"Final response: {final_response}") + +# Step 7: Summary +print("\n" + "=" * 60) +print("SUMMARY OF ASK_AGENT DEMONSTRATION") +print("=" * 60) + +print("\nQuestions and Responses:") +for i, (question, response) in enumerate(questions_and_responses, 1): + print(f"\n{i}. Q: {question}") + print(f" A: {response[:100]}{'...' if len(response) > 100 else ''}") + +final_truncated = final_response[:100] + ("..." 
if len(final_response) > 100 else "") +print(f"\nFinal Question Response: {final_truncated}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + + +## Next Steps + +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interrupt and redirect agent execution +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress + + +# Conversation with Async +Source: https://docs.openhands.dev/sdk/guides/convo-async + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Concurrent Agents + +Run multiple agent tasks in parallel using `asyncio.gather()`: + +```python icon="python" wrap +async def main(): + loop = asyncio.get_running_loop() + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Create multiple conversation tasks running in parallel + tasks = [ + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback) + ] + results = await asyncio.gather(*tasks) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) + + +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop + +```python icon="python" expandable examples/01_standalone_sdk/11_async.py +""" +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). 
The conversation is run in a background +thread and a callback with results is executed in the main runloop +""" + +import asyncio +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.conversation.types import ConversationCallbackType +from openhands.sdk.tool import Tool +from openhands.sdk.utils.async_utils import AsyncCallbackWrapper +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +# Callback coroutine +async def callback_coro(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Synchronous run conversation +def run_conversation(callback: ConversationCallbackType): + conversation = Conversation(agent=agent, callbacks=[callback]) + + conversation.send_message( + "Hello! Can you create a new Python file named hello.py that prints " + "'Hello, World!'? Use task tracker to plan your steps." + ) + conversation.run() + + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + + +async def main(): + loop = asyncio.get_running_loop() + + # Create the callback + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Run the conversation in a background thread and wait for it to finish... + await loop.run_in_executor(None, run_conversation, callback) + + print("=" * 100) + print("Conversation finished. Got the following LLM messages:") + for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + asyncio.run(main()) +``` + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + + +# Custom Visualizer +Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The SDK provides flexible visualization options. You can use the default rich-formatted visualizer, customize it with highlighting patterns, or build completely custom visualizers by subclassing `ConversationVisualizerBase`. 
+ +## Visualizer Configuration Options + +The `visualizer` parameter in `Conversation` controls how events are displayed: + +```python icon="python" focus={4-5, 7-8, 10-11, 13, 18, 20, 25} +from openhands.sdk import Conversation +from openhands.sdk.conversation import DefaultConversationVisualizer, ConversationVisualizerBase + +# Option 1: Use default visualizer (enabled by default) +conversation = Conversation(agent=agent, workspace=workspace) + +# Option 2: Disable visualization +conversation = Conversation(agent=agent, workspace=workspace, visualizer=None) + +# Option 3: Pass a visualizer class (will be instantiated automatically) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=DefaultConversationVisualizer) + +# Option 4: Pass a configured visualizer instance +custom_viz = DefaultConversationVisualizer( + name="MyAgent", + highlight_regex={r"^Reasoning:": "bold cyan"} +) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=custom_viz) + +# Option 5: Use custom visualizer class +class MyVisualizer(ConversationVisualizerBase): + def on_event(self, event): + print(f"Event: {event}") + +conversation = Conversation(agent=agent, workspace=workspace, visualizer=MyVisualizer()) +``` + +## Customizing the Default Visualizer + +`DefaultConversationVisualizer` uses Rich panels and supports customization through configuration: + +```python icon="python" focus={3-14, 19} +from openhands.sdk.conversation import DefaultConversationVisualizer + +# Configure highlighting patterns using regex +custom_visualizer = DefaultConversationVisualizer( + name="MyAgent", # Prefix panel titles with agent name + highlight_regex={ + r"^Reasoning:": "bold cyan", # Lines starting with "Reasoning:" + r"^Thought:": "bold green", # Lines starting with "Thought:" + r"^Action:": "bold yellow", # Lines starting with "Action:" + r"\[ERROR\]": "bold red", # Error markers anywhere + r"\*\*(.*?)\*\*": "bold", # Markdown bold **text** + }, + 
skip_user_messages=False, # Show user messages +) + +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=custom_visualizer +) +``` + +**When to use**: Perfect for customizing colors and highlighting without changing the panel-based layout. + +## Creating Custom Visualizers + +For complete control over visualization, subclass `ConversationVisualizerBase`: + +```python icon="python" focus={4, 11, 28} +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import ActionEvent, ObservationEvent, AgentErrorEvent, Event + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that prints raw event information.""" + + def __init__(self, name: str | None = None): + super().__init__(name=name) + self.step_count = 0 + + def on_event(self, event: Event) -> None: + """Handle each event.""" + if isinstance(event, ActionEvent): + self.step_count += 1 + tool_name = event.tool_name or "unknown" + print(f"Step {self.step_count}: {tool_name}") + + elif isinstance(event, ObservationEvent): + print(f" → Result received") + + elif isinstance(event, AgentErrorEvent): + print(f"❌ Error: {event.error}") + +# Use your custom visualizer +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=MinimalVisualizer(name="Agent") +) +``` + +### Key Methods + +**`__init__(self, name: str | None = None)`** +- Initialize your visualizer with optional configuration +- `name` parameter is available from the base class for agent identification +- Call `super().__init__(name=name)` to initialize the base class + +**`initialize(self, state: ConversationStateProtocol)`** +- Called automatically by `Conversation` after state is created +- Provides access to conversation state and statistics via `self._state` +- Override if you need custom initialization, but call `super().initialize(state)` + +**`on_event(self, event: Event)`** *(required)* +- Called for each conversation event +- Implement your 
visualization logic here
+- Access conversation stats via `self.conversation_stats` property
+
+**When to use**: When you need a completely different output format, custom state tracking, or integration with external systems.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/26_custom_visualizer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/26_custom_visualizer.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/26_custom_visualizer.py
+"""Custom Visualizer Example
+
+This example demonstrates how to create and use a custom visualizer by subclassing
+ConversationVisualizer. This approach provides:
+- Clean, testable code with class-based state management
+- Direct configuration (just pass the visualizer instance to the visualizer parameter)
+- Reusable visualizer that can be shared across conversations
+
+This demonstrates how you can pass a ConversationVisualizer instance directly
+to the visualizer parameter for clean, reusable visualization logic.
+"""
+
+import logging
+import os
+
+from pydantic import SecretStr
+
+from openhands.sdk import LLM, Conversation
+from openhands.sdk.conversation.visualizer import ConversationVisualizerBase
+from openhands.sdk.event import (
+    Event,
+)
+from openhands.tools.preset.default import get_default_agent
+
+
+class MinimalVisualizer(ConversationVisualizerBase):
+    """A minimal visualizer that prints the raw events as they occur."""
+
+    def on_event(self, event: Event) -> None:
+        """Handle events for minimal progress visualization."""
+        print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...")
+
+
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+llm = LLM(
+    model=model,
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    usage_id="agent",
+)
+agent = get_default_agent(llm=llm, cli_mode=True)
+
+# ============================================================================
+# Configure Visualization
+# ============================================================================
+# Set logging level to reduce verbosity
+logging.getLogger().setLevel(logging.WARNING)
+
+# Start a conversation with custom visualizer
+cwd = os.getcwd()
+conversation = Conversation(
+    agent=agent,
+    workspace=cwd,
+    visualizer=MinimalVisualizer(),
+)
+
+# Send a message and let the agent run
+print("Sending task to agent...")
+conversation.send_message("Write 3 facts about the current project into FACTS.txt.")
+conversation.run()
+print("Task completed!")
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost:.4f}")
+```
+
+
+
+## Next Steps
+
+Now that you understand custom visualizers, explore these related topics:
+
+- **[Events](/sdk/arch/events)** - Learn more about different event types
+- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data
+- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates
+- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic
+
+
+# Pause and Resume
+Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+### Pausing Execution
+
+Pause the agent from another thread or after a delay using `conversation.pause()`, then
+resume the paused conversation, after performing any other operations, by calling `conversation.run()` again.
+ +```python icon="python" focus={9, 15} wrap +import time +thread = threading.Thread(target=conversation.run) +thread.start() + +print("Letting agent work for 5 seconds...") +time.sleep(5) + +print("Pausing the agent...") +conversation.pause() + +print("Waiting for 5 seconds...") +time.sleep(5) + +print("Resuming the execution...") +conversation.run() +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + + +Pause agent execution mid-task by calling `conversation.pause()`: + +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) + +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() + +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." 
+) + +print(f"Initial status: {conversation.state.execution_status}") +print() + +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() + +# Let the agent work for a few seconds +print("Letting agent work for 2 seconds...") +time.sleep(2) + +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() + +# Wait for the thread to finish (it will stop when paused) +thread.join() + +print(f"Agent status after pause: {conversation.state.execution_status}") +print() + +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." +) +print() + +# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.execution_status}") + +# Resume execution +conversation.run() + +print(f"Final status: {conversation.state.execution_status}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + + +# Persistence +Source: https://docs.openhands.dev/sdk/guides/convo-persistence + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## How to use Persistence + +Save conversation state to disk and restore it later for long-running or multi-session workflows. 
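
Under the hood, persisted state is just files on disk: one directory per conversation holding a base state file plus one JSON file per event (the layout is described in the directory-structure section below). Here is a minimal plain-Python sketch of that idea; it is illustrative only, and the `save_event`/`load_events` helpers are hypothetical names, not the SDK's actual serialization code:

```python
import json
import tempfile
import uuid
from pathlib import Path


def save_event(conv_dir: Path, index: int, event: dict) -> None:
    """Write one event as events/event-<index>-<id>.json."""
    events_dir = conv_dir / "events"
    events_dir.mkdir(parents=True, exist_ok=True)
    name = f"event-{index:05d}-{event['id']}.json"
    (events_dir / name).write_text(json.dumps(event))


def load_events(conv_dir: Path) -> list[dict]:
    """Reload events; sorted filenames restore the original order."""
    files = sorted((conv_dir / "events").glob("event-*.json"))
    return [json.loads(p.read_text()) for p in files]


# One directory per conversation, keyed by its unique ID
conv_dir = Path(tempfile.mkdtemp()) / str(uuid.uuid4())
for i, ev in enumerate(
    [{"id": "abc123", "kind": "message"}, {"id": "def456", "kind": "action"}]
):
    save_event(conv_dir, i, ev)
# Core metadata lives next to the event log
(conv_dir / "base_state.json").write_text(json.dumps({"status": "idle"}))

print([e["kind"] for e in load_events(conv_dir)])  # ['message', 'action']
```

In the SDK, `persistence_dir` points at this kind of store, and recreating a `Conversation` with the same `conversation_id` restores the saved event log.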
+ +### Saving State + +Create a conversation with a unique ID to enable persistence: + +```python focus={3-4,10-11} icon="python" wrap +import uuid + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message("Start long task") +conversation.run() # State automatically saved +``` + +### Restoring State + +Restore a conversation using the same ID and persistence directory: + +```python focus={9-10} icon="python" +# Later, in a different session +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +conversation.send_message("Continue task") +conversation.run() # Continues from saved state +``` + +## What Gets Persisted + +The conversation state includes information that allows seamless restoration: + +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys +- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) + + + For the complete implementation details, see the [ConversationState 
class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code.
+
+
+## Persistence Directory Structure
+
+When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each
+conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/`
+(unless you specify a custom path).
+
+**Directory structure:**
+
+- `<persistence_dir>/`
+  - `<conversation_id>/`
+    - `base_state.json`
+    - `events/`
+      - `event-00000-<event_id>.json`
+      - `event-00001-<event_id>.json`
+      - ...
+
+Each conversation directory contains:
+- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata
+- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`)
+
+The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py
+import os
+import uuid
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    get_logger,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.terminal import TerminalTool
+
+
+logger = get_logger(__name__)
+
+# Configure LLM
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + } +} +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Hey what did you create? 
Return an agent finish action") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Reading serialized events + +Convert persisted events into LLM-ready messages for reuse or analysis. + + +This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) + + +```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py +"""Load persisted events and convert them into LLM-ready messages.""" + +import json +import os +import uuid +from pathlib import Path + +from pydantic import SecretStr + + +conversation_id = uuid.uuid4() +persistence_root = Path(".conversations") +log_dir = ( + persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex +) + +os.environ.setdefault("LOG_JSON", "true") +os.environ.setdefault("LOG_TO_FILE", "true") +os.environ.setdefault("LOG_DIR", str(log_dir)) +os.environ.setdefault("LOG_LEVEL", "INFO") + +from openhands.sdk import ( # noqa: E402 + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + Tool, +) +from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 +from openhands.tools.terminal import TerminalTool # noqa: E402 + + +setup_logging(log_to_file=True, log_dir=str(log_dir)) +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +if not api_key: + raise RuntimeError("LLM_API_KEY environment variable is not set.") + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +###### +# Create a conversation that persists its events +###### + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + 
persistence_dir=str(persistence_root), + conversation_id=conversation_id, +) + +conversation.send_message( + "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " + "Reply with a short confirmation once done." +) +conversation.run() + +conversation.send_message( + "Without using any tools, summarize in one sentence what you did." +) +conversation.run() + +assert conversation.state.persistence_dir is not None +persistence_dir = Path(conversation.state.persistence_dir) +event_dir = persistence_dir / "events" + +event_paths = sorted(event_dir.glob("event-*.json")) + +if not event_paths: + raise RuntimeError("No event files found. Was persistence enabled?") + +###### +# Read from serialized events +###### + + +events = [Event.model_validate_json(path.read_text()) for path in event_paths] + +convertible_events = [ + event for event in events if isinstance(event, LLMConvertibleEvent) +] +llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) + +if llm.uses_responses_api(): + logger.info("Formatting messages for the OpenAI Responses API.") + instructions, input_items = llm.format_messages_for_responses(llm_messages) + logger.info("Responses instructions:\n%s", instructions) + logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) +else: + logger.info("Formatting messages for the OpenAI Chat Completions API.") + chat_messages = llm.format_messages_for_llm(llm_messages) + logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## How State Persistence Works + +The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. + +### Auto-Save Mechanism + +When you modify any public field on `ConversationState`, the SDK automatically: + +1. 
Detects the field change via a custom `__setattr__` implementation +2. Serializes the entire base state to `base_state.json` +3. Triggers any registered state change callbacks + +This happens transparently—you don't need to call any save methods manually. + +```python +# These changes are automatically persisted: +conversation.state.execution_status = ConversationExecutionStatus.RUNNING +conversation.state.max_iterations = 100 +``` + +### Events vs Base State + +The persistence system separates data into two categories: + +| Category | Storage | Contents | +|----------|---------|----------| +| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | +| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | + +Events are appended incrementally (one file per event), while base state is overwritten on each change. This design optimizes for: +- **Fast event appends**: No need to rewrite the entire history +- **Atomic state updates**: Base state is always consistent +- **Efficient restoration**: Events can be loaded lazily + + + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + + +# Send Message While Running +Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + + +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) + + +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: + +```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating 
that user messages can be sent and processed while +an agent is busy. + +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. This is made possible by the agent's event-driven architecture. + +Demonstration Flow: +1. Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. Verify that all three lines are processed and included in the final document + +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) + +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. 
+ +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing + +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Send Message While Processing Example ===") + +# Step 1: Send initial message +start_time = timestamp() +conversation.send_message( + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. 
" + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +) + +# Step 2: Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() + +# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() + + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") + + # Check if both messages were processed + if "Message 1" in content and "Message 2" in content: + print("\nSUCCESS: Agent processed both messages!") + print( + "This proves the agent received the second message while processing the first task." 
# noqa + ) + else: + print("\nWARNING: Agent may not have processed the second message") + + # Clean up + os.remove(document_path) +else: + print("WARNING: Document.txt was not created") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Sending Messages During Execution + +As shown in the example above, use threading to send messages while the agent is running: + +```python icon="python" +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() +``` + +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion + +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + + +# Critic (Experimental) +Source: https://docs.openhands.dev/sdk/guides/critic + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +## What is a Critic? 
+ +A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: + +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion +- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold + +You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. + + +This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming. + + +## Quick Start + +When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. 
+ +## Understanding Critic Results + +Critic evaluations produce scores and feedback: + +- **`score`**: Float between 0.0 and 1.0 representing predicted success probability +- **`message`**: Optional feedback with detailed probabilities +- **`success`**: Boolean property (True if score >= 0.5) + +Results are automatically displayed in the conversation visualizer: + +![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) + +### Accessing Results Programmatically + +```python icon="python" focus={4-7} +from openhands.sdk import Event, ActionEvent, MessageEvent + +def callback(event: Event): + if isinstance(event, (ActionEvent, MessageEvent)): + if event.critic_result is not None: + print(f"Critic score: {event.critic_result.score:.3f}") + print(f"Success: {event.critic_result.success}") + +conversation = Conversation(agent=agent, callbacks=[callback]) +``` + +## Iterative Refinement with a Critic + +The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. + +### How It Works + +1. Agent completes a task and calls `FinishAction` +2. Critic evaluates the result and produces a score +3. If score < `success_threshold`, a follow-up prompt is sent automatically +4. Agent continues working to address issues +5. 
Process repeats until score meets threshold or `max_iterations` is reached + +### Configuration + +Use `IterativeRefinementConfig` to enable automatic retries: + +```python icon="python" focus={1,4-7,12} +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig + +# Configure iterative refinement +iterative_config = IterativeRefinementConfig( + success_threshold=0.7, # Retry if score < 70% + max_iterations=3, # Maximum retry attempts +) + +# Attach to critic +critic = APIBasedCritic( + server_url="https://llm-proxy.eval.all-hands.dev/vllm", + api_key=api_key, + model_name="critic", + iterative_refinement=iterative_config, +) +``` + +### Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | +| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | + +### Custom Follow-up Prompts + +By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: + +```python icon="python" focus={4-12} +from openhands.sdk.critic.base import CriticBase, CriticResult + +class CustomCritic(APIBasedCritic): + def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: + score_percent = critic_result.score * 100 + return f""" +Your solution scored {score_percent:.1f}% (iteration {iteration}). + +Please review your work carefully: +1. Check that all requirements are met +2. Verify tests pass +3. 
Fix any issues and try again +""" +``` + +### Example Workflow + +Here's what happens during iterative refinement: + +``` +Iteration 1: + → Agent creates files, runs tests + → Agent calls FinishAction + → Critic evaluates: score = 0.45 (below 0.7 threshold) + → Follow-up prompt sent automatically + +Iteration 2: + → Agent reviews and fixes issues + → Agent calls FinishAction + → Critic evaluates: score = 0.72 (above threshold) + → ✅ Success! Conversation ends +``` + +## Troubleshooting + +### Critic Evaluations Not Appearing + +- Verify the critic is properly configured and passed to the Agent +- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) + +### API Authentication Errors + +- Verify `LLM_API_KEY` is set correctly +- Check that the API key has not expired + +### Iterative Refinement Not Triggering + +- Ensure `iterative_refinement` config is attached to the critic +- Check that `success_threshold` is set appropriately (higher values trigger more retries) +- Verify the agent is using `FinishAction` to complete tasks + +## Ready-to-run Example + + +The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) + + +This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. + +```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py +"""Iterative Refinement with Critic Model Example. + +This is EXPERIMENTAL. + +This example demonstrates how to use a critic model to shepherd an agent through +complex, multi-step tasks. 
The critic evaluates the agent's progress and provides +feedback that can trigger follow-up prompts when the agent hasn't completed the +task successfully. + +Key concepts demonstrated: +1. Setting up a critic with IterativeRefinementConfig for automatic retry +2. Conversation.run() automatically handles retries based on critic scores +3. Custom follow-up prompt generation via critic.get_followup_prompt() +4. Iterating until the task is completed successfully or max iterations reached + +For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured +using the same base_url with /vllm suffix and "critic" as the model name. +""" + +import os +import re +import tempfile +from pathlib import Path + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +from openhands.sdk.critic.base import CriticBase +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configuration +# Higher threshold (70%) makes it more likely the agent needs multiple iterations, +# which better demonstrates how iterative refinement works. +# Adjust as needed to see different behaviors. +SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) + + +def get_required_env(name: str) -> str: + value = os.getenv(name) + if value: + return value + raise ValueError( + f"Missing required environment variable: {name}. " + f"Set {name} before running this example." + ) + + +def get_default_critic(llm: LLM) -> CriticBase | None: + """Auto-configure critic for All-Hands LLM proxy. 
+ + When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an + APIBasedCritic configured with: + - server_url: {base_url}/vllm + - api_key: same as LLM + - model_name: "critic" + + Args: + llm: The LLM instance to derive critic configuration from. + + Returns: + An APIBasedCritic if the LLM is configured for All-Hands proxy, + None otherwise. + + Example: + llm = LLM( + model="anthropic/claude-sonnet-4-5", + api_key=api_key, + base_url="https://llm-proxy.eval.all-hands.dev", + ) + critic = get_default_critic(llm) + if critic is None: + # Fall back to explicit configuration + critic = APIBasedCritic( + server_url="https://my-critic-server.com", + api_key="my-api-key", + model_name="my-critic-model", + ) + """ + base_url = llm.base_url + api_key = llm.api_key + if base_url is None or api_key is None: + return None + + # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) + pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" + if not re.match(pattern, base_url): + return None + + return APIBasedCritic( + server_url=f"{base_url.rstrip('/')}/vllm", + api_key=api_key, + model_name="critic", + ) + + +# Task prompt designed to be moderately complex with subtle requirements. +# The task is simple enough to complete in 1-2 iterations, but has specific +# requirements that are easy to miss - triggering critic feedback. +INITIAL_TASK_PROMPT = """\ +Create a Python word statistics tool called `wordstats` that analyzes text files. 
+ +## Structure + +Create directory `wordstats/` with: +- `stats.py` - Main module with `analyze_file(filepath)` function +- `cli.py` - Command-line interface +- `tests/test_stats.py` - Unit tests + +## Requirements for stats.py + +The `analyze_file(filepath)` function must return a dict with these EXACT keys: +- `lines`: total line count (including empty lines) +- `words`: word count +- `chars`: character count (including whitespace) +- `unique_words`: count of unique words (case-insensitive) + +### Important edge cases (often missed!): +1. Empty files must return all zeros, not raise an exception +2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) +3. Numbers like "123" or "3.14" are NOT counted as words +4. Contractions like "don't" count as ONE word +5. File not found must raise FileNotFoundError with a clear message + +## Requirements for cli.py + +When run as `python cli.py `: +- Print each stat on its own line: "Lines: X", "Words: X", etc. +- Exit with code 1 if file not found, printing error to stderr +- Exit with code 0 on success + +## Required Tests (test_stats.py) + +Write tests that verify: +1. Basic counting on normal text +2. Empty file returns all zeros +3. Hyphenated words counted correctly +4. Numbers are excluded from word count +5. FileNotFoundError raised for missing files + +## Verification Steps + +1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): +``` +Hello world! +This is a well-known test file. + +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` + +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 + +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. 
+ +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" + + +llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), +) + +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) + +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) + +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") + +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), +) + +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: 
{SUCCESS_THRESHOLD:.0%}")
print(f"Max iterations: {MAX_ITERATIONS}")

# Send the task and run - Conversation.run() handles retries automatically
conversation.send_message(INITIAL_TASK_PROMPT)
conversation.run()

# Print additional info about created files
print("\nCreated files:")
for path in sorted(workspace.rglob("*")):
    if path.is_file():
        relative = path.relative_to(workspace)
        print(f" - {relative}")

# Report cost
cost = llm.metrics.accumulated_cost
print(f"\nEXAMPLE_COST: {cost:.4f}")
```

```bash Running the Example icon="terminal"
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
  uv run python examples/01_standalone_sdk/34_critic_example.py
```

### Example Output

```
📁 Created workspace: /tmp/critic_demo_abc123

======================================================================
🚀 Starting Iterative Refinement with Critic Model
======================================================================
Success threshold: 70%
Max iterations: 3

... agent works on the task ...
+ +✓ Critic evaluation: score=0.758, success=True + +Created files: + - sample.txt + - wordstats/cli.py + - wordstats/stats.py + - wordstats/tests/test_stats.py + +EXAMPLE_COST: 0.0234 +``` + +## Next Steps + +- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior +- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics +- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns + + +# Custom Tools +Source: https://docs.openhands.dev/sdk/guides/custom-tools + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Understanding the Tool System + +The SDK's tool system is built around three core components: + +1. **Action** - Defines input parameters (what the tool accepts) +2. **Observation** - Defines output data (what the tool returns) +3. **Executor** - Implements the tool's logic (what the tool does) + +These components are tied together by a **ToolDefinition** that registers the tool with the agent. + +## Built-in Tools + +The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. + +```python icon="python" wrap +from openhands.tools import BashTool, FileEditorTool +from openhands.tools.preset import get_default_tools + +# Use specific tools +agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) + +# Or use preset +tools = get_default_tools() +agent = Agent(llm=llm, tools=tools) +``` + + +See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. 
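The Action → Executor → Observation flow described under "Understanding the Tool System" can be seen in miniature without the SDK at all. The sketch below is a framework-free analogy in plain Python: `CountAction`, `CountObservation`, and `CountExecutor` are invented names for illustration only, not real `openhands` classes. The genuine base classes shown in the step-by-step walkthrough add Pydantic validation, tool registration, and richer LLM content formatting on top of this shape.

```python
from dataclasses import dataclass


@dataclass
class CountAction:
    """"Action" analogue: the tool's typed input parameters."""

    text: str
    needle: str


@dataclass
class CountObservation:
    """"Observation" analogue: the tool's structured output."""

    count: int

    def to_llm_content(self) -> str:
        # Format the raw result as text for the LLM, mirroring the
        # role of the SDK's to_llm_content property.
        return f"Found {self.count} occurrence(s)."


class CountExecutor:
    """"Executor" analogue: the logic that turns an action into an observation."""

    def __call__(self, action: CountAction) -> CountObservation:
        return CountObservation(count=action.text.count(action.needle))


executor = CountExecutor()
obs = executor(CountAction(text="spam spam eggs", needle="spam"))
print(obs.to_llm_content())
```

Keeping input schema, output schema, and logic in three separate objects is the same separation that lets the SDK share one executor across several tools, as the "Shared Executors" section below demonstrates with the real classes.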
+ + +## Creating a Custom Tool + +Here's a minimal example of creating a custom grep tool: + + + + ### Define the Action + Defines input parameters (what the tool accepts) + + ```python icon="python" wrap + class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", + description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, + description="Optional glob to filter files (e.g. '*.py')" + ) + ``` + + + ### Define the Observation + Defines output data (what the tool returns) + + ```python icon="python" wrap + class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + ``` + + The to_llm_content() property formats observations for the LLM. 
+ + + + ### Define the Executor + Implements the tool’s logic (what the tool does) + + ```python icon="python" wrap + class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__( + self, + action: GrepAction, + conversation=None, + ) -> GrepObservation: + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q}" + else: + cmd = f"grep -rHnE {pat} {root_q}" + cmd += " 2>/dev/null | head -100" + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" + # take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation( + matches=matches, + files=sorted(files), + count=len(matches), + ) + ``` + + + ### Finally, define the tool + ```python icon="python" wrap + class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """Custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, + conv_state, + terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get + working directory from. + terminal_executor: Optional terminal executor to reuse. + If not provided, a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. 
+ """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + ``` + + + +## Good to know +### Tool Registration +Tools are registered using `register_tool()` and referenced by name: + +```python icon="python" wrap +# Register a simple tool class +register_tool("FileEditorTool", FileEditorTool) + +# Register a factory function that creates multiple tools +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +# Use registered tools by name +tools = [ + Tool(name="FileEditorTool"), + Tool(name="BashAndGrepToolSet"), +] +``` + +### Factory Functions +Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: + +```python icon="python" wrap +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create execute_bash and custom grep tools sharing one executor.""" + bash_executor = BashExecutor( + working_dir=conv_state.workspace.working_dir + ) + # Create and configure tools... 
+ return [bash_tool, grep_tool] +``` + +### Shared Executors +Multiple tools can share executors for efficiency and state consistency: + +```python icon="python" wrap +bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) +bash_tool = execute_bash_tool.set_executor(executor=bash_executor) + +grep_executor = GrepExecutor(bash_executor) +grep_tool = ToolDefinition( + name="grep", + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, +) +``` + +## When to Create Custom Tools + +Create custom tools when you need to: +- Combine multiple operations into a single, structured interface +- Add typed parameters with validation +- Format complex outputs for LLM consumption +- Integrate with external APIs or services + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) + + +```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py +"""Advanced example showing explicit executor usage and custom grep tool.""" + +import os +import shlex +from collections.abc import Sequence + +from pydantic import Field, SecretStr + +from openhands.sdk import ( + LLM, + Action, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Observation, + TextContent, + ToolDefinition, + get_logger, +) +from openhands.sdk.tool import ( + Tool, + ToolExecutor, + register_tool, +) +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import ( + TerminalAction, + TerminalExecutor, + TerminalTool, +) + + +logger = get_logger(__name__) + +# --- Action / Observation --- + + +class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + 
default=None, description="Optional glob to filter files (e.g. '*.py')" + ) + + +class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + + +# --- Executor --- + + +class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" + else: + cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" + + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" — take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) + + +# Tool description +_GREP_DESCRIPTION = """Fast content search tool. 
+* Searches file contents using regular expressions +* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) +* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") +* Returns matching file paths sorted by modification time. +* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. +* Use this tool when you need to find files containing specific patterns +* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead +""" # noqa: E501 + + +# --- Tool Definition --- + + +class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """A custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, conv_state, terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get working directory from. + terminal_executor: Optional terminal executor to reuse. If not provided, + a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. + """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - demonstrating both simplified and advanced patterns +cwd = os.getcwd() + + +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create terminal and custom grep tools sharing one executor.""" + + terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) + # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) + terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] + + # Use the GrepTool.create() method with shared terminal_executor + grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] + + return [terminal_tool, grep_tool] + + +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +tools = [ + Tool(name=FileEditorTool.name), + Tool(name="BashAndGrepToolSet"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Hello! Can you use the grep tool to find all files " + "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 + "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers +- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation + + +# Assign Reviews +Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews + +> The reference workflow is available [here](#reference-workflow)! + +Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. + +## How it works + +It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. + +**Core Components:** +- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts +- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent + +**Prompt Options:** +1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) +2. **`PROMPT_LOCATION`** - URL or file path for external prompts + +The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. + +## Assign Reviews Use Case + +This specific implementation uses the basic action template to handle three PR management scenarios: + +**1. 
Need Reviewer Action** +- Identifies PRs waiting for review +- Notifies reviewers to take action + +**2. Need Author Action** +- Finds stale PRs with no activity for 5+ days +- Prompts authors to update, request review, or close + +**3. Need Reviewers** +- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) +- Uses git blame analysis to identify relevant contributors +- Automatically assigns reviewers based on file ownership and contribution history +- Balances reviewer workload across team members + +## Quick Start + + + + ```bash icon="terminal" + cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml + ``` + + + Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". + + + The default is: Daily at 12 PM UTC. + + + +## Features + +- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership +- **Automated Notifications** - Sends contextual reminders to reviewers and authors +- **Workload Balancing** - Distributes review requests evenly across team members +- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) + + +```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml +--- +# To set this up: +# 1. Change the name below to something relevant to your task +# 2. Modify the "env" section below with your prompt +# 3. Add your LLM_API_KEY to the repository secrets +# 4. Commit this file to your repository +# 5. 
Trigger the workflow manually or set up a schedule +name: Assign Reviews + +on: + # Manual trigger + workflow_dispatch: + # Scheduled trigger (disabled by default, uncomment and customize as needed) + schedule: + # Run at 12 PM UTC every day + - cron: 0 12 * * * + +permissions: + contents: write + pull-requests: write + issues: write + +jobs: + run-task: + runs-on: ubuntu-24.04 + env: + # Configuration (modify these values as needed) + AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py + # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both + # Option 1: Use a URL or file path for the prompt + PROMPT_LOCATION: '' + # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + # Option 2: Use direct text for the prompt + PROMPT_STRING: > + Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. + Read the sections below in order, and perform each in order. Do NOT take action + on the same issue or PR twice. + + # Issues with needs-info - Check for OP Response + + Find all open issues that have the "needs-info" label. For each issue: + 1. Identify the original poster (issue author) + 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added + 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline + and look for "labeled" events with the label "needs-info" + 4. If the original poster has commented after the label was added: + - Remove the "needs-info" label + - Add the "needs-triage" label + - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." + + # Issues with needs-triage + + Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last + activity: + 1. 
First, check if the issue has already been triaged by verifying it does NOT have: + - The "enhancement" label + - Any "priority" label (priority:low, priority:medium, priority:high, etc.) + 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label + 3. For issues that have NOT been triaged yet: + - Read the issue description and comments + - Determine if it requires maintainer attention by checking: + * Is it a bug report, feature request, or question? + * Does it have enough information to be actionable? + * Has a maintainer already commented? + * Is the last comment older than 4 days? + - If it needs maintainer attention and no maintainer has commented: + * Find an appropriate maintainer based on the issue topic and recent activity + * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have + a chance?" + + # Need Reviewer Action + + Find all open PRs where: + 1. The PR is waiting for review (there are no open review comments or change requests) + 2. The PR is in a "clean" state (CI passing, no merge conflicts) + 3. The PR is not marked as draft (draft: false) + 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. + + In this case, send a message to the reviewers: + [Automatic Post]: This PR seems to be currently waiting for review. + {reviewer_names}, could you please take a look when you have a chance? + + # Need Author Action + + Find all open PRs where the most recent change or comment was made on the pull + request more than 5 days ago (use 14 days if the PR is marked as draft). + + And send a message to the author: + + [Automatic Post]: It has been a while since there was any activity on this PR. + {author}, are you still working on it? If so, please go ahead, if not then + please request review, close it, or request that someone else follow up. 
+ + # Need Reviewers + + Find all open pull requests that: + 1. Have no reviewers assigned to them. + 2. Are not marked as draft. + 3. Were created more than 1 day ago. + 4. CI is passing and there are no merge conflicts. + + For each of these pull requests, read the git blame information for the files, + and find the most recent and active contributors to the file/location of the changes. + Assign one of these people as a reviewer, but try not to assign too many reviews to + any single person. Add this message: + + [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. + Thanks in advance for the help! + + LLM_MODEL: + LLM_BASE_URL: + steps: + - name: Checkout repository + uses: actions/checkout@v5 + + - name: Set up Python + uses: actions/setup-python@v6 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v7 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Check required configuration + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + run: | + if [ -z "$LLM_API_KEY" ]; then + echo "Error: LLM_API_KEY secret is not set." + exit 1 + fi + + # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set + if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then + echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." + echo "Please provide only one in the env section of the workflow file." + exit 1 + fi + + if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then + echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." + echo "Please set one in the env section of the workflow file." 
+ exit 1 + fi + + if [ -n "$PROMPT_LOCATION" ]; then + echo "Prompt location: $PROMPT_LOCATION" + else + echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" + fi + echo "LLM model: $LLM_MODEL" + if [ -n "$LLM_BASE_URL" ]; then + echo "LLM base URL: $LLM_BASE_URL" + fi + + - name: Run task + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + PYTHONPATH: '' + run: | + echo "Running agent script: $AGENT_SCRIPT_URL" + + # Download script if it's a URL + if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then + echo "Downloading agent script from URL..." + curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py + AGENT_SCRIPT_PATH="/tmp/agent_script.py" + else + AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" + fi + + # Run with appropriate prompt argument + if [ -n "$PROMPT_LOCATION" ]; then + echo "Using prompt from: $PROMPT_LOCATION" + uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" + else + echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" + uv run python "$AGENT_SCRIPT_PATH" + fi + + - name: Upload logs as artifact + uses: actions/upload-artifact@v4 + if: always() + with: + name: openhands-task-logs + path: | + *.log + output/ + retention-days: 7 +``` + +## Related Files + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) + + +# PR Review +Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review + +> The reference workflow is available [here](#reference-workflow)! + +Automatically review pull requests, providing feedback on code quality, security, and best practices. 
Reviews can be triggered in two ways:
+- Requesting `openhands-agent` as a reviewer
+- Adding the `review-this` label to the PR
+
+
+The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo.
+
+
+## Quick Start
+
+```bash
+# 1. Copy workflow to your repository
+cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml
+
+# 2. Configure secrets in GitHub Settings → Secrets
+# Add: LLM_API_KEY
+
+# 3. (Optional) Create a "review-this" label in your repository
+# Go to Issues → Labels → New label
+# You can also trigger reviews by requesting "openhands-agent" as a reviewer
+```
+
+## Features
+
+- **Fast Reviews** - Results are posted on the PR within 2 to 3 minutes
+- **Comprehensive Analysis** - Analyzes the changes in the context of the repository, covering code quality, security, and best practices
+- **GitHub Integration** - Posts comments directly to the PR
+- **Customizable** - Add your own code review guidelines without forking
+
+## Security
+
+- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label.
+- Maintainers should read the PR before triggering a review to make sure it is safe to run.
+
+## Customizing the Code Review
+
+Instead of forking `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization.
+ +### How It Works + +The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. + + +**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. + + +### Example: Custom Code Review Skill + +Create `.agents/skills/custom-codereview-guide.md` in your repository: + +```markdown +--- +name: custom-codereview-guide +description: Project-specific review guidelines for MyProject +triggers: +- /codereview +--- + +# MyProject-Specific Review Guidelines + +In addition to general code review practices, check for: + +## Project Conventions + +- All API endpoints must have OpenAPI documentation +- Database migrations must be reversible +- Feature flags required for new features + +## Architecture Rules + +- No direct database access from controllers +- All external API calls must go through the gateway service + +## Communication Style + +- Be direct and constructive +- Use GitHub suggestion syntax for code fixes +``` + + +**Note**: These rules supplement the default `code-review` skill, not replace it. + + + +**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. + +If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. 
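
The concatenation and override behavior described above can be made concrete with a small illustrative sketch. This is **not** the SDK's actual skill loader; the `resolve_skills` helper and the dict-shaped skill records below are hypothetical and exist only to demonstrate the semantics:

```python
# Hypothetical sketch of the merge semantics (NOT the SDK's real loader).
# A skill is modeled as a dict with "name", "triggers", and "content" keys.


def resolve_skills(public_skills, custom_skills, trigger):
    """Return the concatenated context an agent would see for a trigger."""
    # A custom skill whose name matches a public skill replaces it entirely.
    merged = {s["name"]: s for s in public_skills}
    merged.update({s["name"]: s for s in custom_skills})
    # Every remaining skill matching the trigger is concatenated, with
    # public skills keeping their position first and new names coming after.
    matched = [s for s in merged.values() if trigger in s["triggers"]]
    return "\n\n".join(s["content"] for s in matched)


public = [
    {"name": "code-review", "triggers": ["/codereview"], "content": "General guidelines."},
]

# A unique name means BOTH skills contribute to the context.
supplement = [
    {"name": "custom-codereview-guide", "triggers": ["/codereview"], "content": "Project rules."},
]
print(resolve_skills(public, supplement, "/codereview"))

# A matching name means the custom skill fully replaces the public one.
override = [
    {"name": "code-review", "triggers": ["/codereview"], "content": "Replacement rules."},
]
print(resolve_skills(public, override, "/codereview"))
```

In the first call both skill bodies appear (public first); in the second, only the custom replacement remains, which is why renaming an overriding skill restores the default guidelines.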
+ + + +**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. + + +### Benefits of Custom Skills + +1. **No forking required**: Keep using the official SDK while customizing behavior +2. **Version controlled**: Your review guidelines live in your repository +3. **Easy updates**: SDK updates don't overwrite your customizations +4. **Team alignment**: Everyone uses the same review standards +5. **Composable**: Add project-specific rules alongside default guidelines + + +See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. + + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) + + +```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml +--- +# OpenHands PR Review Workflow +# +# To set this up: +# 1. Copy this file to .github/workflows/pr-review.yml in your repository +# 2. Add LLM_API_KEY to repository secrets +# 3. Customize the inputs below as needed +# 4. Commit this file to your repository +# 5. 
Trigger the review by either: +# - Adding the "review-this" label to any PR, OR +# - Requesting openhands-agent as a reviewer +# +# For more information, see: +# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review +name: PR Review by OpenHands + +on: + # Trigger when a label is added or a reviewer is requested + pull_request: + types: [labeled, review_requested] + +permissions: + contents: read + pull-requests: write + issues: write + +jobs: + pr-review: + # Run when review-this label is added OR openhands-agent is requested as reviewer + if: | + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Checkout for composite action + uses: actions/checkout@v4 + with: + repository: OpenHands/software-agent-sdk + # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') + ref: main + sparse-checkout: .github/actions/pr-review + + - name: Run PR Review + uses: ./.github/actions/pr-review + with: + # LLM configuration + llm-model: anthropic/claude-sonnet-4-5-20250929 + llm-base-url: '' + # Review style: roasted (other option: standard) + review-style: roasted + # SDK version to use (version tag or branch name) + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (optional) | No | `''` | +| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + +## Related Files + +- [Agent 
Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) + + +# TODO Management +Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management + +> The reference workflow is available [here](#reference-workflow)! + + +Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership + +## Quick Start + + + + ```bash icon="terminal" + cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml + ``` + + + Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `Settings → Actions → General → Workflow permissions` and enable: + - `Read and write permissions` + - `Allow GitHub Actions to create and approve pull requests` + + + Trigger the agent by adding TODO comments into your code. + + Example: `# TODO(openhands): Add input validation for user email` + + + The workflow is configurable and any identifier can be used in place of `TODO(openhands)` + + + + + +## Features + +- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. 
+- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it +- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers + +## Best Practices + +- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow +- **Clear Descriptions** - Write descriptive TODO comments +- **Review PRs** - Always review the generated PRs before merging + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) + + +```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml +--- +# Automated TODO Management Workflow +# Make sure to replace and with +# appropriate values for your LLM setup. +# +# This workflow automatically scans for TODO(openhands) comments and creates +# pull requests to implement them using the OpenHands agent. +# +# Setup: +# 1. Add LLM_API_KEY to repository secrets +# 2. Ensure GITHUB_TOKEN has appropriate permissions +# 3. Make sure Github Actions are allowed to create and review PRs +# 4. Commit this file to .github/workflows/ in your repository +# 5. 
Configure the schedule or trigger manually + +name: Automated TODO Management + +on: + # Manual trigger + workflow_dispatch: + inputs: + max_todos: + description: Maximum number of TODOs to process in this run + required: false + default: '3' + type: string + todo_identifier: + description: TODO identifier to search for (e.g., TODO(openhands)) + required: false + default: TODO(openhands) + type: string + + # Trigger when 'automatic-todo' label is added to a PR + pull_request: + types: [labeled] + + # Scheduled trigger (disabled by default, uncomment and customize as needed) + # schedule: + # # Run every Monday at 9 AM UTC + # - cron: "0 9 * * 1" + +permissions: + contents: write + pull-requests: write + issues: write + +jobs: + scan-todos: + runs-on: ubuntu-latest + # Only run if triggered manually or if 'automatic-todo' label was added + if: > + github.event_name == 'workflow_dispatch' || + (github.event_name == 'pull_request' && + github.event.label.name == 'automatic-todo') + outputs: + todos: ${{ steps.scan.outputs.todos }} + todo-count: ${{ steps.scan.outputs.todo-count }} + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 # Full history for better context + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + + - name: Copy TODO scanner + run: | + cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py + chmod +x /tmp/scanner.py + + - name: Scan for TODOs + id: scan + run: | + echo "Scanning for TODO comments..." + + # Run the scanner and capture output + TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" + python /tmp/scanner.py . 
--identifier "$TODO_IDENTIFIER" > todos.json + + # Count TODOs + TODO_COUNT=$(python -c \ + "import json; data=json.load(open('todos.json')); print(len(data))") + echo "Found $TODO_COUNT $TODO_IDENTIFIER items" + + # Limit the number of TODOs to process + MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" + if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then + echo "Limiting to first $MAX_TODOS TODOs" + python -c " + import json + data = json.load(open('todos.json')) + limited = data[:$MAX_TODOS] + json.dump(limited, open('todos.json', 'w'), indent=2) + " + TODO_COUNT=$MAX_TODOS + fi + + # Set outputs + echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT + echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT + + # Display found TODOs + echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY + if [ "$TODO_COUNT" -eq 0 ]; then + echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY + else + echo "Found $TODO_COUNT TODO(openhands) items:" \ + >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + python -c " + import json + data = json.load(open('todos.json')) + for i, todo in enumerate(data, 1): + print(f'{i}. 
**{todo[\"file\"]}:{todo[\"line\"]}** - ' + + f'{todo[\"description\"]}') + " >> $GITHUB_STEP_SUMMARY + fi + + process-todos: + needs: scan-todos + if: needs.scan-todos.outputs.todo-count > 0 + runs-on: ubuntu-latest + strategy: + matrix: + todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} + max-parallel: 1 # Process one TODO at a time to avoid conflicts + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + token: ${{ secrets.GITHUB_TOKEN }} + + - name: Switch to feature branch with TODO management files + run: | + git checkout openhands/todo-management-example + git pull origin openhands/todo-management-example + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Copy agent files + run: | + cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py + cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py + chmod +x agent.py + + - name: Configure Git + run: | + git config --global user.name "openhands-bot" + git config --global user.email \ + "openhands-bot@users.noreply.github.com" + + - name: Process TODO + env: + LLM_MODEL: + LLM_BASE_URL: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_REPOSITORY: ${{ github.repository }} + TODO_FILE: ${{ matrix.todo.file }} + TODO_LINE: ${{ matrix.todo.line }} + TODO_DESCRIPTION: ${{ matrix.todo.description }} + PYTHONPATH: '' + run: | + echo "Processing TODO: $TODO_DESCRIPTION" + echo "File: $TODO_FILE:$TODO_LINE" + + # 
Create a unique branch name for this TODO + BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ + sed 's/[^a-zA-Z0-9]/-/g' | \ + sed 's/--*/-/g' | \ + sed 's/^-\|-$//g' | \ + tr '[:upper:]' '[:lower:]' | \ + cut -c1-50)" + echo "Branch name: $BRANCH_NAME" + + # Create and switch to new branch (force create if exists) + git checkout -B "$BRANCH_NAME" + + # Run the agent to process the TODO + # Stay in repository directory for git operations + + # Create JSON payload for the agent + TODO_JSON=$(cat <&1 | tee agent_output.log + AGENT_EXIT_CODE=$? + set -e + + echo "Agent exit code: $AGENT_EXIT_CODE" + echo "Agent output log:" + cat agent_output.log + + # Show files in working directory + echo "Files in working directory:" + ls -la + + # If agent failed, show more details + if [ $AGENT_EXIT_CODE -ne 0 ]; then + echo "Agent failed with exit code $AGENT_EXIT_CODE" + echo "Last 50 lines of agent output:" + tail -50 agent_output.log + exit $AGENT_EXIT_CODE + fi + + # Check if any changes were made + cd "$GITHUB_WORKSPACE" + if git diff --quiet; then + echo "No changes made by agent, skipping PR creation" + exit 0 + fi + + # Commit changes + git add -A + git commit -m "Implement TODO: $TODO_DESCRIPTION + + Automatically implemented by OpenHands agent. + + Co-authored-by: openhands " + + # Push branch + git push origin "$BRANCH_NAME" + + # Create pull request + PR_TITLE="Implement TODO: $TODO_DESCRIPTION" + PR_BODY="## 🤖 Automated TODO Implementation + + This PR automatically implements the following TODO: + + **File:** \`$TODO_FILE:$TODO_LINE\` + **Description:** $TODO_DESCRIPTION + + ### Implementation + The OpenHands agent has analyzed the TODO and implemented the + requested functionality. 
+ + ### Review Notes + - Please review the implementation for correctness + - Test the changes in your development environment + - The original TODO comment will be updated with this PR URL + once merged + + --- + *This PR was created automatically by the TODO Management workflow.*" + + # Create PR using GitHub CLI or API + curl -X POST \ + -H "Authorization: token $GITHUB_TOKEN" \ + -H "Accept: application/vnd.github.v3+json" \ + "https://api.github.com/repos/${{ github.repository }}/pulls" \ + -d "{ + \"title\": \"$PR_TITLE\", + \"body\": \"$PR_BODY\", + \"head\": \"$BRANCH_NAME\", + \"base\": \"${{ github.ref_name }}\" + }" + + summary: + needs: [scan-todos, process-todos] + if: always() + runs-on: ubuntu-latest + steps: + - name: Generate Summary + run: | + echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + + TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" + echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY + + if [ "$TODO_COUNT" -gt 0 ]; then + echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "Check the pull requests created for each TODO" \ + "implementation." 
>> $GITHUB_STEP_SUMMARY + else + echo "**Status:** ℹ️ No TODOs found to process" \ + >> $GITHUB_STEP_SUMMARY + fi + + echo "" >> $GITHUB_STEP_SUMMARY + echo "---" >> $GITHUB_STEP_SUMMARY + echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY +``` + +## Related Documentation + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) + + +# Hello World +Source: https://docs.openhands.dev/sdk/guides/hello-world + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Your First Agent + +This is the most basic example showing how to set up and run an OpenHands agent. + + + + ### LLM Configuration + + Configure the language model that will power your agent: + ```python icon="python" + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, # Optional + service_id="agent" + ) + ``` + + + ### Select an Agent + Use the preset agent with common built-in tools: + ```python icon="python" + agent = get_default_agent(llm=llm, cli_mode=True) + ``` + The default agent includes `BashTool`, `FileEditorTool`, etc. + + For the complete list of available tools see the + [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). 
+ + + + + ### Start a Conversation + Start a conversation to manage the agent's lifecycle: + ```python icon="python" + conversation = Conversation(agent=agent, workspace=cwd) + conversation.send_message( + "Write 3 facts about the current project into FACTS.txt." + ) + conversation.run() + ``` + + + ### Expected Behavior + When you run this example: + 1. The agent analyzes the current directory + 2. Gathers information about the project + 3. Creates `FACTS.txt` with 3 relevant facts + 4. Completes and exits + + Example output file: + + ```text icon="text" wrap + FACTS.txt + --------- + 1. This is a Python project using the OpenHands Software Agent SDK. + 2. The project includes examples demonstrating various agent capabilities. + 3. The SDK provides tools for file manipulation, bash execution, and more. + ``` + + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) + + +```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) + +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for 
specialized needs +- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage + + +# Hooks +Source: https://docs.openhands.dev/sdk/guides/hooks + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. Typical uses include: +- Logging and analytics +- Emitting custom metrics +- Auditing or compliance +- Tracing and debugging + +## Hook Types + +| Hook | When it runs | Can block? | +|------|--------------|------------| +| PreToolUse | Before tool execution | Yes (exit 2) | +| PostToolUse | After tool execution | No | +| UserPromptSubmit | Before processing user message | Yes (exit 2) | +| Stop | When agent tries to finish | Yes (exit 2) | +| SessionStart | When conversation starts | No | +| SessionEnd | When conversation ends | No | + +## Key Concepts + +- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution +- Isolation: hooks run outside the agent loop logic, avoiding core modifications +- Composition: enable or disable hooks per environment (local vs. prod) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) + + +```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py +"""OpenHands Agent SDK — Hooks Example + +Demonstrates the OpenHands hooks system. 
+Hooks are shell scripts that run at key lifecycle events: + +- PreToolUse: Block dangerous commands before execution +- PostToolUse: Log tool usage after execution +- UserPromptSubmit: Inject context into user messages +- Stop: Enforce task completion criteria + +The hook scripts are in the scripts/ directory alongside this file. +""" + +import os +import signal +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher +from openhands.tools.preset.default import get_default_agent + + +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + +SCRIPT_DIR = Path(__file__).parent / "hook_scripts" + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create temporary workspace with git repo +with tempfile.TemporaryDirectory() as tmpdir: + workspace = Path(tmpdir) + os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") + + log_file = workspace / "tool_usage.log" + summary_file = workspace / "summary.txt" + + # Configure hooks using the typed approach (recommended) + # This provides better type safety and IDE support + hook_config = HookConfig( + pre_tool_use=[ + HookMatcher( + matcher="terminal", + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "block_dangerous.sh"), + timeout=10, + ) + ], + ) + ], + post_tool_use=[ + HookMatcher( + matcher="*", + hooks=[ + HookDefinition( + command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), + timeout=5, + ) + ], + ) + ], + user_prompt_submit=[ + HookMatcher( + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "inject_git_context.sh"), + ) + ], 
+ ) + ], + stop=[ + HookMatcher( + hooks=[ + HookDefinition( + command=( + f"SUMMARY_FILE={summary_file} " + f"{SCRIPT_DIR / 'require_summary.sh'}" + ), + ) + ], + ) + ], + ) + + # Alternative: You can also use .from_dict() for loading from JSON config files + # Example with a single hook matcher: + # hook_config = HookConfig.from_dict({ + # "hooks": { + # "PreToolUse": [{ + # "matcher": "terminal", + # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] + # }] + # } + # }) + + agent = get_default_agent(llm=llm) + conversation = Conversation( + agent=agent, + workspace=str(workspace), + hook_config=hook_config, + ) + + # Demo 1: Safe command (PostToolUse logs it) + print("=" * 60) + print("Demo 1: Safe command - logged by PostToolUse") + print("=" * 60) + conversation.send_message("Run: echo 'Hello from hooks!'") + conversation.run() + + if log_file.exists(): + print(f"\n[Log: {log_file.read_text().strip()}]") + + # Demo 2: Dangerous command (PreToolUse blocks it) + print("\n" + "=" * 60) + print("Demo 2: Dangerous command - blocked by PreToolUse") + print("=" * 60) + conversation.send_message("Run: rm -rf /tmp/test") + conversation.run() + + # Demo 3: Context injection + Stop hook enforcement + print("\n" + "=" * 60) + print("Demo 3: Context injection + Stop hook") + print("=" * 60) + print("UserPromptSubmit injects git status; Stop requires summary.txt\n") + conversation.send_message( + "Check what files have changes, then create summary.txt describing the repo." 
+ ) + conversation.run() + + if summary_file.exists(): + print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") + + print("\n" + "=" * 60) + print("Example Complete!") + print("=" * 60) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") +``` + + + +### Hook Scripts + +The example uses external hook scripts in the `hook_scripts/` directory: + + +```bash +#!/bin/bash +# PreToolUse hook: Block dangerous rm -rf commands +# Uses jq for JSON parsing (needed for nested fields like tool_input.command) + +input=$(cat) +command=$(echo "$input" | jq -r '.tool_input.command // ""') + +# Block rm -rf commands +if [[ "$command" =~ "rm -rf" ]]; then + echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' + exit 2 # Exit code 2 = block the operation +fi + +exit 0 # Exit code 0 = allow the operation +``` + + + +```bash +#!/bin/bash +# PostToolUse hook: Log all tool usage +# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) 
+ +# LOG_FILE should be set by the calling script +LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" + +echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" +exit 0 +``` + + + +```bash +#!/bin/bash +# UserPromptSubmit hook: Inject git status when user asks about code changes + +input=$(cat) + +# Check if user is asking about changes, diff, or git +if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then + # Get git status if in a git repo + if git rev-parse --git-dir > /dev/null 2>&1; then + status=$(git status --short 2>/dev/null | head -10) + if [ -n "$status" ]; then + # Escape for JSON + escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') + echo "{\"additionalContext\": \"Current git status: $escaped\"}" + fi + fi +fi +exit 0 +``` + + + +```bash +#!/bin/bash +# Stop hook: Require a summary.txt file before allowing agent to finish +# SUMMARY_FILE should be set by the calling script + +SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" + +if [ ! -f "$SUMMARY_FILE" ]; then + echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' + exit 2 +fi +exit 0 +``` + + + +## Next Steps + +- See also: [Metrics and Observability](/sdk/guides/metrics) +- Architecture: [Events](/sdk/arch/events) + + +# Iterative Refinement +Source: https://docs.openhands.dev/sdk/guides/iterative-refinement + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: +1. A **refactoring agent** performs the main task (e.g., code conversion) +2. A **critique agent** evaluates the quality and provides detailed feedback +3. 
If quality is below the threshold, the refactoring agent tries again with the feedback

This pattern is useful for:
- Code refactoring and modernization (e.g., COBOL to Java)
- Document translation and localization
- Content generation with quality requirements
- Any task requiring iterative improvement

## How It Works

### The Iteration Loop

The core workflow runs in a loop until the quality threshold is met:

```python icon="python" wrap
QUALITY_THRESHOLD = 90.0
MAX_ITERATIONS = 5

while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
    # Phase 1: Refactoring agent converts COBOL to Java
    refactoring_agent = get_default_agent(llm=llm, cli_mode=True)
    refactoring_conversation = Conversation(
        agent=refactoring_agent,
        workspace=str(workspace_dir)
    )
    refactoring_conversation.send_message(refactoring_prompt)
    refactoring_conversation.run()

    # Phase 2: Critique agent evaluates the conversion
    critique_agent = get_default_agent(llm=llm, cli_mode=True)
    critique_conversation = Conversation(
        agent=critique_agent,
        workspace=str(workspace_dir)
    )
    critique_conversation.send_message(critique_prompt)
    critique_conversation.run()

    # Parse score and decide whether to continue
    current_score = parse_critique_score(critique_file)

    iteration += 1
```

### Critique Scoring

The critique agent evaluates each file on four dimensions (0-25 pts each):
- **Correctness**: Does the Java code preserve the original business logic?
- **Code Quality**: Is the code clean and following Java conventions?
- **Completeness**: Are all COBOL features properly converted?
- **Best Practices**: Does it use proper OOP, error handling, and documentation?
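A file's total is simply the sum of its four dimension scores (up to 100), and the run-level score compared against `QUALITY_THRESHOLD` is the average across files. A minimal sketch of that arithmetic (these helper names are illustrative, not part of the SDK):

```python
# Illustrative scoring arithmetic for the critique report
# (hypothetical helpers, not SDK APIs).

def file_score(correctness: float, code_quality: float,
               completeness: float, best_practices: float) -> float:
    """Sum the four 0-25 dimension scores into a 0-100 file score."""
    return correctness + code_quality + completeness + best_practices


def average_score(file_scores: list[float]) -> float:
    """Average across files; this is the value compared to QUALITY_THRESHOLD."""
    return sum(file_scores) / len(file_scores) if file_scores else 0.0


scores = [file_score(22, 20, 24, 21), file_score(25, 23, 24, 22)]
print(average_score(scores))  # 90.5, which meets a 90.0 threshold
```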
+ +### Feedback Loop + +When the score is below threshold, the refactoring agent receives the critique file location: + +```python icon="python" wrap +if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" +``` + +## Customization + +### Adjusting Thresholds + +```python icon="python" wrap +QUALITY_THRESHOLD = 95.0 # Require higher quality +MAX_ITERATIONS = 10 # Allow more iterations +``` + +### Using Real COBOL Files + +The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) + + +```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py +#!/usr/bin/env python3 +""" +Iterative Refinement Example: COBOL to Java Refactoring + +This example demonstrates an iterative refinement workflow where: +1. A refactoring agent converts COBOL files to Java files +2. A critique agent evaluates the quality of each conversion and provides scores +3. If the average score is below 90%, the process repeats with feedback + +The workflow continues until the refactoring meets the quality threshold. 
+ +Source COBOL files can be obtained from: +https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl +""" + +import os +import re +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.tools.preset.default import get_default_agent + + +QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) + + +def setup_workspace() -> tuple[Path, Path, Path]: + """Create workspace directories for the refactoring workflow.""" + workspace_dir = Path(tempfile.mkdtemp()) + cobol_dir = workspace_dir / "cobol" + java_dir = workspace_dir / "java" + critique_dir = workspace_dir / "critiques" + + cobol_dir.mkdir(parents=True, exist_ok=True) + java_dir.mkdir(parents=True, exist_ok=True) + critique_dir.mkdir(parents=True, exist_ok=True) + + return workspace_dir, cobol_dir, java_dir + + +def create_sample_cobol_files(cobol_dir: Path) -> list[str]: + """Create sample COBOL files for demonstration. + + In a real scenario, you would clone files from: + https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl + """ + sample_files = { + "CBACT01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBACT01C. + ***************************************************************** + * Program: CBACT01C - Account Display Program + * Purpose: Display account information for a given account number + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-ACCOUNT-ID PIC 9(11). + 01 WS-ACCOUNT-STATUS PIC X(1). + 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. + 01 WS-CUSTOMER-NAME PIC X(50). + 01 WS-ERROR-MSG PIC X(80). + + PROCEDURE DIVISION. + PERFORM 1000-INIT. + PERFORM 2000-PROCESS. + PERFORM 3000-TERMINATE. + STOP RUN. + + 1000-INIT. 
+ INITIALIZE WS-ACCOUNT-ID + INITIALIZE WS-ACCOUNT-STATUS + INITIALIZE WS-ACCOUNT-BALANCE + INITIALIZE WS-CUSTOMER-NAME. + + 2000-PROCESS. + DISPLAY "ENTER ACCOUNT NUMBER: " + ACCEPT WS-ACCOUNT-ID + IF WS-ACCOUNT-ID = ZEROS + MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG + DISPLAY WS-ERROR-MSG + ELSE + DISPLAY "ACCOUNT: " WS-ACCOUNT-ID + DISPLAY "STATUS: " WS-ACCOUNT-STATUS + DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE + END-IF. + + 3000-TERMINATE. + DISPLAY "PROGRAM COMPLETE". +""", + "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBCUS01C. + ***************************************************************** + * Program: CBCUS01C - Customer Information Program + * Purpose: Manage customer data operations + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-CUSTOMER-ID PIC 9(9). + 01 WS-FIRST-NAME PIC X(25). + 01 WS-LAST-NAME PIC X(25). + 01 WS-ADDRESS PIC X(100). + 01 WS-PHONE PIC X(15). + 01 WS-EMAIL PIC X(50). + 01 WS-OPERATION PIC X(1). + 88 OP-ADD VALUE 'A'. + 88 OP-UPDATE VALUE 'U'. + 88 OP-DELETE VALUE 'D'. + 88 OP-DISPLAY VALUE 'V'. + + PROCEDURE DIVISION. + PERFORM 1000-MAIN-PROCESS. + STOP RUN. + + 1000-MAIN-PROCESS. + DISPLAY "CUSTOMER MANAGEMENT SYSTEM" + DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" + ACCEPT WS-OPERATION + EVALUATE TRUE + WHEN OP-ADD + PERFORM 2000-ADD-CUSTOMER + WHEN OP-UPDATE + PERFORM 3000-UPDATE-CUSTOMER + WHEN OP-DELETE + PERFORM 4000-DELETE-CUSTOMER + WHEN OP-DISPLAY + PERFORM 5000-DISPLAY-CUSTOMER + WHEN OTHER + DISPLAY "INVALID OPERATION" + END-EVALUATE. + + 2000-ADD-CUSTOMER. + DISPLAY "ADDING NEW CUSTOMER" + ACCEPT WS-CUSTOMER-ID + ACCEPT WS-FIRST-NAME + ACCEPT WS-LAST-NAME + DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. + + 3000-UPDATE-CUSTOMER. + DISPLAY "UPDATING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. + + 4000-DELETE-CUSTOMER. 
+ DISPLAY "DELETING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. + + 5000-DISPLAY-CUSTOMER. + DISPLAY "DISPLAYING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "ID: " WS-CUSTOMER-ID + DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. +""", + "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBTRN01C. + ***************************************************************** + * Program: CBTRN01C - Transaction Processing Program + * Purpose: Process financial transactions + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-TRANS-ID PIC 9(16). + 01 WS-TRANS-TYPE PIC X(2). + 88 TRANS-CREDIT VALUE 'CR'. + 88 TRANS-DEBIT VALUE 'DB'. + 88 TRANS-TRANSFER VALUE 'TR'. + 01 WS-TRANS-AMOUNT PIC S9(13)V99. + 01 WS-FROM-ACCOUNT PIC 9(11). + 01 WS-TO-ACCOUNT PIC 9(11). + 01 WS-TRANS-DATE PIC 9(8). + 01 WS-TRANS-STATUS PIC X(10). + + PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE. + PERFORM 2000-PROCESS-TRANSACTION. + PERFORM 3000-FINALIZE. + STOP RUN. + + 1000-INITIALIZE. + MOVE ZEROS TO WS-TRANS-ID + MOVE SPACES TO WS-TRANS-TYPE + MOVE ZEROS TO WS-TRANS-AMOUNT + MOVE "PENDING" TO WS-TRANS-STATUS. + + 2000-PROCESS-TRANSACTION. + DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " + ACCEPT WS-TRANS-TYPE + DISPLAY "ENTER AMOUNT: " + ACCEPT WS-TRANS-AMOUNT + EVALUATE TRUE + WHEN TRANS-CREDIT + PERFORM 2100-PROCESS-CREDIT + WHEN TRANS-DEBIT + PERFORM 2200-PROCESS-DEBIT + WHEN TRANS-TRANSFER + PERFORM 2300-PROCESS-TRANSFER + WHEN OTHER + MOVE "INVALID" TO WS-TRANS-STATUS + END-EVALUATE. + + 2100-PROCESS-CREDIT. + DISPLAY "PROCESSING CREDIT" + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. + + 2200-PROCESS-DEBIT. + DISPLAY "PROCESSING DEBIT" + ACCEPT WS-FROM-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. + + 2300-PROCESS-TRANSFER. 
+ DISPLAY "PROCESSING TRANSFER" + ACCEPT WS-FROM-ACCOUNT + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. + + 3000-FINALIZE. + DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. +""", + } + + created_files = [] + for filename, content in sample_files.items(): + file_path = cobol_dir / filename + file_path.write_text(content) + created_files.append(filename) + + return created_files + + +def get_refactoring_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], + critique_file: Path | None = None, +) -> str: + """Generate the prompt for the refactoring agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + base_prompt = f"""Convert the following COBOL files to Java: + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Files to convert: +{files_list} + +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices + +Read each COBOL file and create the corresponding Java file in the target directory. +""" + + if critique_file and critique_file.exists(): + base_prompt += f""" + +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. 
+""" + + return base_prompt + + +def get_critique_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], +) -> str: + """Generate the prompt for the critique agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + return f"""Evaluate the quality of COBOL to Java refactoring. + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Original COBOL files: +{files_list} + +Please evaluate each converted Java file against its original COBOL source. + +For each file, assess: +1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) +2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) +3. Completeness: Are all COBOL features properly converted? (0-25 pts) +4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) + +Create a critique report in the following EXACT format: + +# COBOL to Java Refactoring Critique Report + +## Summary +[Brief overall assessment] + +## File Evaluations + +### [Original COBOL filename] +- **Java File**: [corresponding Java filename or "NOT FOUND"] +- **Correctness**: [score]/25 - [brief explanation] +- **Code Quality**: [score]/25 - [brief explanation] +- **Completeness**: [score]/25 - [brief explanation] +- **Best Practices**: [score]/25 - [brief explanation] +- **File Score**: [total]/100 +- **Issues to Address**: + - [specific issue 1] + - [specific issue 2] + ... + +[Repeat for each file] + +## Overall Score +- **Average Score**: [calculated average of all file scores] +- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] + +## Priority Improvements +1. [Most critical improvement needed] +2. [Second priority] +3. 
[Third priority] + +Save this report to: {java_dir.parent}/critiques/critique_report.md +""" + + +def parse_critique_score(critique_file: Path) -> float: + """Parse the average score from the critique report.""" + if not critique_file.exists(): + return 0.0 + + content = critique_file.read_text() + + # Look for "Average Score: X" pattern + patterns = [ + r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)", + r"Average Score:\s*(\d+(?:\.\d+)?)", + r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)", + ] + + for pattern in patterns: + match = re.search(pattern, content, re.IGNORECASE) + if match: + return float(match.group(1)) + + return 0.0 + + +def run_iterative_refinement() -> None: + """Run the iterative refinement workflow.""" + # Setup + api_key = os.getenv("LLM_API_KEY") + assert api_key is not None, "LLM_API_KEY environment variable is not set." + model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") + + llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="iterative_refinement", + ) + + workspace_dir, cobol_dir, java_dir = setup_workspace() + critique_dir = workspace_dir / "critiques" + + print(f"Workspace: {workspace_dir}") + print(f"COBOL Directory: {cobol_dir}") + print(f"Java Directory: {java_dir}") + print(f"Critique Directory: {critique_dir}") + print() + + # Create sample COBOL files + cobol_files = create_sample_cobol_files(cobol_dir) + print(f"Created {len(cobol_files)} sample COBOL files:") + for f in cobol_files: + print(f" - {f}") + print() + + critique_file = critique_dir / "critique_report.md" + current_score = 0.0 + iteration = 0 + + while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + iteration += 1 + print("=" * 80) + print(f"ITERATION {iteration}") + print("=" * 80) + + # Phase 1: Refactoring + print("\n--- Phase 1: Refactoring Agent ---") + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + 
agent=refactoring_agent, + workspace=str(workspace_dir), + ) + + previous_critique = critique_file if iteration > 1 else None + refactoring_prompt = get_refactoring_prompt( + cobol_dir, java_dir, cobol_files, previous_critique + ) + + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + print("Refactoring phase complete.") + + # Phase 2: Critique + print("\n--- Phase 2: Critique Agent ---") + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir), + ) + + critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + print("Critique phase complete.") + + # Parse the score + current_score = parse_critique_score(critique_file) + print(f"\nCurrent Score: {current_score:.1f}%") + + if current_score >= QUALITY_THRESHOLD: + print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") + else: + print( + f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " + "Continuing refinement..." 
+ ) + + # Final summary + print("\n" + "=" * 80) + print("ITERATIVE REFINEMENT COMPLETE") + print("=" * 80) + print(f"Total iterations: {iteration}") + print(f"Final score: {current_score:.1f}%") + print(f"Workspace: {workspace_dir}") + + # List created Java files + print("\nCreated Java files:") + for java_file in java_dir.glob("*.java"): + print(f" - {java_file.name}") + + # Show critique file location + if critique_file.exists(): + print(f"\nFinal critique report: {critique_file}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + run_iterative_refinement() +``` + + + +## Next Steps + +- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents +- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow + + +# Exception Handling +Source: https://docs.openhands.dev/sdk/guides/llm-error-handling + +The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. + +This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. + +## Why typed exceptions? + +LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. Typical benefits: + +- One code path to handle auth, rate limits, timeouts, service issues, and bad requests +- Clear behavior when conversation history exceeds the context window +- Backward compatibility when you switch providers or SDK versions + +## Quick start: Using agents and conversations + +Agent-driven conversations are the common entry point. 
Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured.

```python icon="python" wrap
from pydantic import SecretStr
from openhands.sdk import Agent, Conversation, LLM
from openhands.sdk.llm.exceptions import (
    LLMError,
    LLMAuthenticationError,
    LLMRateLimitError,
    LLMTimeoutError,
    LLMServiceUnavailableError,
    LLMBadRequestError,
    LLMContextWindowExceedError,
)

llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key"))
agent = Agent(llm=llm, tools=[])
conversation = Conversation(
    agent=agent,
    persistence_dir="./.conversations",
    workspace=".",
)

try:
    conversation.send_message(
        "Continue the long analysis we started earlier…"
    )
    conversation.run()

except LLMContextWindowExceedError:
    # Conversation is longer than the model’s context window
    # Options:
    # 1) Enable a condenser (recommended for long sessions)
    # 2) Shorten inputs or reset conversation
    print("Hit the context limit. Consider enabling a condenser.")

except LLMAuthenticationError:
    print(
        "Invalid or missing API credentials. "
        "Check your API key or auth setup."
    )

except LLMRateLimitError:
    print("Rate limit exceeded. Back off and retry later.")

except LLMTimeoutError:
    print("Request timed out. Consider increasing timeout or retrying.")

except LLMServiceUnavailableError:
    print("Service unavailable or connectivity issue. Retry with backoff.")

except LLMBadRequestError:
    print("Bad request to provider. Validate inputs and arguments.")

except LLMError as e:
    # Fallback for other SDK LLM errors (parsing/validation, etc.)
    print(f"Unhandled LLM error: {e}")
```



### Avoiding context‑window errors with a condenser

If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue.
+ +```python icon="python" focus={5-6, 9-14} wrap +from openhands.sdk.context.condenser import LLMSummarizingCondenser + +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), + max_size=10, + keep_first=2, +) + +agent = Agent(llm=llm, tools=[], condenser=condenser) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) +``` + + + See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). + + +## Handling errors with direct LLM calls + +The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. + +### Example: Using `.completion()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + response = llm.completion([ + Message.user([TextContent(text="Summarize our design doc")]) + ]) + print(response.message) + +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMAuthenticationError: + print("Invalid or missing API credentials.") +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. 
Validate inputs and arguments.") +except LLMError as e: + print(f"Unhandled LLM error: {e}") +``` + +### Example: Using `.responses()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + resp = llm.responses([ + Message.user( + [TextContent(text="Write a one-line haiku about code.")] + ) + ]) + print(resp.message) +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMError as e: + print(f"LLM error: {e}") +``` + +## Exception reference + +All exceptions live under `openhands.sdk.llm.exceptions` unless noted. + +| Category | Error | Description | +|--------|------|-------------| +| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | +| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | +| | `LLMRateLimitError` | Provider rate limit exceeded. | +| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | +| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | +| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | +| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | +| | `LLMNoActionError` | Model did not return an action when one was expected. | +| | `LLMResponseError` | Could not extract an action from the response. | +| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | +| | `FunctionCallValidationError` | Tool/function call arguments failed validation. 
| +| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | +| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | +| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | +| | `OperationCancelled` | A running operation was cancelled programmatically. | + + + All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all + for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. + + + +# LLM Fallback Strategy +Source: https://docs.openhands.dev/sdk/guides/llm-fallback + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model. + +## Basic Usage + +Attach a `FallbackStrategy` to your primary `LLM`. 
The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store):

```python icon="python" wrap focus={16, 17, 21, 22, 23}
from pydantic import SecretStr
from openhands.sdk import LLM, LLMProfileStore
from openhands.sdk.llm import FallbackStrategy

# Manage persisted LLM profiles
# default store directory: .openhands/profiles
store = LLMProfileStore()

fallback_llm = LLM(
    usage_id="fallback-1",
    model="openai/gpt-4o",
    api_key=SecretStr("your-openai-key"),
)
store.save("fallback-1", fallback_llm, include_secrets=True)

# Configure an LLM with a fallback strategy
primary_llm = LLM(
    usage_id="agent-primary",
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr("your-api-key"),
    fallback_strategy=FallbackStrategy(
        fallback_llms=["fallback-1"],
    ),
)
```

## How It Works

1. The primary LLM handles the request as normal
2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order
3. The first successful fallback response is returned to the caller
4. If all fallbacks fail, the original primary error is raised
5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model


Only transient errors trigger fallback.
Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks.
For a complete list of supported transient errors, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29)


## Multiple Fallback Levels

Chain as many fallback LLMs as you need.
They are tried in list order: + +```python icon="python" wrap focus={5-7} +llm = LLM( + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + ), +) +``` + +If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. + +## Custom Profile Store Directory + +By default, fallback profiles are loaded from `.openhands/profiles`. You can point to a different directory: + +```python icon="python" wrap focus={3} +FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir="/path/to/my/profiles", +) +``` + +## Metrics + +Fallback costs are automatically merged into the primary LLM's metrics. After a conversation, you can inspect exactly which models were used: + +```python icon="python" wrap +# After running a conversation +metrics = llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") + +for usage in metrics.token_usages: + print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +``` + +Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. + +## Use Cases + +- **Rate limit handling** — When one provider throttles you, seamlessly switch to another +- **High availability** — Keep your agent running during provider outages +- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure +- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
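The try-the-next-in-order contract behind all of these use cases is easy to picture in plain Python. The sketch below is illustrative only, not the SDK's implementation, and it skips the transient-vs-non-transient error distinction:

```python
# Illustrative sketch of per-call fallback semantics (not the SDK internals).

def call_with_fallbacks(primary, fallbacks):
    """Try the primary first; on failure, try each fallback in order.

    If every attempt fails, re-raise the primary's original error,
    matching the documented fallback contract.
    """
    try:
        return primary()
    except Exception as primary_error:
        for candidate in fallbacks:
            try:
                return candidate()
            except Exception:
                continue  # this fallback also failed; try the next one
        raise primary_error


def flaky():
    raise TimeoutError("simulated transient failure")


print(call_with_fallbacks(flaky, [flaky, lambda: "ok from fallback-2"]))
# prints: ok from fallback-2
```

Because the real strategy is per-call, every new request starts again at the primary model rather than "sticking" to a fallback.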
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) + + +```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py +"""Example: Using FallbackStrategy for LLM resilience. + +When the primary LLM fails with a transient error (rate limit, timeout, etc.), +FallbackStrategy automatically tries alternate LLMs in order. Fallback is +per-call: each new request starts with the primary model. Token usage and +cost from fallback calls are merged into the primary LLM's metrics. + +This example: + 1. Saves two fallback LLM profiles to a temporary store. + 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. + 3. Runs a conversation — if the primary model is unavailable, the agent + transparently falls back to the next available model. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool +from openhands.sdk.llm import FallbackStrategy +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Read configuration from environment +api_key = os.getenv("LLM_API_KEY", None) +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). 
+profile_store_dir = tempfile.mkdtemp() +store = LLMProfileStore(base_dir=profile_store_dir) + +fallback_1 = LLM( + usage_id="fallback-1", + model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), +) +store.save("fallback-1", fallback_1, include_secrets=True) + +fallback_2 = LLM( + usage_id="fallback-2", + model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), +) +store.save("fallback-2", fallback_2, include_secrets=True) + +print(f"Saved fallback profiles: {store.list()}") + + +# Configure the primary LLM with a FallbackStrategy +primary_llm = LLM( + usage_id="agent-primary", + model=primary_model, + api_key=SecretStr(api_key), + base_url=base_url, + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir=profile_store_dir, + ), +) + + +# Run a conversation +agent = Agent( + llm=primary_llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +conversation = Conversation(agent=agent, workspace=os.getcwd()) +conversation.send_message("Write a haiku about resilience into HAIKU.txt.") +conversation.run() + + +# Inspect metrics (includes any fallback usage) +metrics = primary_llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +print(f"Token usage records: {len(metrics.token_usages)}") +for usage in metrics.token_usages: + print( + f" model={usage.model}" + f" prompt={usage.prompt_tokens}" + f" completion={usage.completion_tokens}" + ) + +print(f"EXAMPLE_COST: {metrics.accumulated_cost}") +``` + + + +## Next Steps + +- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles +- **[Model Routing](/sdk/guides/llm-routing)** — Route requests 
based on content (e.g., multimodal vs text-only)
+- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application
+- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models
+
+
+# Image Input
+Source: https://docs.openhands.dev/sdk/guides/llm-image-input
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+
+### Sending Images
+
+The LLM you use must support image inputs (`llm.vision_is_active()` needs to be `True`).
+
+Pass images along with text in the message content:
+
+```python focus={14} icon="python" wrap
+from openhands.sdk import ImageContent, Message, TextContent
+
+IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png"
+conversation.send_message(
+    Message(
+        role="user",
+        content=[
+            TextContent(
+                text=(
+                    "Study this image and describe the key elements you see. "
+                    "Summarize them in a short paragraph and suggest a catchy caption."
+                )
+            ),
+            ImageContent(image_urls=[IMAGE_URL]),
+        ],
+    )
+)
+```
+
+This works with multimodal LLMs such as `GPT-4 Vision` and `Claude` models with vision capabilities.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py)
+
+
+You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA:
+
+```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py
+"""OpenHands Agent SDK — Image Input Example.
+
+This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds
+vision support by sending an image to the agent alongside text instructions.
+""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." + +cwd = os.getcwd() + +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +llm_messages = [] # collect raw LLM messages for inspection + + +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" + +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() + +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently + + +# LLM Profile Store +Source: https://docs.openhands.dev/sdk/guides/llm-profile-store + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. +Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. + +## Benefits +- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. +- **Reusability:** Import a defined profile into any script or session with a single identifier. +- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. + +## How It Works + + + + ### Create a Store + + The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, + but you can point it anywhere. + + ```python icon="python" focus={3, 4, 6, 7} + from openhands.sdk import LLMProfileStore + + # Default location: ~/.openhands/profiles + store = LLMProfileStore() + + # Or bring your own directory + store = LLMProfileStore(base_dir="./my-profiles") + ``` + + + ### Save a Profile + + Got an LLM configured just right? Save it for later. 
+ + ```python icon="python" focus={11, 12} + from pydantic import SecretStr + from openhands.sdk import LLM, LLMProfileStore + + fast_llm = LLM( + usage_id="fast", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("sk-..."), + temperature=0.0, + ) + + store = LLMProfileStore() + store.save("fast", fast_llm) + ``` + + + API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to + persist them; otherwise, they will be read from the environment at load time. + + + + ### Load a Profile + + Next time you need that LLM, just load it: + + ```python icon="python" + # Same model, ready to go. + llm = store.load("fast") + ``` + + + ### List and Clean Up + + See what you've got, delete what you don't need: + + ```python icon="python" focus={1, 3, 4} + print(store.list()) # ['fast.json', 'creative.json'] + + store.delete("creative") + print(store.list()) # ['fast.json'] + ``` + + + +## Good to Know + +Profile names must be simple filenames (no slashes, no dots at the start). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) + + +```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py +"""Example: Using LLMProfileStore to save and reuse LLM configurations. + +LLMProfileStore persists LLM configurations as JSON files, so you can define +a profile once and reload it across sessions without repeating setup code. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, LLMProfileStore + + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +store = LLMProfileStore(base_dir=tempfile.mkdtemp()) + + +# 1. 
Create two LLM profiles with different usage + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +fast_llm = LLM( + usage_id="fast", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.0, +) + +creative_llm = LLM( + usage_id="creative", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.9, +) + +# 2. Save profiles + +# Note that secrets are excluded by default for safety. +store.save("fast", fast_llm) +store.save("creative", creative_llm) + +# To persist the API key as well, pass `include_secrets=True`: +# store.save("fast", fast_llm, include_secrets=True) + +# 3. List available persisted profiles + +print(f"Stored profiles: {store.list()}") + +# 4. Load a profile + +loaded = store.load("fast") +assert isinstance(loaded, LLM) +print( + "Loaded profile. " + f"usage:{loaded.usage_id}, " + f"model: {loaded.model}, " + f"temperature: {loaded.temperature}." +) + +# 5. Delete a profile + +store.delete("creative") +print(f"After deletion: {store.list()}") + +print("EXAMPLE_COST: 0") +``` + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully + + +# Reasoning +Source: https://docs.openhands.dev/sdk/guides/llm-reasoning + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. + +This guide demonstrates two provider-specific approaches: +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. 
**OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter
+
+## Anthropic Extended Thinking
+
+> A ready-to-run example is available [here](#ready-to-run-example-anthropic)!
+
+Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process
+through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step.
+
+### How It Works
+
+The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages:
+
+```python focus={6-11} icon="python" wrap
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for block in message.thinking_blocks:
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"Redacted: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"Thinking: {block.thinking}")
+
+conversation = Conversation(agent=agent, callbacks=[show_thinking])
+```
+
+### Understanding Thinking Blocks
+
+Claude uses thinking blocks to reason through complex problems step-by-step. There are two types:
+
+- **`ThinkingBlock`** ([related Anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process
+- **`RedactedThinkingBlock`** ([related Anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data
+
+By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time,
+giving you insight into how Claude is approaching the problem.
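Beyond printing blocks as they arrive, you can collect the thinking text for post-run analysis. This sketch duck-types the message objects (it relies only on the `thinking_blocks`, `thinking`, and `data` attributes shown above), and the `SimpleNamespace` stubs stand in for real messages returned by `event.to_llm_message()`:

```python
from types import SimpleNamespace


def collect_thinking(messages) -> list[str]:
    """Gather reasoning text from any message that carries thinking blocks."""
    transcript = []
    for message in messages:
        for block in getattr(message, "thinking_blocks", None) or []:
            # ThinkingBlock exposes .thinking; RedactedThinkingBlock exposes .data
            text = getattr(block, "thinking", None) or getattr(block, "data", None)
            if text:
                transcript.append(text)
    return transcript


# Stub messages standing in for event.to_llm_message() results.
messages = [
    SimpleNamespace(thinking_blocks=[SimpleNamespace(thinking="Break the task into steps.")]),
    SimpleNamespace(thinking_blocks=None),
]
print(collect_thinking(messages))  # ['Break the task into steps.']
```

In a real run you would feed this the messages collected by your conversation callback instead of stubs.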
+
+### Ready-to-run Example Anthropic
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py
+"""Example demonstrating Anthropic's extended thinking feature with thinking blocks."""
+
+import os
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    RedactedThinkingBlock,
+    ThinkingBlock,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.terminal import TerminalTool
+
+
+# Configure LLM for Anthropic Claude with extended thinking
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Setup agent with bash tool
+agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
+
+
+# Callback to display thinking blocks
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for i, block in enumerate(message.thinking_blocks):
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"  Block {i + 1}: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"  Block {i + 1}: {block.thinking}")
+
+
+conversation = Conversation(
+    agent=agent, callbacks=[show_thinking], workspace=os.getcwd()
+)
+
+conversation.send_message(
+    "Calculate compound interest for $10,000 at 5% annually, "
+    "compounded quarterly for 3 years. " "
Show your work.", +) +conversation.run() + +conversation.send_message( + "Now, write that number to RESULTs.txt.", +) +conversation.run() +print("✅ Done!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## OpenAI Reasoning via Responses API + +> A ready-to-run example is available [here](#ready-to-run-example-openai)! + +OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) +that provides access to the model's reasoning process. +By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. + +### How It Works + +Configure the LLM with the `reasoning_effort` parameter to enable reasoning: + +```python focus={5} icon="python" wrap +llm = LLM( + model="openhands/gpt-5-codex", + api_key=SecretStr(api_key), + base_url=base_url, + # Enable reasoning with effort level + reasoning_effort="high", +) +``` + +The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of +reasoning performed by the model. + +Then capture reasoning traces in your callback: + +```python focus={3-4} icon="python" wrap +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + llm_messages.append(msg) +``` + +### Understanding Reasoning Traces + +The OpenAI Responses API provides reasoning traces that show how the model approached the problem. +These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. +Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. 
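Once `llm_messages` has been collected by the callback above, you can scan it for reasoning traces. The field name used below (`reasoning_content`) follows the delta attribute shown in this documentation's streaming guide and is an assumption here; adjust it to whatever field you observe on your collected messages. The stub objects are illustrative only:

```python
from types import SimpleNamespace


def extract_reasoning(llm_messages) -> list[str]:
    """Collect non-empty reasoning traces from captured LLM messages."""
    traces = []
    for message in llm_messages:
        # Assumed field name; real messages may expose reasoning differently.
        reasoning = getattr(message, "reasoning_content", None)
        if isinstance(reasoning, str) and reasoning:
            traces.append(reasoning)
    return traces


# Stubs standing in for messages captured by the conversation callback.
llm_messages = [
    SimpleNamespace(reasoning_content="Weighed two repo facts before writing."),
    SimpleNamespace(reasoning_content=None),
]
print(extract_reasoning(llm_messages))  # ['Weighed two repo facts before writing.']
```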
+ +### Ready-to-run Example OpenAI + + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + + +```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation + +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" + +from __future__ import annotations + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." 
+ +model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", +) + +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) + +llm_messages = [] # collect raw LLM-convertible messages for inspection + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) + +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() + +conversation.send_message("Now delete FACTS.txt.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Use Cases + +**Debugging**: Understand why the agent made specific decisions or took certain actions. + +**Transparency**: Show users how the AI arrived at its conclusions. + +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. + +**Learning**: Study how models approach complex problems. 
+ +## Next Steps + +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities + + +# LLM Registry +Source: https://docs.openhands.dev/sdk/guides/llm-registry + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use the LLM registry to manage multiple LLM providers and dynamically switch between models. + +## Using the Registry + +You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. + +```python icon="python" focus={9,10,13} +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# define the registry and add an LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +... +# retrieve the LLM by its usage ID +llm = llm_registry.get("agent") +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + + +# Model Routing +Source: 
https://docs.openhands.dev/sdk/guides/llm-routing + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This feature is under active development and more default routers will be available in future releases. + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Using the built-in MultimodalRouter + +Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: + +```python icon="python" wrap focus={13-16} +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) +``` + +You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. 
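The routing decision itself can be sketched independently of the SDK. The predicate below mirrors what the `MultimodalRouter` must decide: send a request to the vision-capable primary model only when it carries images. Wiring this into a real `Router` subclass means implementing the abstract interface in the base class linked above; the function and stub classes here are illustrative only:

```python
def pick_llm_key(content_items) -> str:
    """Choose which configured LLM should serve a request.

    Any content item carrying image URLs forces the multimodal-capable
    "primary" model; pure text goes to the cheaper "secondary" model.
    """
    for item in content_items:
        if getattr(item, "image_urls", None):
            return "primary"
    return "secondary"


# Hypothetical stand-ins for TextContent / ImageContent message parts.
class FakeText:
    image_urls = None


class FakeImage:
    image_urls = ["http://example.com/cat.jpg"]


print(pick_llm_key([FakeText()]))               # secondary
print(pick_llm_key([FakeText(), FakeImage()]))  # primary
```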
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) + + +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: + +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="openhands/devstral-small-2507", + base_url=base_url, + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) + +# Tools +tools = get_default_tools() # Use our default openhands experience + +# Agent +agent = Agent(llm=multimodal_router, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() +) + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) 
+conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], + ) +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + + +# LLM Streaming +Source: https://docs.openhands.dev/sdk/guides/llm-streaming + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +This is currently only supported for the chat completion endpoint. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use +streaming callbacks to process and display tokens as they arrive from the language model. + + +## How It Works + +Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the +complete response. This creates a more responsive user experience, especially for long-form content generation. 
+ + + + ### Enable Streaming on LLM + Configure the LLM with streaming enabled: + + ```python focus={6} icon="python" wrap + llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, # Enable streaming + ) + ``` + + + ### Define Token Callback + Create a callback function that processes streaming chunks as they arrive: + + ```python icon="python" wrap + def on_token(chunk: ModelResponseStream) -> None: + """Process each streaming chunk as it arrives.""" + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + content = getattr(delta, "content", None) + if isinstance(content, str): + sys.stdout.write(content) + sys.stdout.flush() + ``` + + The callback receives a `ModelResponseStream` object containing: + - **`choices`**: List of response choices from the model + - **`delta`**: Incremental content changes for each choice + - **`content`**: The actual text tokens being streamed + + + ### Register Callback with Conversation + + Pass your token callback to the conversation: + + ```python focus={3} icon="python" wrap + conversation = Conversation( + agent=agent, + token_callbacks=[on_token], # Register streaming callback + workspace=os.getcwd(), + ) + ``` + + The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers + if needed (e.g., one for display, another for logging). 
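As a concrete instance of registering multiple handlers, the sketch below adds a second callback that accumulates streamed text instead of printing it. It relies only on the `choices[*].delta.content` shape described above; the `SimpleNamespace` stubs at the end simulate incoming chunks and are not SDK objects:

```python
from types import SimpleNamespace

collected: list[str] = []


def log_token(chunk) -> None:
    """Accumulate streamed content tokens for later logging."""
    for choice in chunk.choices:
        delta = choice.delta
        content = getattr(delta, "content", None) if delta is not None else None
        if isinstance(content, str):
            collected.append(content)


# Register alongside the display callback:
# Conversation(agent=agent, token_callbacks=[on_token, log_token], ...)

# Simulate two streamed chunks with the shape the callback expects.
for text in ("Hel", "lo"):
    log_token(SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]))

print("".join(collected))  # Hello
```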
+ + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) + + +```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py +import os +import sys +from typing import Literal + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.sdk.llm.streaming import ModelResponseStream +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +if not api_key: + raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") + +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, +) + +agent = get_default_agent(llm=llm, cli_mode=True) + + +# Define streaming states +StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] +# Track state across on_token calls for boundary detection +_current_state: StreamingState | None = None + + +def on_token(chunk: ModelResponseStream) -> None: + """ + Handle all types of streaming tokens including content, + tool calls, and thinking blocks with dynamic boundary detection. 
+ """ + global _current_state + + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + # Handle thinking blocks (reasoning content) + reasoning_content = getattr(delta, "reasoning_content", None) + if isinstance(reasoning_content, str) and reasoning_content: + if _current_state != "thinking": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("THINKING: ") + _current_state = "thinking" + sys.stdout.write(reasoning_content) + sys.stdout.flush() + + # Handle regular content + content = getattr(delta, "content", None) + if isinstance(content, str) and content: + if _current_state != "content": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("CONTENT: ") + _current_state = "content" + sys.stdout.write(content) + sys.stdout.flush() + + # Handle tool calls + tool_calls = getattr(delta, "tool_calls", None) + if tool_calls: + for tool_call in tool_calls: + tool_name = ( + tool_call.function.name if tool_call.function.name else "" + ) + tool_args = ( + tool_call.function.arguments + if tool_call.function.arguments + else "" + ) + if tool_name: + if _current_state != "tool_name": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL NAME: ") + _current_state = "tool_name" + sys.stdout.write(tool_name) + sys.stdout.flush() + if tool_args: + if _current_state != "tool_args": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL ARGS: ") + _current_state = "tool_args" + sys.stdout.write(tool_args) + sys.stdout.flush() + + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + token_callbacks=[on_token], +) + +story_prompt = ( + "Tell me a long story about LLM streaming, write it a file, " + "make sure it has multiple paragraphs. " +) +conversation.send_message(story_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +cleanup_prompt = ( + "Thank you. 
Please delete the streaming story file now that I've read it, " + "then confirm the deletion." +) +conversation.send_message(cleanup_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully +- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI + + +# LLM Subscriptions +Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. + +## How It Works + + + + ### Call subscription_login() + + The `LLM.subscription_login()` class method handles the entire authentication flow: + + ```python icon="python" + from openhands.sdk import LLM + + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") + ``` + + On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. + + + ### Use the LLM + + Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. 
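The login flow described in these steps can be summarized as a small decision function. The sketch below is purely illustrative (the SDK's real logic lives behind `subscription_login()` and is more involved): reuse cached credentials when possible, refresh them silently when expired, and only open the browser when forced or when nothing is cached.

```python icon="python"
# Illustrative sketch only -- NOT the SDK's actual implementation.
def next_auth_step(has_cached: bool, expired: bool, force_login: bool) -> str:
    """Return which step the subscription login flow takes."""
    if force_login or not has_cached:
        return "oauth_login"    # opens the browser
    if expired:
        return "refresh_token"  # silent, no browser needed
    return "use_cached"
```

This is why only the first call opens a browser: a second run with valid cached credentials takes the `use_cached` path.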
+ + + +## Supported Models + +The following models are available via ChatGPT subscription: + +| Model | Description | +|-------|-------------| +| `gpt-5.2-codex` | Latest Codex model (default) | +| `gpt-5.2` | GPT-5.2 base model | +| `gpt-5.1-codex-max` | High-capacity Codex model | +| `gpt-5.1-codex-mini` | Lightweight Codex model | + +## Configuration Options + +### Force Fresh Login + +If your cached credentials become stale or you want to switch accounts: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + force_login=True, # Always perform fresh OAuth login +) +``` + +### Disable Browser Auto-Open + +For headless environments or when you prefer to manually open the URL: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + open_browser=False, # Prints URL to console instead +) +``` + +### Check Subscription Mode + +Verify that the LLM is using subscription-based authentication: + +```python icon="python" +llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") +print(f"Using subscription: {llm.is_subscription}") # True +``` + +## Credential Storage + +Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) + + +```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py +"""Example: Using ChatGPT subscription for Codex models. + +This example demonstrates how to use your ChatGPT Plus/Pro subscription +to access OpenAI's Codex models without consuming API credits. 
+ +The subscription_login() method handles: +- OAuth PKCE authentication flow +- Credential caching (~/.openhands/auth/) +- Automatic token refresh + +Supported models: +- gpt-5.2-codex +- gpt-5.2 +- gpt-5.1-codex-max +- gpt-5.1-codex-mini + +Requirements: +- Active ChatGPT Plus or Pro subscription +- Browser access for initial OAuth login +""" + +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# First time: Opens browser for OAuth login +# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" +) + +# Alternative: Force a fresh login (useful if credentials are stale) +# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) + +# Alternative: Disable auto-opening browser (prints URL to console instead) +# llm = LLM.subscription_login( +# vendor="openai", model="gpt-5.2-codex", open_browser=False +# ) + +# Verify subscription mode is active +print(f"Using subscription mode: {llm.is_subscription}") + +# Use the LLM with an agent as usual +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("List the files in the current directory.") +conversation.run() +print("Done!") +``` + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token +- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces + + +# Model Context Protocol +Source: https://docs.openhands.dev/sdk/guides/mcp + +import RunExampleCode from 
"/sdk/shared-snippets/how-to-run-example.mdx"; + + + ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. + Read more about MCP [here](https://modelcontextprotocol.io/). + + + + +## Basic MCP Usage + +> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! + + + + ### MCP Configuration + Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) + + ```python mcp_config icon="python" wrap focus={3-10} + mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "repomix": { + "command": "npx", + "args": ["-y", "repomix@1.4.2", "--mcp"] + }, + } + } + ``` + + + ### Tool Filtering + Use `filter_tools_regex` to control which MCP tools are available to the agent + + ```python filter_tools_regex focus={4-5} icon="python" + agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", + ) + ``` + + + +## MCP with OAuth + +> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
+ +For MCP servers requiring OAuth authentication: +- Configure OAuth-enabled MCP servers by specifying the URL and auth type +- The SDK automatically handles the OAuth flow when first connecting +- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) +- User will be prompted to authenticate +- Access tokens are securely stored and automatically refreshed by FastMCP as needed + +```python mcp_config focus={5} icon="python" wrap +mcp_config = { + "mcpServers": { + "Notion": { + "url": "https://mcp.notion.com/mcp", + "auth": "oauth" + } + } +} +``` + +## Ready-to-Run Basic MCP Usage Example + + +This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) + + +Here's an example integrating MCP servers with an agent: + +```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, + } +} +# Agent +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + # This regex filters out all repomix tools except pack_codebase + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", +) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Ready-to-Run MCP with OAuth Example + + +This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) + + +```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +mcp_config = { + "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} +} +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage +- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation + + +# Metrics Tracking +Source: https://docs.openhands.dev/sdk/guides/metrics + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. 
+- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). + +## Getting Metrics from Individual LLMs + +> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! + +Track token usage, costs, and performance metrics from LLM interactions: + +### Accessing Individual LLM Metrics + +Access metrics directly from the LLM object after running the conversation: + +```python icon="python" focus={3-4} +conversation.run() + +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` + +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: + +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call + + + For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). 
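Conceptually, the `accumulated_*` fields are running totals over the per-call records kept in `costs` and `token_usages`. A minimal illustration with invented numbers (the SDK's `Metrics` class maintains these totals for you):

```python icon="python"
# Illustrative only: how per-call records roll up into accumulated totals.
per_call = [
    {"cost": 0.0021, "prompt_tokens": 1200, "completion_tokens": 310},
    {"cost": 0.0008, "prompt_tokens": 450, "completion_tokens": 95},
]

accumulated_cost = sum(record["cost"] for record in per_call)
accumulated_prompt_tokens = sum(record["prompt_tokens"] for record in per_call)
accumulated_completion_tokens = sum(record["completion_tokens"] for record in per_call)
```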
+ + +### Ready-to-run Example (LLM metrics) + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" +) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Using LLM Registry for Cost Tracking + +> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! + +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. + +### How the LLM Registry Works + +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: + +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID + +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. 
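The pattern itself is simple: a mapping from `usage_id` to LLM instance, exposing the `add`/`get`/`list_usage_ids` surface described above. A stripped-down sketch (illustrative only, not the SDK's `LLMRegistry` class):

```python icon="python"
# Stripped-down sketch of the registry pattern -- illustrative only.
class TinyRegistry:
    def __init__(self) -> None:
        self._by_usage_id: dict[str, object] = {}

    def add(self, usage_id: str, llm: object) -> None:
        if usage_id in self._by_usage_id:
            raise ValueError(f"usage_id {usage_id!r} already registered")
        self._by_usage_id[usage_id] = llm

    def get(self, usage_id: str) -> object:
        return self._by_usage_id[usage_id]

    def list_usage_ids(self) -> list[str]:
        return list(self._by_usage_id)


registry = TinyRegistry()
registry.add("agent", "llm-for-agent")
registry.add("condenser", "llm-for-condenser")
```

Because each entry is keyed by its `usage_id`, metrics for the agent LLM and the condenser LLM never get mixed together.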
+ +### Ready-to-run Example (LLM Registry) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + +### Getting Aggregated Conversation Costs + + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + + +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. 
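Conceptually, the aggregation just merges each `usage_id`'s metrics into one total; a sketch with invented numbers (the SDK's `get_combined_metrics()` does this for you):

```python icon="python"
# Illustrative only: merging per-usage metrics into conversation totals.
usage_to_metrics = {
    "agent": {"cost": 0.0123, "prompt_tokens": 5400},
    "condenser": {"cost": 0.0009, "prompt_tokens": 800},
}

combined = {
    "cost": sum(m["cost"] for m in usage_to_metrics.values()),
    "prompt_tokens": sum(m["prompt_tokens"] for m in usage_to_metrics.values()),
}
```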
+ +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os + +from pydantic import SecretStr +from tabulate import tabulate + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model=model, + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: 
{spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Understanding Conversation Stats + +The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: + +#### Key Methods and Properties + +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. + +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. 
+ +```python icon="python" focus={2, 6, 10} +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") + +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") + +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +``` + +## Next Steps + +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models + + +# Observability & Tracing +Source: https://docs.openhands.dev/sdk/guides/observability + +> A full setup example is available [here](#example:-full-setup)! + +## Overview + +The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: + +- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support +- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing +- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more + +The SDK automatically traces: +- Agent execution steps +- Tool calls and executions +- LLM API calls (via LiteLLM integration) +- Browser automation sessions (when using browser-use) +- Conversation lifecycle events + +## Quick Start + +Tracing is automatically enabled when you set the appropriate environment variables. The SDK detects the configuration on startup and initializes tracing without requiring code changes. 
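Since configuration is detected from the environment, a quick preflight check can confirm that at least one of the relevant variables is set before you launch your agent. This helper is hypothetical (the SDK performs its own detection internally); the variable names come from the Configuration Reference table:

```python icon="python"
import os

# Endpoint/key variables the SDK looks for (see the Configuration Reference
# table for the full list, including header and protocol variables).
TRACING_VARS = [
    "LMNR_PROJECT_API_KEY",
    "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_ENDPOINT",
]


def tracing_configured(env=None) -> bool:
    """Hypothetical helper: True if any tracing endpoint/key variable is set."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in TRACING_VARS)
```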
+ +### Using Laminar + +[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: + +```bash icon="terminal" wrap +# Set your Laminar project API key +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` + +That's it! Run your agent code normally and traces will be sent to Laminar automatically. + +### Using Honeycomb or Other OTLP Backends + +For Honeycomb, Jaeger, or any other OTLP-compatible backend: + +```bash icon="terminal" wrap +# Required: Set the OTLP endpoint +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" + +# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" + +# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it +``` + +### Alternative Configuration Methods + +You can also use these alternative environment variable formats: + +```bash icon="terminal" wrap +# Short form for endpoint +export OTEL_ENDPOINT="http://localhost:4317" + +# Alternative header format +export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" + +# Alternative protocol specification +export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" +``` + +## How It Works + +The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: + +1. **Detects Configuration**: Checks for OTEL environment variables on startup +2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter +3. **Instruments Code**: Automatically wraps key functions with tracing decorators +4. **Captures Context**: Associates traces with conversation IDs for session grouping +5. 
**Exports Spans**: Sends trace data to your configured backend
+
+### What Gets Traced
+
+The SDK automatically instruments these components:
+
+- **`agent.step`** - Each iteration of the agent's execution loop
+- **Tool Executions** - Individual tool calls with input/output capture
+- **LLM Calls** - API requests to language models via LiteLLM
+- **Conversation Lifecycle** - Message sending, conversation runs, and title generation
+- **Browser Sessions** - When using browser-use, captures session replays (Laminar only)
+
+### Trace Hierarchy
+
+Traces are organized hierarchically (simplified):
+
+- Conversation session (one per conversation)
+  - `agent.step` (one span per agent iteration)
+    - LLM call (via LiteLLM)
+    - `tool.execute` (one span per tool call)
+
+Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single
+conversation together in your observability platform.
+
+Note that the individual tool calls (e.g., `bash`, `file_editor`) are traced within `tool.execute`.
+
+## Configuration Reference
+
+### Environment Variables
+
+The SDK checks for these environment variables (in order of precedence):
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` |
+| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` |
+| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` |
+| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` |
+| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` |
+| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` |
+| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` |
+
+### Header Format
+
+Headers should be comma-separated `key=value` pairs with URL encoding for special characters:
+
+```bash icon="terminal" wrap
+# Single 
header +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123" + +# Multiple headers +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value" +``` + +### Protocol Options + +The SDK supports both HTTP and gRPC protocols: + +- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) +- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) + +## Platform-Specific Configuration + +### Laminar Setup + +1. Sign up at [laminar.sh](https://laminar.sh/) +2. Create a project and copy your API key +3. Set the environment variable: + +```bash icon="terminal" wrap +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` + +**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did. + +### Honeycomb Setup + +1. Sign up at [honeycomb.io](https://www.honeycomb.io/) +2. Get your API key from the account settings +3. 
Configure the environment: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +### Jaeger Setup + +For local development with Jaeger: + +```bash icon="terminal" wrap +# Start Jaeger all-in-one container +docker run -d --name jaeger \ + -p 4317:4317 \ + -p 16686:16686 \ + jaegertracing/all-in-one:latest + +# Configure SDK +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" +``` + +Access the Jaeger UI at http://localhost:16686 + +### Generic OTLP Collector + +For other backends, use their OTLP endpoint: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +## Advanced Usage + +### Disabling Observability + +To disable tracing, simply unset all OTEL environment variables: + +```bash icon="terminal" wrap +unset LMNR_PROJECT_API_KEY +unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT +unset OTEL_EXPORTER_OTLP_ENDPOINT +unset OTEL_ENDPOINT +``` + +The SDK will automatically skip all tracing instrumentation with minimal overhead. + +### Custom Span Attributes + +The SDK automatically adds these attributes to spans: + +- **`conversation_id`** - UUID of the conversation +- **`tool_name`** - Name of the tool being executed +- **`action.kind`** - Type of action being performed +- **`session_id`** - Groups all traces from one conversation + +### Debugging Tracing Issues + +If traces aren't appearing in your observability platform: + +1. 
**Verify Environment Variables**: + ```python icon="python" wrap + import os + + otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') + otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') + + print(f"OTEL Endpoint: {otel_endpoint}") + print(f"OTEL Headers: {otel_headers}") + ``` + +2. **Check SDK Logs**: The SDK logs observability initialization at debug level: + ```python icon="python" wrap + import logging + + logging.basicConfig(level=logging.DEBUG) + ``` + +3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: + ```bash icon="terminal" wrap + curl -v https://api.honeycomb.io:443/v1/traces + ``` + +4. **Validate Headers**: Check that authentication headers are properly URL-encoded + +## Troubleshooting + +### Traces Not Appearing + +**Problem**: No traces showing up in observability platform + +**Solutions**: +- Verify environment variables are set correctly +- Check network connectivity to OTLP endpoint +- Ensure authentication headers are valid +- Look for SDK initialization logs at debug level + +### High Trace Volume + +**Problem**: Too many spans being generated + +**Solutions**: +- Configure sampling at the collector level +- For Laminar with non-browser tools, browser instrumentation is automatically disabled +- Use backend-specific filtering rules + +### Performance Impact + +**Problem**: Concerned about tracing overhead + +**Solutions**: +- Tracing has minimal overhead when properly configured +- Disable tracing in development by unsetting environment variables +- Use asynchronous exporters (default in most OTLP configurations) + +## Example: Full Setup + + +This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) + + +```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py +""" +Observability & Laminar example + +This example 
demonstrates enabling OpenTelemetry tracing with Laminar in the +OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.terminal import TerminalTool + + +# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: +# export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# For non-Laminar OTLP backends, set OTEL_* variables instead. + +# Configure LLM and Agent +api_key = os.getenv("LLM_API_KEY") +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key) if api_key else None, + base_url=base_url, + usage_id="agent", +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +# Create conversation and run a simple task +conversation = Conversation(agent=agent, workspace=".") +conversation.send_message("List the files in the current directory and print them.") +conversation.run() +print( + "All done! Check your Laminar dashboard for traces " + "(session is the conversation UUID)." +) +``` + +```bash Running the Example +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/27_observability_laminar.py +``` + +## Next Steps + +- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces +- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application +- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions + + +# Plugins +Source: https://docs.openhands.dev/sdk/guides/plugins + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +Plugins provide a way to package and distribute multiple agent components together. 
A single plugin can include:

- **Skills**: Specialized knowledge and workflows
- **Hooks**: Event handlers for tool lifecycle
- **MCP Config**: External tool server configurations
- **Agents**: Specialized agent definitions
- **Commands**: Slash commands

The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins).

## Plugin Structure


See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure.


A plugin follows this directory structure:

*(An interactive directory-tree diagram appears here in the web docs; see the example_plugins link above for a complete working layout.)*

Note that the plugin metadata file, `plugin-name/.plugin/plugin.json`, is required.

### Plugin Manifest

The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata:

```json icon="file-code" wrap
{
  "name": "code-quality",
  "version": "1.0.0",
  "description": "Code quality tools and workflows",
  "author": "openhands",
  "license": "MIT",
  "repository": "https://github.com/example/code-quality-plugin"
}
```

### Skills

Skills are defined in markdown files with YAML frontmatter:

```markdown icon="file-code"
---
name: python-linting
description: Instructions for linting Python code
trigger:
  type: keyword
  keywords:
    - lint
    - linting
    - code quality
---

# Python Linting Skill

Run ruff to check for issues:

\`\`\`bash
ruff check . 
+\`\`\` +``` + +### Hooks + +Hooks are defined in `hooks/hooks.json`: + +```json icon="file-code" wrap +{ + "hooks": { + "PostToolUse": [ + { + "matcher": "file_editor", + "hooks": [ + { + "type": "command", + "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", + "timeout": 5 + } + ] + } + ] + } +} +``` + +### MCP Configuration + +MCP servers are configured in `.mcp.json`: + +```json wrap icon="file-code" +{ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} +``` + +## Using Plugin Components + +> The ready-to-run example is available [here](#ready-to-run-example)! + +Brief explanation on how to use a plugin with an agent. + + + + ### Loading a Plugin + First, load the desired plugins. + + ```python icon="python" + from openhands.sdk.plugin import Plugin + + # Load a single plugin + plugin = Plugin.load("/path/to/plugin") + + # Load all plugins from a directory + plugins = Plugin.load_all("/path/to/plugins") + ``` + + + ### Accessing Components + You can access the different plugin components to see which ones are available. + + ```python icon="python" + # Skills + for skill in plugin.skills: + print(f"Skill: {skill.name}") + + # Hooks configuration + if plugin.hooks: + print(f"Hooks configured: {plugin.hooks}") + + # MCP servers + if plugin.mcp_config: + servers = plugin.mcp_config.get("mcpServers", {}) + print(f"MCP servers: {list(servers.keys())}") + ``` + + + ### Using with an Agent + You can now feed your agent with your preferred plugin. 
+ + ```python focus={3,10,17} icon="python" + # Create agent context with plugin skills + agent_context = AgentContext( + skills=plugin.skills, + ) + + # Create agent with plugin MCP config + agent = Agent( + llm=llm, + tools=tools, + mcp_config=plugin.mcp_config or {}, + agent_context=agent_context, + ) + + # Create conversation with plugin hooks + conversation = Conversation( + agent=agent, + hook_config=plugin.hooks, + ) + ``` + + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) + + +```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py +"""Example: Loading Plugins via Conversation + +Demonstrates the recommended way to load plugins using the `plugins` parameter +on Conversation. Plugins bundle skills, hooks, and MCP config together. + +For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins +""" + +import os +import sys +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.plugin import PluginSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Locate example plugin directory +script_dir = Path(__file__).parent +plugin_path = script_dir / "example_plugins" / "code-quality" + +# Define plugins to load +# Supported sources: local path, "github:owner/repo", or git URL +# Optional: ref (branch/tag/commit), repo_path (for monorepos) +plugins = [ + PluginSource(source=str(plugin_path)), + # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), + # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), +] + +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + 
print("Set LLM_API_KEY to run this example") + print("EXAMPLE_COST: 0") + sys.exit(0) + +# Configure LLM and Agent +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="plugin-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) +agent = Agent( + llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] +) + +# Create conversation with plugins - skills, MCP config, and hooks are merged +# Note: Plugins are loaded lazily on first send_message() or run() call +with tempfile.TemporaryDirectory() as tmpdir: + conversation = Conversation( + agent=agent, + workspace=tmpdir, + plugins=plugins, + ) + + # Test: The "lint" keyword triggers the python-linting skill + # This first send_message() call triggers lazy plugin loading + conversation.send_message("How do I lint Python code? Brief answer please.") + + # Verify skills were loaded from the plugin (after lazy loading) + skills = ( + conversation.agent.agent_context.skills + if conversation.agent.agent_context + else [] + ) + print(f"Loaded {len(skills)} skill(s) from plugins") + + conversation.run() + + print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` + + + + +## Next Steps + +- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers +- **[Hooks](/sdk/guides/hooks)** - Understand hook event types +- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers + + +# Secret Registry +Source: https://docs.openhands.dev/sdk/guides/secrets + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. +It automatically detects secret references in bash commands, injects them as environment variables when needed, +and masks secret values in command outputs to prevent accidental exposure. 
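The masking behavior can be pictured as a straightforward substitution over command output. The helper below is an illustrative sketch only, not the registry's actual implementation (the real registry also handles secret detection and environment-variable injection), and `<secret-hidden>` is a placeholder token chosen for this example:

```python
def mask_secrets(output: str, secrets: dict[str, str]) -> str:
    """Replace every known secret value in command output with a placeholder."""
    for value in secrets.values():
        if value:  # skip empty values so replacement cannot corrupt the output
            output = output.replace(value, "<secret-hidden>")
    return output


masked = mask_secrets(
    "TOKEN=my-secret-token-value", {"SECRET_TOKEN": "my-secret-token-value"}
)
print(masked)  # TOKEN=<secret-hidden>
```

Because masking is applied to outputs rather than inputs, the agent can still use a secret in a command while the raw value never appears in the transcript.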
+ +### Injecting Secrets + +Use the `update_secrets()` method to add secrets to your conversation. + + +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: + +```python focus={4,11} icon="python" wrap +from openhands.sdk.conversation.secret_source import SecretSource + +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) + +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) + + +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.secret import SecretSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 

model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
tools = [
    Tool(name=TerminalTool.name),
    Tool(name=FileEditorTool.name),
]

# Agent
agent = Agent(llm=llm, tools=tools)
conversation = Conversation(agent)


class MySecretSource(SecretSource):
    def get_value(self) -> str:
        return "callable-based-secret"


conversation.update_secrets(
    {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()}
)

conversation.send_message("just echo $SECRET_TOKEN")

conversation.run()

conversation.send_message("just echo $SECRET_FUNCTION_TOKEN")

conversation.run()

# Report cost
cost = llm.metrics.accumulated_cost
print(f"EXAMPLE_COST: {cost}")
```



## Next Steps

- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP
- **[Security Analyzer](/sdk/guides/security)** - Add security validation


# Security & Action Confirmation
Source: https://docs.openhands.dev/sdk/guides/security

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

Agent actions can be controlled through two complementary mechanisms: **confirmation policies**, which determine when user
approval is required, and **security analyzers**, which evaluate action risk levels. Together, they provide flexible control over agent behavior while maintaining safety.

## Confirmation Policy
> A ready-to-run example is available [here](#ready-to-run-example-confirmation)!

Confirmation policies control whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions.
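Under the hood, each policy is just a rule for deciding whether a pending action should pause for user approval. The helper below is a hypothetical illustration of those semantics, not an SDK API, and it assumes `ConfirmRisky` pauses only for actions flagged HIGH:

```python
def needs_confirmation(policy: str, risk: str) -> bool:
    """Mirror the built-in policies' decision rule (illustrative only).

    policy: "always" | "never" | "confirm_risky"
    risk:   "LOW" | "MEDIUM" | "HIGH" | "UNKNOWN"
    """
    if policy == "always":
        return True   # AlwaysConfirm: every action pauses for approval
    if policy == "never":
        return False  # NeverConfirm: actions execute immediately
    # ConfirmRisky-style: only actions flagged as risky pause for approval
    return risk == "HIGH"


print(needs_confirmation("confirm_risky", "HIGH"))  # True
print(needs_confirmation("confirm_risky", "LOW"))   # False
```

Note that the risk label only matters for the `ConfirmRisky`-style rule, which is why that policy requires a security analyzer to be configured.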
+ +### Setting Confirmation Policy + +Set the confirmation policy on your conversation: + +```python icon="python" focus={4} +from openhands.sdk.security.confirmation_policy import AlwaysConfirm + +conversation = Conversation(agent=agent, workspace=".") +conversation.set_confirmation_policy(AlwaysConfirm()) +``` + +Available policies: +- **`AlwaysConfirm()`** - Require approval for all actions +- **`NeverConfirm()`** - Execute all actions without approval +- **`ConfirmRisky()`** - Only require approval for risky actions (requires security analyzer) + +### Custom Confirmation Handler + +Implement your approval logic by checking conversation status: + +```python icon="python" focus={2-3,5} +while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not confirm_in_console(pending): + conversation.reject_pending_actions("User rejected") + continue + conversation.run() +``` + +### Rejecting Actions + +Provide feedback when rejecting to help the agent try a different approach: + +```python icon="python" focus={2-5} +if not user_approved: + conversation.reject_pending_actions( + "User rejected because actions seem too risky." + "Please try a safer approach." 
+ ) +``` + +### Ready-to-run Example Confirmation + + +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + + +Require user approval before executing agent actions: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.tools.preset.default import get_default_agent + + +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_action_preview(pending_actions) -> None: + print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? 
(yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing actions…") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("▶️ Running conversation.run()…") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# Conditionally add security analyzer based on environment variable +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") + conversation.set_security_analyzer(LLMSecurityAnalyzer()) + +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) + +# 2) A command the user may choose to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) + +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) + +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 'Hello from confirmation mode example!'") +conversation.run() + +conversation.send_message( + "Please delete any file that was created during this conversation." 
+) +conversation.run() + +print("\n=== Example Complete ===") +print("Key points:") +print( + "- conversation.run() creates actions; confirmation mode " + "sets execution_status=WAITING_FOR_CONFIRMATION" +) +print("- User confirmation is handled via a single reusable function") +print("- Rejection uses conversation.reject_pending_actions() and the loop continues") +print("- Simple responses work normally without actions") +print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") +``` + + + +--- + +## Security Analyzer + +Security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level: + +- **LOW** - Safe operations with minimal security impact +- **MEDIUM** - Moderate security impact, review recommended +- **HIGH** - Significant security impact, requires confirmation +- **UNKNOWN** - Risk level could not be determined + +Security analyzer work in conjunction with confirmation policy (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations. + +### LLM Security Analyzer + +> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)! + +The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions. 

#### Security Analyzer Configuration

Create an LLM-based security analyzer to review actions before execution:

```python icon="python" focus={9}
from openhands.sdk import LLM, Agent
from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
security_llm = LLM(
    usage_id="security-analyzer",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)
security_analyzer = LLMSecurityAnalyzer(llm=security_llm)
agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
```

The security analyzer:
- Reviews each action before execution
- Flags potentially dangerous operations
- Can be configured with a custom security policy
- Uses a separate LLM to avoid conflicts with the main agent

#### Ready-to-run Example Security Analyzer


Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py)


Automatically analyze agent actions for security risks before execution:

```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py
"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified)

This example shows how to use the LLMSecurityAnalyzer to automatically
evaluate security risks of actions before execution.
+""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_blocked_actions(pending_actions) -> None: + print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? (yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. 
+ * On approve: set execution_status = IDLE (keeps original example’s behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("▶️ Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." 
+) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) +conversation.set_confirmation_policy(ConfirmRisky()) + +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` + + + +### Custom Security Analyzer Implementation + +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. + +#### Creating a Custom Analyzer + +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: + +```python icon="python" focus={5, 8} +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent + +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. 
+ + Args: + action: The ActionEvent to analyze + + Returns: + SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) + """ + # Example: Check for specific dangerous patterns + action_str = str(action.action.model_dump()).lower() if action.action else "" + + # High-risk patterns + if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): + return SecurityRisk.HIGH + + # Medium-risk patterns + if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): + return SecurityRisk.MEDIUM + + # Default to low risk + return SecurityRisk.LOW + +# Use your custom analyzer +security_analyzer = CustomSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) +``` + + + For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). + + + +--- + +## Configurable Security Policy + +> A ready-to-run example is available [here](#ready-to-run-example-security-policy)! + +Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines. 
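A policy template is an ordinary Jinja2 file on disk; for a mostly static policy it can simply be markdown guidelines saved with a `.j2` extension. The filename and wording below are illustrative only:

```python
from pathlib import Path

# Hypothetical organization-specific policy text (plain markdown is valid Jinja2)
POLICY = """\
# Org Security Risk Policy
- LOW: read-only actions (viewing files, running tests)
- MEDIUM: workspace-scoped changes (file edits, package installs)
- HIGH: network access, credential handling, or system modification
"""

Path("my_security_policy.j2").write_text(POLICY, encoding="utf-8")
```

The resulting file can then be passed to the agent via `security_policy_filename`, as shown in the next section.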
+ + +### Using Custom Security Policies + +You can provide a custom security policy template when creating an agent: + +```python focus={9-13} icon="python" +from openhands.sdk import Agent, LLM + +llm = LLM( + usage_id="agent", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), +) + +# Provide a custom security policy template file +agent = Agent( + llm=llm, + tools=tools, + security_policy_filename="my_security_policy.j2", +) +``` + +Custom security policies allow you to: +- Define organization-specific risk assessment guidelines +- Set custom thresholds for security risk levels +- Add domain-specific security rules +- Tailor risk evaluation to your use case + +The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. + +### Ready-to-run Example Security Policy + + +Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) + + +Define custom security risk guidelines for your agent: + +```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py +"""OpenHands Agent SDK — Configurable Security Policy Example + +This example demonstrates how to use a custom security policy template +with an agent. Security policies define risk assessment guidelines that +help agents evaluate the safety of their actions. + +By default, agents use the built-in security_policy.j2 template. This +example shows how to: +1. Use the default security policy +2. Provide a custom security policy template embedded in the script +3. 
Apply the custom policy to guide agent behavior +""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Define a custom security policy template inline +CUSTOM_SECURITY_POLICY = ( + "# 🔐 Custom Security Risk Policy\n" + "When using tools that support the security_risk parameter, assess the " + "safety risk of your actions:\n" + "\n" + "- **LOW**: Safe read-only actions.\n" + " - Viewing files, calculations, documentation.\n" + "- **MEDIUM**: Moderate container-scoped actions.\n" + " - File modifications, package installations.\n" + "- **HIGH**: Potentially dangerous actions.\n" + " - Network access, system modifications, data exfiltration.\n" + "\n" + "**Custom Rules**\n" + "- Always prioritize user data safety.\n" + "- Escalate to **HIGH** for any external data transmission.\n" +) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Example 1: Agent with default security policy +print("=" * 100) +print("Example 1: Agent with default security policy") +print("=" * 100) +default_agent = Agent(llm=llm, tools=tools) +print(f"Security policy filename: {default_agent.security_policy_filename}") +print("\nDefault security policy is embedded in the agent's system message.") + +# Example 2: Agent with custom security policy +print("\n" + "=" * 100) +print("Example 2: Agent with custom security policy") +print("=" * 100) + +# Create a temporary file for the custom security policy +with tempfile.NamedTemporaryFile( + mode="w", suffix=".j2", delete=False, encoding="utf-8" +) as temp_file: + temp_file.write(CUSTOM_SECURITY_POLICY) + custom_policy_path = temp_file.name + +try: + # Create agent with custom security policy (using absolute path) + custom_agent = Agent( + llm=llm, + tools=tools, + security_policy_filename=custom_policy_path, + ) + print(f"Security policy filename: {custom_agent.security_policy_filename}") + print("\nCustom security policy loaded from temporary file.") + + # Verify the custom policy is in the system message + system_message = custom_agent.static_system_message + if "Custom Security Risk Policy" in system_message: + print("✓ Custom security policy successfully embedded in system message.") + else: + print("✗ Custom security policy not found in system message.") + + # Run a conversation with the custom agent + print("\n" + "=" * 100) + print("Running conversation with custom security policy") + print("=" * 100) + + llm_messages = [] # collect raw LLM messages + + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + 
llm_messages.append(event.to_llm_message()) + + conversation = Conversation( + agent=custom_agent, + callbacks=[conversation_callback], + workspace=".", + ) + + conversation.send_message( + "Please create a simple Python script named hello.py that prints " + "'Hello, World!'. Make sure to follow security best practices." + ) + conversation.run() + + print("\n" + "=" * 100) + print("Conversation finished.") + print(f"Total LLM messages: {len(llm_messages)}") + print("=" * 100) + + # Report cost + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + +finally: + # Clean up temporary file + Path(custom_policy_path).unlink(missing_ok=True) + +print("\n" + "=" * 100) +print("Example Summary") +print("=" * 100) +print("This example demonstrated:") +print("1. Using the default security policy (security_policy.j2)") +print("2. Creating a custom security policy template") +print("3. Applying the custom policy via security_policy_filename parameter") +print("4. Running a conversation with the custom security policy") +print( + "\nYou can customize security policies to match your organization's " + "specific requirements." +) +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools +- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management + + +# Agent Skills & Context +Source: https://docs.openhands.dev/sdk/guides/skill + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This guide shows how to implement skills in the SDK. For conceptual overview, see [Skills Overview](/overview/skills). + +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers. 
+ +## Context Loading Methods + +| Method | When Content Loads | Use Case | +|--------|-------------------|----------| +| **Always-loaded** | At conversation start | Repository rules, coding standards | +| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | +| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | + +## Always-Loaded Context + +Content that's always in the system prompt. + +### Option 1: `AGENTS.md` (Auto-loaded) + +Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). + +```python icon="python" focus={3, 4} +from openhands.sdk.context.skills import load_project_skills + +# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root +skills = load_project_skills(workspace_dir="/path/to/repo") +agent_context = AgentContext(skills=skills) +``` + +### Option 2: Inline Skill (Code-defined) + +```python icon="python" focus={5-11} +from openhands.sdk import AgentContext +from openhands.sdk.context import Skill + +agent_context = AgentContext( + skills=[ + Skill( + name="code-style", + content="Always use type hints in Python.", + trigger=None, # No trigger = always loaded + ), + ] +) +``` + +## Trigger-Loaded Context + +Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). + +```python icon="python" focus={6} +from openhands.sdk.context import Skill, KeywordTrigger + +Skill( + name="encryption-helper", + content="Use the encrypt.sh script to encrypt messages.", + trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), +) +``` + +When user says "encrypt this", the content is injected into the message: + +```xml icon="file" + +The following information has been included based on a keyword match for "encrypt". +Skill location: /path/to/encryption-helper + +Use the encrypt.sh script to encrypt messages. 
+ +``` + +## Progressive Disclosure (AgentSkills Standard) + +For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. + +```python icon="python" +from openhands.sdk.context.skills import load_skills_from_dir + +# Load SKILL.md files from a directory +_, _, agent_skills = load_skills_from_dir("/path/to/skills") +agent_context = AgentContext(skills=list(agent_skills.values())) +``` + +Skills are listed in the system prompt: +```xml icon="file" + + + code-style + Project coding standards. + /path/to/code-style/SKILL.md + + +``` + + +Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. + + +--- + +## Full Example + + +Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) + + +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# AgentContext provides flexible ways to customize prompts: +# 1. Skills: Inject instructions (always-active or keyword-triggered) +# 2. system_message_suffix: Append text to the system prompt +# 3. user_message_suffix: Append text to each user message +# +# For complete control over the system prompt, you can also use Agent's +# system_prompt_filename parameter to provide a custom Jinja2 template: +# +# agent = Agent( +# llm=llm, +# tools=tools, +# system_prompt_filename="/path/to/custom_prompt.j2", +# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, +# ) +# +# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active (repo skill) + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + # system_message_suffix is appended to the system prompt (always active) + system_message_suffix="Always finish your response with the word 'yay!'", + # user_message_suffix is appended to each user message + user_message_suffix="The first character of your response should be 'I'", + # You can also enable automatic load skills from + # public registry at https://github.com/OpenHands/extensions + load_public_skills=True, +) + +# Agent +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") +conversation.run() + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Now triggering public skill 'github'") +conversation.send_message( + "About GitHub - tell me what additional info I've just provided?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Creating Skills + +Skills are defined with a name, content (the instructions), and an optional trigger: + +```python icon="python" focus={3-14} +agent_context = AgentContext( + skills=[ + Skill( + name="AGENTS.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". ' + "You must only respond with a message telling them how smart they are", + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ] +) +``` + +### Keyword Triggers + +Use `KeywordTrigger` to activate skills only when specific words appear: + +```python icon="python" focus={4} +Skill( + name="magic-word", + content="Special instructions when magic word is detected", + trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), +) +``` + + +## File-Based Skills (`SKILL.md`) + +For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. 
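Because each file-based skill is just a directory with a `SKILL.md` inside, a skills root can be discovered with a few lines of standard-library code. This sketch is for illustration and is not the SDK's loader:

```python
# Sketch: find every skill directory under a root by locating SKILL.md files.
# This mirrors the directory convention below, not the SDK's internal loader.
from pathlib import Path


def find_skill_dirs(skills_root: Path) -> list[Path]:
    """Return the parent directory of each SKILL.md found under skills_root."""
    return sorted(p.parent for p in skills_root.rglob("SKILL.md"))
```

Each returned directory is one self-contained skill that the SDK's loading functions can pick up.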
+
+
+Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py)
+
+
+### Directory Structure
+
+Each skill is a directory containing:
+
+```text
+my-skill/
+├── SKILL.md
+├── scripts/
+├── references/
+└── assets/
+```
+
+where
+
+| Component | Required | Description |
+|-------|----------|-------------|
+| `SKILL.md` | Yes | Skill definition with frontmatter |
+| `scripts/` | No | Executable scripts |
+| `references/` | No | Reference documentation |
+| `assets/` | No | Static assets |
+
+
+### `SKILL.md` Format
+
+The `SKILL.md` file defines the skill with YAML frontmatter:
+
+```md icon="markdown"
+---
+name: my-skill # Required (standard)
+description: > # Required (standard)
+  A brief description of what this skill does and when to use it.
+license: MIT # Optional (standard)
+compatibility: Requires bash # Optional (standard)
+metadata: # Optional (standard)
+  author: your-name
+  version: "1.0"
+triggers: # Optional (OpenHands extension)
+  - keyword1
+  - keyword2
+---
+
+# Skill Content
+
+Instructions and documentation for the agent...
+```
+
+#### Frontmatter Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `name` | Yes | Skill identifier (lowercase + hyphens) |
+| `description` | Yes | What the skill does (shown to agent) |
+| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) |
+| `license` | No | License name |
+| `compatibility` | No | Environment requirements |
+| `metadata` | No | Custom key-value pairs |
+
+
+Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user.
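The layout and frontmatter shown above can be scaffolded in a few lines. This is a convenience sketch; the skill name `my-skill` and the frontmatter values are placeholders:

```python
# Sketch: scaffold a skill directory matching the structure described above.
# All names and frontmatter values here are placeholders.
from pathlib import Path

SKILL_MD = """\
---
name: my-skill
description: >
  A brief description of what this skill does and when to use it.
triggers:
  - keyword1
---

# Skill Content

Instructions and documentation for the agent...
"""


def scaffold_skill(root: Path, name: str = "my-skill") -> Path:
    """Create <root>/<name> with SKILL.md and the optional resource dirs."""
    skill_dir = root / name
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(SKILL_MD)
    return skill_dir
```

The optional `scripts/`, `references/`, and `assets/` directories can be dropped if the skill does not need them.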
+ + +### Loading Skills + +Use `load_skills_from_dir()` to load all skills from a directory: + +```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py +"""Example: Loading Skills from Disk (AgentSkills Standard) + +This example demonstrates how to load skills following the AgentSkills standard +from a directory on disk. + +Skills are modular, self-contained packages that extend an agent's capabilities +by providing specialized knowledge, workflows, and tools. They follow the +AgentSkills standard which includes: +- SKILL.md file with frontmatter metadata (name, description, triggers) +- Optional resource directories: scripts/, references/, assets/ + +The example_skills/ directory contains two skills: +- rot13-encryption: Has triggers (encrypt, decrypt) - listed in + AND content auto-injected when triggered +- code-style-guide: No triggers - listed in for on-demand access + +All SKILL.md files follow the AgentSkills progressive disclosure model: +they are listed in with name, description, and location. +Skills with triggers get the best of both worlds: automatic content injection +when triggered, plus the agent can proactively read them anytime. 
+""" + +import os +import sys +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, AgentContext, Conversation +from openhands.sdk.context.skills import ( + discover_skill_resources, + load_skills_from_dir, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Get the directory containing this script +script_dir = Path(__file__).parent +example_skills_dir = script_dir / "example_skills" + +# ========================================================================= +# Part 1: Loading Skills from a Directory +# ========================================================================= +print("=" * 80) +print("Part 1: Loading Skills from a Directory") +print("=" * 80) + +print(f"Loading skills from: {example_skills_dir}") + +# Discover resources in the skill directory +skill_subdir = example_skills_dir / "rot13-encryption" +resources = discover_skill_resources(skill_subdir) +print("\nDiscovered resources in rot13-encryption/:") +print(f" - scripts: {resources.scripts}") +print(f" - references: {resources.references}") +print(f" - assets: {resources.assets}") + +# Load skills from the directory +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) + +print("\nLoaded skills from directory:") +print(f" - Repo skills: {list(repo_skills.keys())}") +print(f" - Knowledge skills: {list(knowledge_skills.keys())}") +print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") + +# Access the loaded skill and show all AgentSkills standard fields +if agent_skills: + skill_name = next(iter(agent_skills)) + loaded_skill = agent_skills[skill_name] + print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") + print(f" - Name: {loaded_skill.name}") + desc = loaded_skill.description or "" + print(f" - Description: {desc[:70]}...") + print(f" - License: {loaded_skill.license}") + print(f" - 
Compatibility: {loaded_skill.compatibility}") + print(f" - Metadata: {loaded_skill.metadata}") + if loaded_skill.resources: + print(" - Resources:") + print(f" - Scripts: {loaded_skill.resources.scripts}") + print(f" - References: {loaded_skill.resources.references}") + print(f" - Assets: {loaded_skill.resources.assets}") + print(f" - Skill root: {loaded_skill.resources.skill_root}") + +# ========================================================================= +# Part 2: Using Skills with an Agent +# ========================================================================= +print("\n" + "=" * 80) +print("Part 2: Using Skills with an Agent") +print("=" * 80) + +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + print("Skipping agent demo (LLM_API_KEY not set)") + print("\nTo run the full demo, set the LLM_API_KEY environment variable:") + print(" export LLM_API_KEY=your-api-key") + sys.exit(0) + +# Configure LLM +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="skills-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) + +# Create agent context with loaded skills +agent_context = AgentContext( + skills=list(agent_skills.values()), + # Disable public skills for this demo to keep output focused + load_public_skills=False, +) + +# Create agent with tools so it can read skill resources +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +# Create conversation +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# Test the skill (triggered by "encrypt" keyword) +# The skill provides instructions and a script for ROT13 encryption +print("\nSending message with 'encrypt' keyword to trigger skill...") +conversation.send_message("Encrypt the message 'hello world'.") +conversation.run() + +print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}") 
+print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` + + + + +### Key Functions + +#### `load_skills_from_dir()` + +Loads all skills from a directory, returning three dictionaries: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import load_skills_from_dir + +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir) +``` + +- **repo_skills**: Skills from `repo.md` files (always active) +- **knowledge_skills**: Skills from `knowledge/` subdirectories +- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard) + +#### `discover_skill_resources()` + +Discovers resource files in a skill directory: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import discover_skill_resources + +resources = discover_skill_resources(skill_dir) +print(resources.scripts) # List of script files +print(resources.references) # List of reference files +print(resources.assets) # List of asset files +print(resources.skill_root) # Path to skill directory +``` + +### Skill Location in Prompts + +The `` element in `` follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path: + +``` + +The following information has been included based on a keyword match for "encrypt". + +Skill location: /path/to/rot13-encryption +(Use this path to resolve relative file references in the skill content below) + +[skill content from SKILL.md] + +``` + +This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. + +### Example Skill: ROT13 Encryption + +Here's a skill with triggers (OpenHands extension): + +**SKILL.md:** +```markdown icon="markdown" +--- +name: rot13-encryption +description: > + This skill helps encrypt and decrypt messages using ROT13 cipher. 
+triggers: + - encrypt + - decrypt + - cipher +--- + +# ROT13 Encryption Skill + +Run the [encrypt.sh](scripts/encrypt.sh) script with your message: + +\`\`\`bash +./scripts/encrypt.sh "your message" +\`\`\` +``` + +**scripts/encrypt.sh:** +```bash icon="sh" +#!/bin/bash +echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +``` + +When the user says "encrypt", the skill is triggered and the agent can use the provided script. + +## Loading Public Skills + +OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. + +### Automatic Loading via AgentContext + +Enable public skills loading in your `AgentContext`: + +```python icon="python" focus={2} +agent_context = AgentContext( + load_public_skills=True, # Auto-load from public registry + skills=[ + # Your custom skills here + ] +) +``` + +When enabled, the SDK will: +1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run +2. Load all available skills from the repository +3. Merge them with your explicitly defined skills + +### Skill Naming and Triggers + +**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. + +**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. 
+ +```python icon="python" +# Both skills will be triggered by "/codereview" +agent_context = AgentContext( + load_public_skills=True, # Loads public "code-review" skill + skills=[ + Skill( + name="custom-codereview-guide", # Different name = coexists + content="Project-specific guidelines...", + trigger=KeywordTrigger(keywords=["/codereview"]), + ), + ] +) +``` + + +**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. + + +### Programmatic Loading + +You can also load public skills manually and have more control: + +```python icon="python" +from openhands.sdk.context.skills import load_public_skills + +# Load all public skills +public_skills = load_public_skills() + +# Use with AgentContext +agent_context = AgentContext(skills=public_skills) + +# Or combine with custom skills +my_skills = [ + Skill(name="custom", content="Custom instructions", trigger=None) +] +agent_context = AgentContext(skills=my_skills + public_skills) +``` + +### Custom Skills Repository + +You can load skills from your own repository: + +```python icon="python" focus={3-7} +from openhands.sdk.context.skills import load_public_skills + +# Load from a custom repository +custom_skills = load_public_skills( + repo_url="https://github.com/my-org/my-skills", + branch="main" +) +``` + +### How It Works + +The `load_public_skills()` function uses git-based caching for efficiency: + +- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` +- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date +- **Offline mode**: Uses the cached version if network is unavailable + +This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. 
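The clone-or-pull behavior described above can be sketched as a small helper that picks the right git command for the cache directory. This is a simplified illustration, not the SDK's implementation:

```python
# Sketch of the clone-or-pull caching pattern: clone on the first run,
# fast-forward pull afterwards. Not the SDK's actual implementation.
from pathlib import Path


def git_sync_command(repo_url: str, cache_dir: Path) -> list[str]:
    """Return the git command that refreshes the local skills cache."""
    if (cache_dir / ".git").is_dir():
        # Cache already exists: update it in place.
        return ["git", "-C", str(cache_dir), "pull", "--ff-only"]
    # First run: create the cache with a shallow clone.
    return ["git", "clone", "--depth", "1", repo_url, str(cache_dir)]
```

A real implementation would run the returned command with `subprocess.run` and, if it fails (for example when offline), fall back to whatever is already cached in the directory.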
+ + +Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. + + +## Customizing Agent Context + +### Message Suffixes + +Append custom instructions to the system prompt or user messages via `AgentContext`: + +```python icon="python" +agent_context = AgentContext( + system_message_suffix=""" + +Repository: my-project +Branch: feature/new-api + + """.strip(), + user_message_suffix="Remember to explain your reasoning." +) +``` + +- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) +- **`user_message_suffix`**: Appended to each user message + +### Replacing the Entire System Prompt + +For complete control, provide a custom Jinja2 template via the `Agent` class: + +```python icon="python" focus={6} +from openhands.sdk import Agent + +agent = Agent( + llm=llm, + tools=tools, + system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path + system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} +) +``` + +**Custom template example** (`custom_system_prompt.j2`): + +```jinja2 +You are a helpful coding assistant for {{ repo_name }}. + +{% if cli_mode %} +You are running in CLI mode. Keep responses concise. 
+{% endif %} + +Follow these guidelines: +- Write clean, well-documented code +- Consider edge cases and error handling +- Suggest tests when appropriate +``` + +**Key points:** +- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory +- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location +- Pass variables to the template via `system_prompt_kwargs` +- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval + diff --git a/llms.txt b/llms.txt new file mode 100644 index 00000000..f2f60add --- /dev/null +++ b/llms.txt @@ -0,0 +1,173 @@ +# OpenHands Docs + +> LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded. + +## Agent SDK + +- [Agent](https://docs.openhands.dev/sdk/arch/agent.md): High-level architecture of the reasoning-action loop +- [Agent Server Package](https://docs.openhands.dev/sdk/arch/agent-server.md): HTTP API server for remote agent execution with workspace isolation, container orchestration, and multi-user support. +- [Agent Skills & Context](https://docs.openhands.dev/sdk/guides/skill.md): Skills add specialized behaviors, domain knowledge, and context-aware triggers to your agent through structured prompts. +- [API-based Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md): Connect to hosted API-based agent server for fully managed infrastructure. +- [Apptainer Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md): Run agent server in rootless Apptainer containers for HPC and shared computing environments. 
+- [Ask Agent Questions](https://docs.openhands.dev/sdk/guides/convo-ask-agent.md): Get sidebar replies from the agent during conversation execution without interrupting the main flow. +- [Assign Reviews](https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md): Automate PR management with intelligent reviewer assignment and workflow notifications using OpenHands Agent +- [Browser Session Recording](https://docs.openhands.dev/sdk/guides/browser-session-recording.md): Record and replay your agent's browser sessions using rrweb. +- [Browser Use](https://docs.openhands.dev/sdk/guides/agent-browser-use.md): Enable web browsing and interaction capabilities for your agent. +- [Condenser](https://docs.openhands.dev/sdk/arch/condenser.md): High-level architecture of the conversation history compression system +- [Context Condenser](https://docs.openhands.dev/sdk/guides/context-condenser.md): Manage agent memory by condensing conversation history to save tokens. +- [Conversation](https://docs.openhands.dev/sdk/arch/conversation.md): High-level architecture of the conversation orchestration system +- [Conversation with Async](https://docs.openhands.dev/sdk/guides/convo-async.md): Use async/await for concurrent agent operations and non-blocking execution. +- [Creating Custom Agent](https://docs.openhands.dev/sdk/guides/agent-custom.md): Learn how to design specialized agents with custom tool sets +- [Critic (Experimental)](https://docs.openhands.dev/sdk/guides/critic.md): Real-time evaluation of agent actions using an LLM-based critic model, with built-in iterative refinement. +- [Custom Tools](https://docs.openhands.dev/sdk/guides/custom-tools.md): Tools define what agents can do. The SDK includes built-in tools for common operations and supports creating custom tools for specialized needs. 
+- [Custom Tools with Remote Agent Server](https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md): Learn how to use custom tools with a remote agent server by building a custom base image that includes your tool implementations. +- [Custom Visualizer](https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md): Customize conversation visualization by creating custom visualizers or configuring the default visualizer. +- [Design Principles](https://docs.openhands.dev/sdk/arch/design.md): Core architectural principles guiding the OpenHands Software Agent SDK's development. +- [Docker Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md): Run agent server in isolated Docker containers for security and reproducibility. +- [Events](https://docs.openhands.dev/sdk/arch/events.md): High-level architecture of the typed event framework +- [Exception Handling](https://docs.openhands.dev/sdk/guides/llm-error-handling.md): Provider‑agnostic exceptions raised by the SDK and recommended patterns for handling them. +- [FAQ](https://docs.openhands.dev/sdk/faq.md): Frequently asked questions about the OpenHands SDK +- [Getting Started](https://docs.openhands.dev/sdk/getting-started.md): Install the OpenHands SDK and build AI agents that write software. +- [Hello World](https://docs.openhands.dev/sdk/guides/hello-world.md): The simplest possible OpenHands agent - configure an LLM, create an agent, and complete a task. +- [Hooks](https://docs.openhands.dev/sdk/guides/hooks.md): Use lifecycle hooks to observe, log, and customize agent execution. +- [Image Input](https://docs.openhands.dev/sdk/guides/llm-image-input.md): Send images to multimodal agents for vision-based tasks and analysis. +- [Interactive Terminal](https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md): Enable agents to interact with terminal applications like ipython, python REPL, and other interactive CLI tools. 
+- [Iterative Refinement](https://docs.openhands.dev/sdk/guides/iterative-refinement.md): Implement iterative refinement workflows where agents refine their work based on critique feedback until quality thresholds are met. +- [LLM](https://docs.openhands.dev/sdk/arch/llm.md): High-level architecture of the provider-agnostic language model interface +- [LLM Fallback Strategy](https://docs.openhands.dev/sdk/guides/llm-fallback.md): Automatically try alternate LLMs when the primary model fails with a transient error. +- [LLM Profile Store](https://docs.openhands.dev/sdk/guides/llm-profile-store.md): Save, load, and manage reusable LLM configurations so you never repeat setup code again. +- [LLM Registry](https://docs.openhands.dev/sdk/guides/llm-registry.md): Dynamically select and configure language models using the LLM registry. +- [LLM Streaming](https://docs.openhands.dev/sdk/guides/llm-streaming.md): Stream LLM responses token-by-token for real-time display and interactive user experiences. +- [LLM Subscriptions](https://docs.openhands.dev/sdk/guides/llm-subscriptions.md): Use your ChatGPT Plus/Pro subscription to access Codex models without consuming API credits. +- [Local Agent Server](https://docs.openhands.dev/sdk/guides/agent-server/local-server.md): Run agents through a local HTTP server with RemoteConversation for client-server architecture. +- [MCP Integration](https://docs.openhands.dev/sdk/arch/mcp.md): High-level architecture of Model Context Protocol support +- [Metrics Tracking](https://docs.openhands.dev/sdk/guides/metrics.md): Track token usage, costs, and latency metrics for your agents. +- [Model Context Protocol](https://docs.openhands.dev/sdk/guides/mcp.md): Model Context Protocol (MCP) enables dynamic tool integration from external servers. Agents can discover and use MCP-provided tools automatically. +- [Model Routing](https://docs.openhands.dev/sdk/guides/llm-routing.md): Route agent's LLM requests to different models. 
+- [Observability & Tracing](https://docs.openhands.dev/sdk/guides/observability.md): Enable OpenTelemetry tracing to monitor and debug your agent's execution with tools like Laminar, Honeycomb, or any OTLP-compatible backend. +- [OpenHands Cloud Workspace](https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md): Connect to OpenHands Cloud for fully managed sandbox environments. +- [openhands.sdk.agent](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md): API reference for openhands.sdk.agent module +- [openhands.sdk.conversation](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md): API reference for openhands.sdk.conversation module +- [openhands.sdk.event](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md): API reference for openhands.sdk.event module +- [openhands.sdk.llm](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md): API reference for openhands.sdk.llm module +- [openhands.sdk.security](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md): API reference for openhands.sdk.security module +- [openhands.sdk.tool](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md): API reference for openhands.sdk.tool module +- [openhands.sdk.utils](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md): API reference for openhands.sdk.utils module +- [openhands.sdk.workspace](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md): API reference for openhands.sdk.workspace module +- [Overview](https://docs.openhands.dev/sdk/arch/overview.md): Understanding the OpenHands Software Agent SDK's package structure, component interactions, and execution models. +- [Overview](https://docs.openhands.dev/sdk/guides/agent-server/overview.md): Run agents on remote servers with isolated workspaces for production deployments. 
+- [Pause and Resume](https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md): Pause agent execution, perform operations, and resume without losing state.
+- [Persistence](https://docs.openhands.dev/sdk/guides/convo-persistence.md): Save and restore conversation state for multi-session workflows.
+- [Plugins](https://docs.openhands.dev/sdk/guides/plugins.md): Plugins bundle skills, hooks, MCP servers, agents, and commands into reusable packages that extend agent capabilities.
+- [PR Review](https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md): Use OpenHands Agent to generate meaningful pull request reviews
+- [Reasoning](https://docs.openhands.dev/sdk/guides/llm-reasoning.md): Access model reasoning traces from Anthropic extended thinking and OpenAI responses API.
+- [SDK Package](https://docs.openhands.dev/sdk/arch/sdk.md): Core framework components for building agents - the reasoning loop, state management, and extensibility system.
+- [Secret Registry](https://docs.openhands.dev/sdk/guides/secrets.md): Provide environment variables and secrets to the agent workspace securely.
+- [Security](https://docs.openhands.dev/sdk/arch/security.md): High-level architecture of action security analysis and validation
+- [Security & Action Confirmation](https://docs.openhands.dev/sdk/guides/security.md): Control agent action execution through confirmation policy and security analyzer.
+- [Send Message While Running](https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md): Interrupt running agents to provide additional context or corrections.
+- [Skill](https://docs.openhands.dev/sdk/arch/skill.md): High-level architecture of the reusable prompt system
+- [Software Agent SDK](https://docs.openhands.dev/sdk.md): Build AI agents that write software. A clean, modular SDK with production-ready tools.
+- [Stuck Detector](https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md): Detect and handle stuck agents automatically with timeout mechanisms. +- [Sub-Agent Delegation](https://docs.openhands.dev/sdk/guides/agent-delegation.md): Enable parallel task execution by delegating work to multiple sub-agents that run independently and return consolidated results. +- [Theory of Mind (TOM) Agent](https://docs.openhands.dev/sdk/guides/agent-tom-agent.md): Enable your agent to understand user intent and preferences through Theory of Mind capabilities, providing personalized guidance based on user modeling. +- [TODO Management](https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md): Implement TODOs using OpenHands Agent +- [Tool System & MCP](https://docs.openhands.dev/sdk/arch/tool-system.md): High-level architecture of the action-observation tool framework +- [Workspace](https://docs.openhands.dev/sdk/arch/workspace.md): High-level architecture of the execution environment abstraction + +## OpenHands + +- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) +- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. +- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. +- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK +- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). 
+- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md)
+- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories.
+- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands.
+- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands.
+- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands
+- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options
+- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings).
+- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High-level overview of configuring the OpenHands Web interface.
+- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users
+- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents.
+- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users who would like to use their own custom Docker image for the runtime.
+- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md)
+- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands
+- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively.
+- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally.
+- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands
+- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md)
+- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud.
+- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories.
+- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md)
+- [Good vs. Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands
+- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex)
+- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq).
+- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker
+- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines
+- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol
+- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents
+- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system
+- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to set up and modify the various integrations in OpenHands.
+- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs
+- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup.
+- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup.
+- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md)
+- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands, as well as some additional LLM settings.
+- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup.
+- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers.
+- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for an optimal experience.
+- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md)
+- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities
+- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands.
+- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands
+- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai).
+- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models.
+- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI
+- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects.
+- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle
+- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter).
+- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work.
+- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote.
+- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation.
+- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting.
+- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with the OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts.
+- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes
+- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment.
+- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level.
+- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app.
+- [Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to resume previous conversations in the OpenHands CLI
+- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md)
+- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine.
+- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. +- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. +- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. +- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands +- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use OpenHands interactively in your terminal with the command-line interface +- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents +- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) +- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples +- [VS Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension +- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase +- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser +- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) +- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task +- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running 
OpenHands GUI on Windows without using WSL or Docker +- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol + +## Overview + +- [Community](https://docs.openhands.dev/overview/community.md): Learn about the OpenHands community, mission, and values +- [Contributing](https://docs.openhands.dev/overview/contributing.md): Join us in building OpenHands and the future of AI. Learn how to contribute to make a meaningful impact. +- [FAQs](https://docs.openhands.dev/overview/faqs.md): Frequently asked questions about OpenHands. +- [First Projects](https://docs.openhands.dev/overview/first-projects.md): So you've [run OpenHands](/overview/quickstart). Now what? +- [General Skills](https://docs.openhands.dev/overview/skills/repo.md): General guidelines for OpenHands to work more effectively with the repository. +- [Global Skills](https://docs.openhands.dev/overview/skills/public.md): Global skills are [keyword-triggered skills](/overview/skills/keyword) that apply to all OpenHands users. The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). +- [Introduction](https://docs.openhands.dev/overview/introduction.md): Welcome to OpenHands, a community focused on AI-driven development +- [Keyword-Triggered Skills](https://docs.openhands.dev/overview/skills/keyword.md): Keyword-triggered skills provide OpenHands with specific instructions that are activated when certain keywords appear in the prompt. This is useful for tailoring behavior based on particular tools, languages, or frameworks. 
+- [Model Context Protocol (MCP)](https://docs.openhands.dev/overview/model-context-protocol.md): Model Context Protocol support across OpenHands platforms +- [Organization and User Skills](https://docs.openhands.dev/overview/skills/org.md): Organizations and users can define skills that apply to all repositories belonging to the organization or user. +- [Overview](https://docs.openhands.dev/overview/skills.md): Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. +- [Quick Start](https://docs.openhands.dev/overview/quickstart.md): Choose how you want to run OpenHands diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py new file mode 100755 index 00000000..9ab664e6 --- /dev/null +++ b/scripts/generate-llms-files.py @@ -0,0 +1,184 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import re +from dataclasses import dataclass +from pathlib import Path + +ROOT = Path(__file__).resolve().parents[1] +BASE_URL = "https://docs.openhands.dev" + +EXCLUDED_DIRS = {".git", ".github", ".agents", "tests", "openapi", "logo"} + + +@dataclass(frozen=True) +class DocPage: + rel_path: Path + route: str + title: str + description: str | None + body: str + + +_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL) + + +def _strip_quotes(val: str) -> str: + val = val.strip() + if (val.startswith('"') and val.endswith('"')) or ( + val.startswith("'") and val.endswith("'") + ): + return val[1:-1] + return val + + +def parse_frontmatter(text: str) -> tuple[dict[str, str], str]: + m = _FRONTMATTER_RE.match(text) + if not m: + return {}, text + + fm_text = m.group(1) + body = text[m.end() :] + + fm: dict[str, str] = {} + for line in fm_text.splitlines(): + line = line.strip() + if not line or line.startswith("#"): + continue + if ":" not in line: + continue + k, v = line.split(":", 1) + k = k.strip() + v = v.strip() + if not k: + continue + fm[k] = _strip_quotes(v) 
+ + return fm, body + + +def rel_to_route(rel_path: Path) -> str: + p = rel_path.as_posix() + if p.endswith(".mdx"): + p = p[: -len(".mdx")] + + if p.endswith("/index"): + p = p[: -len("/index")] + + return "/" + p.lstrip("/") + + +def is_v0_page(rel_path: Path) -> bool: + s = rel_path.as_posix() + if "/openhands/usage/v0/" in s: + return True + if rel_path.name.startswith("V0"): + return True + return False + + +def iter_doc_pages() -> list[DocPage]: + pages: list[DocPage] = [] + + for mdx_path in sorted(ROOT.rglob("*.mdx")): + rel_path = mdx_path.relative_to(ROOT) + + if any(part in EXCLUDED_DIRS for part in rel_path.parts): + continue + if is_v0_page(rel_path): + continue + + raw = mdx_path.read_text(encoding="utf-8") + fm, body = parse_frontmatter(raw) + + title = fm.get("title") + if not title: + continue + + description = fm.get("description") + route = rel_to_route(rel_path) + + pages.append( + DocPage( + rel_path=rel_path, + route=route, + title=title, + description=description, + body=body.strip(), + ) + ) + + return pages + + +def group_name(rel_path: Path) -> str: + top = rel_path.parts[0] + return { + "overview": "Overview", + "openhands": "OpenHands", + "sdk": "Agent SDK", + }.get(top, top.replace("-", " ").title()) + + +def build_llms_txt(pages: list[DocPage]) -> str: + grouped: dict[str, list[DocPage]] = {} + for p in pages: + grouped.setdefault(group_name(p.rel_path), []).append(p) + + for g in grouped: + grouped[g] = sorted(grouped[g], key=lambda x: (x.title.lower(), x.route)) + + lines: list[str] = [ + "# OpenHands Docs", + "", + "> LLM-friendly index of OpenHands documentation (V1). 
Legacy V0 docs pages are intentionally excluded.", + "", + ] + + for group in sorted(grouped.keys()): + lines.append(f"## {group}") + lines.append("") + + for p in grouped[group]: + url = f"{BASE_URL}{p.route}.md" + line = f"- [{p.title}]({url})" + if p.description: + line += f": {p.description}" + lines.append(line) + + lines.append("") + + return "\n".join(lines).rstrip() + "\n" + + +def build_llms_full_txt(pages: list[DocPage]) -> str: + header = [ + "# OpenHands Docs", + "", + "> Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded.", + "", + ] + + chunks: list[str] = ["\n".join(header).rstrip()] + + for p in sorted(pages, key=lambda x: x.route): + chunks.append( + f"\n\n# {p.title}\nSource: {BASE_URL}{p.route}\n\n{p.body}\n" + ) + + return "".join(chunks).lstrip() + "\n" + + +def main() -> None: + pages = iter_doc_pages() + + llms_txt = build_llms_txt(pages) + llms_full = build_llms_full_txt(pages) + + (ROOT / "llms.txt").write_text(llms_txt, encoding="utf-8") + (ROOT / "llms-full.txt").write_text(llms_full, encoding="utf-8") + + +if __name__ == "__main__": + main() From 5963fa2ec1f6887a6d32aacdd5d09e67a7fb5140 Mon Sep 17 00:00:00 2001 From: openhands Date: Tue, 24 Feb 2026 09:10:13 +0000 Subject: [PATCH 2/6] docs: improve llms files structure and section labeling Co-authored-by: openhands --- llms-full.txt | 40319 +++++++++++++++---------------- llms.txt | 177 +- scripts/generate-llms-files.py | 168 +- 3 files changed, 20320 insertions(+), 20344 deletions(-) diff --git a/llms-full.txt b/llms-full.txt index 27215470..91325040 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -2,8 +2,10 @@ > Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded. 
-# About OpenHands -Source: https://docs.openhands.dev/openhands/usage/about +## OpenHands Web App Server + +### About OpenHands +Source: https://docs.openhands.dev/openhands/usage/about.md ## Research Strategy @@ -32,9 +34,8 @@ enhance the capabilities of OpenHands. Distributed under MIT [License](https://github.com/OpenHands/OpenHands/blob/main/LICENSE). - -# Configuration Options -Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options +### Configuration Options +Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md This page documents the current V1 configuration model. @@ -95,9 +96,8 @@ providers, see: - Web → Legacy (V0) → V0 Configuration Options - Web → Legacy (V0) → V0 Runtime Configuration - -# Custom Sandbox -Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide +### Custom Sandbox +Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md These settings are only available in [Local GUI](/openhands/usage/run-openhands/local-setup). OpenHands Cloud uses managed sandbox environments. @@ -199,9 +199,8 @@ platform = "linux/amd64" Run OpenHands by running ```make run``` in the top level directory. - -# Search Engine Setup -Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup +### Search Engine Setup +Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md ## Setting Up Search Engine in OpenHands @@ -272,9 +271,8 @@ If you encounter issues with the search functionality: - Ensure you have an active internet connection. - Check Tavily's status page for any service disruptions. 
- -# Main Agent and Capabilities -Source: https://docs.openhands.dev/openhands/usage/agents +### Main Agent and Capabilities +Source: https://docs.openhands.dev/openhands/usage/agents.md ## CodeActAgent @@ -299,9 +297,8 @@ https://github.com/OpenHands/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d _Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_. - -# REST API (V1) -Source: https://docs.openhands.dev/openhands/usage/api/v1 +### REST API (V1) +Source: https://docs.openhands.dev/openhands/usage/api/v1.md OpenHands is in a transition period: legacy (V0) endpoints still exist alongside @@ -335,9 +332,8 @@ The V1 API is organized around a few core concepts: - **Sandbox specs**: list the available sandbox “templates” (e.g., Docker image presets). - GET /api/v1/sandbox-specs/search - -# Backend Architecture -Source: https://docs.openhands.dev/openhands/usage/architecture/backend +### Backend Architecture +Source: https://docs.openhands.dev/openhands/usage/architecture/backend.md This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents. @@ -438,9 +434,8 @@ classDiagram - -# Runtime Architecture -Source: https://docs.openhands.dev/openhands/usage/architecture/runtime +### Runtime Architecture +Source: https://docs.openhands.dev/openhands/usage/architecture/runtime.md The OpenHands Docker Runtime is the core component that enables secure and flexible execution of AI agent's action. It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system. @@ -609,10952 +604,10954 @@ Key aspects of the plugin system: 4. 
Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions 5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping +### Repository Customization +Source: https://docs.openhands.dev/openhands/usage/customization/repository.md -# OpenHands Cloud -Source: https://docs.openhands.dev/openhands/usage/cli/cloud - -## Overview - -The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: - -- Authenticate with your OpenHands Cloud account -- Create new cloud conversations -- Use cloud resources without the web interface +## Skills (formerly Microagents) -## Authentication +Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands +should function. See [Skills Overview](/overview/skills) for more information. -### Login -Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: +## Setup Script +You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository. +This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks. +For example: ```bash -openhands login +#!/bin/bash +export MY_ENV_VAR="my value" +sudo apt-get update +sudo apt-get install -y lsof +cd frontend && npm install ; cd .. ``` -This opens a browser window for authentication. After successful login, your credentials are stored locally. - -#### Custom Server URL - -For self-hosted or enterprise deployments: +## Pre-commit Script +You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit. +This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits. 
For example:

```bash
#!/bin/bash
# Run linting checks (in a subshell so each check starts from the repo root)
(cd frontend && npm run lint)
if [ $? -ne 0 ]; then
  echo "Frontend linting failed. Please fix the issues before committing."
  exit 1
fi

# Run tests (a plain `cd backend` here would resolve relative to frontend/)
(cd backend && pytest tests/unit)
if [ $? -ne 0 ]; then
  echo "Backend tests failed. Please fix the issues before committing."
  exit 1
fi

exit 0
```

### Debugging
Source: https://docs.openhands.dev/openhands/usage/developers/debugging.md

The following is intended as a primer on debugging OpenHands for development purposes.

## Server / VSCode

The following `launch.json` allows debugging the agent, controller, and server elements, but not the sandbox (which runs inside Docker). It ignores any changes inside the `workspace/` directory:

```
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "OpenHands CLI",
      "type": "debugpy",
      "request": "launch",
      "module": "openhands.cli.main",
      "justMyCode": false
    },
    {
      "name": "OpenHands WebApp",
      "type": "debugpy",
      "request": "launch",
      "module": "uvicorn",
      "args": [
        "openhands.server.listen:app",
        "--reload",
        "--reload-exclude",
        "${workspaceFolder}/workspace",
        "--port",
        "3000"
      ],
      "justMyCode": false
    }
  ]
}
```

More specific debugging configurations which include more parameters may be specified:

```
 ... 
+ { + "name": "Debug CodeAct", + "type": "debugpy", + "request": "launch", + "module": "openhands.core.main", + "args": [ + "-t", + "Ask me what your task is.", + "-d", + "${workspaceFolder}/workspace", + "-c", + "CodeActAgent", + "-l", + "llm.o1", + "-n", + "prompts" + ], + "justMyCode": false + } + ... ``` -### Options - -| Option | Description | -|--------|-------------| -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | +Values in the snippet above can be updated such that: -### Examples + * *t*: the task + * *d*: the openhands workspace directory + * *c*: the agent + * *l*: the LLM config (pre-defined in config.toml) + * *n*: session name (e.g. eventstream name) -```bash -# Create a cloud conversation with a task -openhands cloud -t "Fix the authentication bug in login.py" +### Development Overview +Source: https://docs.openhands.dev/openhands/usage/developers/development-overview.md -# Create from a task file -openhands cloud -f requirements.txt +## Core Documentation -# Use a custom server -openhands cloud --server-url https://custom.server.com -t "Add unit tests" +### Project Fundamentals +- **Main Project Overview** (`/README.md`) + The primary entry point for understanding OpenHands, including features and basic setup instructions. -# Combine with environment variable -export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev -openhands cloud -t "Refactor the database module" -``` +- **Development Guide** (`/Development.md`) + Guide for developers working on OpenHands, including setup, requirements, and development workflows. -## Workflow +- **Contributing Guidelines** (`/CONTRIBUTING.md`) + Essential information for contributors, covering code style, PR process, and contribution workflows. -A typical workflow with OpenHands Cloud: +### Component Documentation -1. 
**Login once**: - ```bash - openhands login - ``` +#### Frontend +- **Frontend Application** (`/frontend/README.md`) + Complete guide for setting up and developing the React-based frontend application. -2. **Create conversations as needed**: - ```bash - openhands cloud -t "Your task here" - ``` +#### Backend +- **Backend Implementation** (`/openhands/README.md`) + Detailed documentation of the Python backend implementation and architecture. -3. **Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server +- **Server Documentation** (`/openhands/server/README.md`) + Server implementation details, API documentation, and service architecture. -## Environment Variables +- **Runtime Environment** (`/openhands/runtime/README.md`) + Documentation covering the runtime environment, execution model, and runtime configurations. -| Variable | Description | -|----------|-------------| -| `OPENHANDS_CLOUD_URL` | Default server URL for cloud operations | +#### Infrastructure +- **Container Documentation** (`/containers/README.md`) + Information about Docker containers, deployment strategies, and container management. -## Cloud vs Local +### Testing and Evaluation +- **Unit Testing Guide** (`/tests/unit/README.md`) + Instructions for writing, running, and maintaining unit tests. -| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | -|---------|---------------------------|---------------------| -| Compute | Cloud-hosted | Your machine | -| Persistence | Cloud storage | Local files | -| Collaboration | Share via link | Local only | -| Setup | Just login | Configure LLM & runtime | -| Cost | Subscription/usage-based | Your LLM API costs | +- **Evaluation Framework** (`/evaluation/README.md`) + Documentation for the evaluation framework, benchmarks, and performance testing. - -Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. 
Use the local CLI for privacy, offline work, or custom configurations. - +### Advanced Features +- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`) + Detailed information about the skills architecture, implementation, and usage. -## See Also +### Documentation Standards +- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) + Standards and guidelines for writing and maintaining project documentation. -- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation -- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide -- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access +## Getting Started with Development +If you're new to developing with OpenHands, we recommend following this sequence: -# Command Reference -Source: https://docs.openhands.dev/openhands/usage/cli/command-reference +1. Start with the main `README.md` to understand the project's purpose and features +2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute +3. Follow the setup instructions in `Development.md` +4. Dive into specific component documentation based on your area of interest: + - Frontend developers should focus on `/frontend/README.md` + - Backend developers should start with `/openhands/README.md` + - Infrastructure work should begin with `/containers/README.md` -## Basic Usage +## Documentation Updates -```bash -openhands [OPTIONS] [COMMAND] -``` +When making changes to the codebase, please ensure that: +1. Relevant documentation is updated to reflect your changes +2. New features are documented in the appropriate README files +3. Any API changes are reflected in the server documentation +4. 
Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` -## Global Options +### Evaluation Harness +Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md -| Option | Description | -|--------|-------------| -| `-v, --version` | Show version number and exit | -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--resume [ID]` | Resume a conversation. If no ID provided, lists recent conversations | -| `--last` | Resume the most recent conversation (use with `--resume`) | -| `--exp` | Use textual-based UI (now default, kept for compatibility) | -| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | -| `--json` | Enable JSONL output (requires `--headless`) | -| `--always-approve` | Auto-approve all actions without confirmation | -| `--llm-approve` | Use LLM-based security analyzer for action approval | -| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | -| `--exit-without-confirmation` | Exit without showing confirmation dialog | +This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework. -## Subcommands +## Setup Environment and LLM Configuration -### serve +Please follow instructions [here](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to setup your local development environment. +OpenHands in development mode uses `config.toml` to keep track of most configurations. -Launch the OpenHands GUI server using Docker. 
+Here's an example configuration file you can use to define and use multiple LLMs: -```bash -openhands serve [OPTIONS] -``` +```toml +[llm] +# IMPORTANT: add your API key here, and set the model to the one you want to evaluate +model = "claude-3-5-sonnet-20241022" +api_key = "sk-XXX" -| Option | Description | -|--------|-------------| -| `--mount-cwd` | Mount the current working directory into the container | -| `--gpu` | Enable GPU support via nvidia-docker | +[llm.eval_gpt4_1106_preview_llm] +model = "gpt-4-1106-preview" +api_key = "XXX" +temperature = 0.0 -**Examples:** -```bash -openhands serve -openhands serve --mount-cwd -openhands serve --gpu -openhands serve --mount-cwd --gpu +[llm.eval_some_openai_compatible_model_llm] +model = "openai/MODEL_NAME" +base_url = "https://OPENAI_COMPATIBLE_URL/v1" +api_key = "XXX" +temperature = 0.0 ``` -### web -Launch the CLI as a web application accessible via browser. +## How to use OpenHands in the command line + +OpenHands can be run from the command line using the following format: ```bash -openhands web [OPTIONS] +poetry run python ./openhands/core/main.py \ + -i \ + -t "" \ + -c \ + -l ``` -| Option | Default | Description | -|--------|---------|-------------| -| `--host` | `0.0.0.0` | Host to bind the web server to | -| `--port` | `12000` | Port to bind the web server to | -| `--debug` | `false` | Enable debug mode | +For example: -**Examples:** ```bash -openhands web -openhands web --port 8080 -openhands web --host 127.0.0.1 --port 3000 -openhands web --debug +poetry run python ./openhands/core/main.py \ + -i 10 \ + -t "Write me a bash script that prints hello world." \ + -c CodeActAgent \ + -l llm ``` -### cloud +This command runs OpenHands with: +- A maximum of 10 iterations +- The specified task description +- Using the CodeActAgent +- With the LLM configuration defined in the `llm` section of your `config.toml` file -Create a new conversation in OpenHands Cloud. 
+## How does OpenHands work -```bash -openhands cloud [OPTIONS] -``` +The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works: -| Option | Description | -|--------|-------------| -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | +1. Parse command-line arguments and load the configuration +2. Create a runtime environment using `create_runtime()` +3. Initialize the specified agent +4. Run the controller using `run_controller()`, which: + - Attaches the runtime to the agent + - Executes the agent's task + - Returns a final state when complete -**Examples:** -```bash -openhands cloud -t "Fix the bug" -openhands cloud -f task.txt -openhands cloud --server-url https://custom.server.com -t "Task" -``` +The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing. -### acp -Start the Agent Client Protocol server for IDE integrations. +## Easiest way to get started: Exploring Existing Benchmarks -```bash -openhands acp [OPTIONS] -``` +We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository. -| Option | Description | -|--------|-------------| -| `--resume [ID]` | Resume a conversation by ID | -| `--last` | Resume the most recent conversation | -| `--always-approve` | Auto-approve all actions | -| `--llm-approve` | Use LLM-based security analyzer | -| `--streaming` | Enable token-by-token streaming | +To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. 
This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements.

## How to create an evaluation workflow

To create an evaluation workflow for your benchmark, follow these steps:

1. Import relevant OpenHands utilities:
   ```python
   import pandas as pd  # instances are passed around as pd.Series below

   import openhands.agenthub
   from evaluation.utils.shared import (
       EvalMetadata,
       EvalOutput,
       make_metadata,
       prepare_dataset,
       reset_logger_for_multiprocessing,
       run_evaluation,
   )
   from openhands.controller.state.state import State
   from openhands.core.config import (
       AppConfig,
       SandboxConfig,
       get_llm_config_arg,
       parse_arguments,
   )
   from openhands.core.logger import openhands_logger as logger
   from openhands.core.main import create_runtime, run_controller
   from openhands.events.action import CmdRunAction
   from openhands.events.observation import CmdOutputObservation, ErrorObservation
   from openhands.runtime.runtime import Runtime
   ```

2. Create a configuration:
   ```python
   def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
       config = AppConfig(
           default_agent=metadata.agent_class,
           runtime='docker',
           max_iterations=metadata.max_iterations,
           sandbox=SandboxConfig(
               base_container_image='your_container_image',
               enable_auto_lint=True,
               timeout=300,
           ),
       )
       config.set_llm_config(metadata.llm_config)
       return config
   ```

3. Initialize the runtime and set up the evaluation environment:
   ```python
   def initialize_runtime(runtime: Runtime, instance: pd.Series):
       # Set up your evaluation environment here
       # For example, setting environment variables, preparing files, etc. 
+ pass + ``` -```bash -openhands mcp add --transport [OPTIONS] [-- args...] -``` +4. Create a function to process each instance: + ```python + from openhands.utils.async_utils import call_async_from_sync + def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput: + config = get_config(instance, metadata) + runtime = create_runtime(config) + call_async_from_sync(runtime.connect) + initialize_runtime(runtime, instance) -| Option | Description | -|--------|-------------| -| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | -| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | -| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | -| `--auth` | Authentication method (e.g., `oauth`) | -| `--enabled` | Enable immediately (default) | -| `--disabled` | Add in disabled state | + instruction = get_instruction(instance, metadata) -**Examples:** -```bash -openhands mcp add my-api --transport http https://api.example.com/mcp -openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com -openhands mcp add local --transport stdio python -- -m my_server -openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server -``` + state = run_controller( + config=config, + task_str=instruction, + runtime=runtime, + fake_user_response_fn=your_user_response_function, + ) -#### mcp list + # Evaluate the agent's actions + evaluation_result = await evaluate_agent_actions(runtime, instance) -List all configured MCP servers. + return EvalOutput( + instance_id=instance.instance_id, + instruction=instruction, + test_result=evaluation_result, + metadata=metadata, + history=compatibility_for_eval_history_pairs(state.history), + metrics=state.metrics.get() if state.metrics else None, + error=state.last_error if state and state.last_error else None, + ) + ``` -```bash -openhands mcp list -``` +5. 
Run the evaluation: + ```python + metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir) + output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl') + instances = prepare_dataset(your_dataset, output_file, eval_n_limit) -#### mcp get + await run_evaluation( + instances, + metadata, + output_file, + num_workers, + process_instance + ) + ``` -Get details for a specific MCP server. +This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. -```bash -openhands mcp get -``` +Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. -#### mcp remove +By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. -Remove an MCP server configuration. -```bash -openhands mcp remove -``` +## Understanding the `user_response_fn` -#### mcp enable +The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. -Enable an MCP server. -```bash -openhands mcp enable -``` +### Workflow and Interaction -#### mcp disable - -Disable an MCP server. - -```bash -openhands mcp disable -``` - -### login - -Authenticate with OpenHands Cloud. 
- -```bash -openhands login [OPTIONS] -``` - -| Option | Description | -|--------|-------------| -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | - -**Examples:** -```bash -openhands login -openhands login --server-url https://enterprise.openhands.dev -``` +The correct workflow for handling actions and the `user_response_fn` is as follows: -### logout +1. Agent receives a task and starts processing +2. Agent emits an Action +3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): + - The Runtime processes the Action + - Runtime returns an Observation +4. If the Action is not executable (typically a MessageAction): + - The `user_response_fn` is called + - It returns a simulated user response +5. The agent receives either the Observation or the simulated response +6. Steps 2-5 repeat until the task is completed or max iterations are reached -Log out from OpenHands Cloud. +Here's a more accurate visual representation: -```bash -openhands logout [OPTIONS] ``` - -| Option | Description | -|--------|-------------| -| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | - -**Examples:** -```bash -openhands logout -openhands logout --server-url https://app.all-hands.dev + [Agent] + | + v + [Emit Action] + | + v + [Is Action Executable?] 
+ / \ + Yes No + | | + v v + [Runtime] [user_response_fn] + | | + v v + [Return Observation] [Simulated Response] + \ / + \ / + v v + [Agent receives feedback] + | + v + [Continue or Complete Task] ``` -## Interactive Commands - -Commands available inside the CLI (prefix with `/`): - -| Command | Description | -|---------|-------------| -| `/help` | Display available commands | -| `/new` | Start a new conversation | -| `/history` | Toggle conversation history | -| `/confirm` | Configure confirmation settings | -| `/condense` | Condense conversation history | -| `/skills` | View loaded skills, hooks, and MCPs | -| `/feedback` | Send anonymous feedback about CLI | -| `/exit` | Exit the application | - -## Command Palette - -Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: - -| Option | Description | -|--------|-------------| -| **History** | Toggle conversation history panel | -| **Keys** | Show keyboard shortcuts | -| **MCP** | View MCP server configurations | -| **Maximize** | Maximize/restore window | -| **Plan** | View agent plan | -| **Quit** | Quit the application | -| **Screenshot** | Take a screenshot | -| **Settings** | Configure LLM model, API keys, and other settings | -| **Theme** | Toggle color theme | - -## Changing Your Model - -### Via Settings UI - -1. Press `Ctrl+P` to open the command palette -2. Select **Settings** -3. Choose your LLM provider and model -4. 
Save changes (no restart required) +In this workflow: -### Via Configuration File +- Executable actions (like running commands or executing code) are handled directly by the Runtime +- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` +- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` -Edit `~/.openhands/agent_settings.json` and change the `model` field: +This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. -```json -{ - "llm": { - "model": "claude-sonnet-4-5-20250929", - "api_key": "...", - "base_url": "..." - } -} -``` +### Example Implementation -### Via Environment Variables +Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: -Temporarily override your model without changing saved configuration: +```python +def codeact_user_response(state: State | None) -> str: + msg = ( + 'Please continue working on the task on whatever approach you think is suitable.\n' + 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' + 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' + ) -```bash -export LLM_MODEL="gpt-4o" -export LLM_API_KEY="your-api-key" -openhands --override-with-envs + if state and state.history: + # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up + user_msgs = [ + event + for event in state.history + if isinstance(event, MessageAction) and event.source == 'user' + ] + if len(user_msgs) >= 2: + # let the agent know that it can give up when it has tried 3 times + return ( + msg + + 'If you want to give up, run: exit .\n' + ) + return msg ``` -Changes made with 
`--override-with-envs` are not persisted. - -## Environment Variables - -| Variable | Description | -|----------|-------------| -| `LLM_API_KEY` | API key for your LLM provider | -| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | -| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | -| `OPENHANDS_CLOUD_URL` | Default cloud server URL | -| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | - -## Exit Codes - -| Code | Meaning | -|------|---------| -| `0` | Success | -| `1` | Error or task failed | -| `2` | Invalid arguments | +This function does the following: -## Configuration Files +1. Provides a standard message encouraging the agent to continue working +2. Checks how many times the agent has attempted to communicate with the user +3. If the agent has made multiple attempts, it provides an option to give up -| File | Purpose | -|------|---------| -| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | -| `~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | -| `~/.openhands/mcp.json` | MCP server configurations | -| `~/.openhands/conversations/` | Conversation history | +By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. -## See Also +### WebSocket Connection +Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md -- [Installation](/openhands/usage/cli/installation) - Install the CLI -- [Quick Start](/openhands/usage/cli/quick-start) - Get started -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. 
+## Overview -# Critic (Experimental) -Source: https://docs.openhands.dev/openhands/usage/cli/critic +OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: - -**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. - +1. Receive real-time events from the agent +2. Send user actions to the agent +3. Maintain a persistent connection for ongoing conversations -## Overview +## Connecting to the WebSocket -If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. +### Connection Parameters -For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). +When connecting to the WebSocket, you need to provide the following query parameters: +- `conversation_id`: The ID of the conversation you want to join +- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) +- `providers_set`: (Optional) A comma-separated list of provider types -## What is the Critic? +### Connection Example -The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. 
It provides: +Here's a basic example of connecting to the WebSocket using JavaScript: -- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success -- **Real-time feedback**: Scores computed during agent execution, not just at completion +```javascript +import { io } from "socket.io-client"; - +const socket = io("http://localhost:3000", { + transports: ["websocket"], + query: { + conversation_id: "your-conversation-id", + latest_event_id: -1, + providers_set: "github,gitlab" // Optional + } +}); -![Critic output in CLI](./screenshots/critic-cli-output.png) +socket.on("connect", () => { + console.log("Connected to OpenHands WebSocket"); +}); -## Pricing +socket.on("oh_event", (event) => { + console.log("Received event:", event); +}); -The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. +socket.on("connect_error", (error) => { + console.error("Connection error:", error); +}); -## Disabling the Critic +socket.on("disconnect", (reason) => { + console.log("Disconnected:", reason); +}); +``` -If you prefer not to use the critic feature, you can disable it in your settings: +## Sending Actions to the Agent -1. Open the command palette with `Ctrl+P` -2. Select **Settings** -3. Navigate to the **CLI Settings** tab -4. Toggle off **Enable Critic (Experimental)** +To send an action to the agent, use the `oh_user_action` event: -![Critic settings in CLI](./screenshots/critic-cli-settings.png) +```javascript +// Send a user message to the agent +socket.emit("oh_user_action", { + type: "message", + source: "user", + message: "Hello, can you help me with my project?" +}); +``` +## Receiving Events from the Agent -# GUI Server -Source: https://docs.openhands.dev/openhands/usage/cli/gui-server +The server emits events using the `oh_event` event type. 
Here are some common event types you might receive: -## Overview +- User messages (`source: "user", type: "message"`) +- Agent messages (`source: "agent", type: "message"`) +- File edits (`action: "edit"`) +- File writes (`action: "write"`) +- Command executions (`action: "run"`) -The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. +Example event handler: -```bash -openhands serve +```javascript +socket.on("oh_event", (event) => { + if (event.source === "agent" && event.type === "message") { + console.log("Agent says:", event.message); + } else if (event.action === "run") { + console.log("Command executed:", event.args.command); + console.log("Result:", event.result); + } +}); ``` - -This requires Docker to be installed and running on your system. - - -## Prerequisites +## Using Websocat for Testing -- [Docker](https://docs.docker.com/get-docker/) installed and running -- Sufficient disk space for Docker images (~2GB) +[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application. -## Basic Usage +### Installation ```bash -# Launch the GUI server -openhands serve +# On macOS +brew install websocat -# The server will be available at http://localhost:3000 +# On Linux +curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat +chmod +x websocat +sudo mv websocat /usr/local/bin/ ``` -The command will: -1. Check Docker requirements -2. Pull the required Docker images -3. Start the OpenHands GUI server -4. 
Display the URL to access the interface - -## Options - -| Option | Description | -|--------|-------------| -| `--mount-cwd` | Mount the current working directory into the container | -| `--gpu` | Enable GPU support via nvidia-docker | - -## Mounting Your Workspace - -To give OpenHands access to your local files: +### Connecting to the WebSocket ```bash -# Mount current directory -openhands serve --mount-cwd +# Connect to the WebSocket and print all received messages +echo "40{}" | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" ``` -This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. - - -Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. - - -## GPU Support - -For tasks that benefit from GPU acceleration: +### Sending a Message ```bash -openhands serve --gpu +# Send a message to the agent +echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" ``` -This requires: -- NVIDIA GPU -- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed -- Docker configured for GPU support +### Complete Example with Websocat -## Examples +Here's a complete example of connecting to the WebSocket, sending a message, and receiving events: ```bash -# Basic GUI server -openhands serve +# Start a persistent connection +websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -# Mount current project and enable GPU -cd /path/to/your/project -openhands serve --mount-cwd --gpu +# In another terminal, send a message +echo 
'42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" ``` -## How It Works - -The `openhands serve` command: - -1. **Pulls Docker images**: Downloads the OpenHands runtime and application images -2. **Starts containers**: Runs the OpenHands server in a Docker container -3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` -4. **Shares settings**: Uses your `~/.openhands` directory for configuration +## Event Structure -## Stopping the Server +Events sent and received through the WebSocket follow a specific structure: -Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. +```typescript +interface OpenHandsEvent { + id: string; // Unique event ID + source: string; // "user" or "agent" + timestamp: string; // ISO timestamp + message?: string; // For message events + type?: string; // Event type (e.g., "message") + action?: string; // Action type (e.g., "run", "edit", "write") + args?: any; // Action arguments + result?: any; // Action result +} +``` -## Comparison: GUI Server vs Web Interface +## Best Practices -| Feature | `openhands serve` | `openhands web` | -|---------|-------------------|-----------------| -| Interface | Full web GUI | Terminal UI in browser | -| Dependencies | Docker required | None | -| Resources | Full container (~2GB) | Lightweight | -| Features | All GUI features | CLI features only | -| Best for | Rich GUI experience | Quick terminal access | +1. **Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions. +2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events. +3. **Error Handling**: Implement proper error handling for connection errors and failed actions. +4. 
**Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server.

## Troubleshooting

-### Docker Not Running
-
-```
-❌ Docker daemon is not running.
-Please start Docker and try again.
-```
-
-**Solution**: Start Docker Desktop or the Docker daemon.
+### Connection Issues

-### Permission Denied
+- Verify that the OpenHands server is running and accessible
+- Check that you're providing the correct conversation ID
+- Ensure your WebSocket URL is correctly formatted

-```
-Got permission denied while trying to connect to the Docker daemon socket
-```
+### Authentication Issues

-**Solution**: Add your user to the docker group:
-```bash
-sudo usermod -aG docker $USER
-# Then log out and back in
-```
+- Make sure you have the necessary authentication cookies if required
+- Verify that you have permission to access the specified conversation

-### Port Already in Use
+### Event Handling Issues

-If port 3000 is already in use, stop the conflicting service or use a different setup. Currently, the port is not configurable via CLI.
+- Check that you're correctly parsing the event data
+- Verify that your event handlers are properly registered

-## See Also
+# Environment Variables Reference
+Source: https://docs.openhands.dev/openhands/usage/environment-variables.md

-- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide
-- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access
-- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details
+This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments.
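The precedence rule stated in the Notes at the end of this page (environment variables win over `config.toml` values) can be sketched as follows. `resolve_setting` is a hypothetical helper for illustration, not OpenHands' actual loader:

```python
import os

# Hypothetical resolution helper (not OpenHands' real loader): an
# environment variable, when set, overrides the config.toml value,
# which in turn overrides the built-in default.
def resolve_setting(env_name: str, toml_value=None, default=""):
    env_value = os.environ.get(env_name)
    if env_value is not None:
        return env_value
    if toml_value is not None:
        return toml_value
    return default

os.environ["LLM_MODEL"] = "gpt-4o"
print(resolve_setting("LLM_MODEL", toml_value="claude-3-5-sonnet-20241022"))  # gpt-4o
```

Unsetting `LLM_MODEL` in the environment would make the same call fall back to the TOML value, and then to the default.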
+## Environment Variable Naming Convention -# Headless Mode -Source: https://docs.openhands.dev/openhands/usage/cli/headless +OpenHands follows a consistent naming pattern for environment variables: -## Overview +- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`) +- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`) +- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`) +- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`) +- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`) -Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: -- CI/CD pipelines -- Automated scripting -- Integration with other tools -- Batch processing +## Core Configuration Variables -```bash -openhands --headless -t "Your task here" -``` +These variables correspond to the `[core]` section in `config.toml`: -## Requirements +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | +| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | +| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | +| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | +| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | +| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | +| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) 
| +| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | +| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | +| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | +| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | +| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | +| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) | +| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | +| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | +| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | +| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | -- Must specify a task with `--task` or `--file` +## LLM Configuration Variables - -**Headless mode always runs in `always-approve` mode.** The agent will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. 
- +These variables correspond to the `[llm]` section in `config.toml`: -## Basic Usage +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | +| `LLM_API_KEY` | string | `""` | API key for the LLM provider | +| `LLM_BASE_URL` | string | `""` | Custom API base URL | +| `LLM_API_VERSION` | string | `""` | API version to use | +| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | +| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | +| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | +| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | +| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | +| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | +| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | +| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | +| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | +| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | +| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | +| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | +| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | +| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | +| `LLM_OLLAMA_BASE_URL` | string | `""` | Base URL for Ollama API | +| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | +| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | +| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | -```bash -# Run a task in headless mode -openhands --headless -t 
"Write a Python script that prints hello world"
+### AWS Configuration
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID |
+| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key |
+| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name |

-# Load task from a file
-openhands --headless -f task.txt
-```
+## Agent Configuration Variables

-## JSON Output Mode
+These variables correspond to the `[agent]` section in `config.toml`:

-The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur:
-
-```bash
-openhands --headless --json -t "Create a simple Flask app"
-```
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use |
+| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling |
+| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate |
+| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor |
+| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration |
+| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation |
+| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable prompt extensions (skills, formerly known as microagents) |
+| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable |

-Each line is a JSON object representing an agent event:
+## Sandbox Configuration Variables

-```json
-{"type": "action", "action": "write", "path": "app.py", ...}
-{"type": "observation", "content": "File created successfully", ...}
-{"type": "action", "action": "run", "command": "python app.py", ...}
-```
+These variables correspond to the `[sandbox]` section in `config.toml`:

-### Use Cases for JSON Output
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------| +| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | +| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | +| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | +| `SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | +| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | +| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | +| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | +| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | +| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | +| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | +| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | +| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | +| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | +| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | +| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | +| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | +| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | +| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | +| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | +| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | -- **CI/CD pipelines**: Parse events to determine success/failure -- **Automated processing**: Feed output to other tools -- **Logging**: Capture structured logs for 
analysis -- **Integration**: Connect OpenHands with other systems +### Sandbox Environment Variables +Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment: -### Example: Capture Output to File +| Environment Variable | Description | +|---------------------|-------------| +| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | -```bash -openhands --headless --json -t "Add unit tests" > output.jsonl -``` +## Security Configuration Variables -## See Also +These variables correspond to the `[security]` section in `config.toml`: -- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage -- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | +| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | +| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | +## Debug and Logging Variables -# JetBrains IDEs -Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable general debug logging | +| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | +| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | +| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when DEBUG=true) | -[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. 
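The `DEBUG`-style switches above arrive as plain strings in the process environment. A small helper can normalize the boolean spellings the Notes section at the end of this page says are accepted (`true`/`false`, `1`/`0`, `yes`/`no`, case-insensitive). This is a sketch, not OpenHands' actual parser:

```python
import os

# Sketch only: normalize the boolean spellings accepted for flags such
# as DEBUG and DEBUG_LLM (true/false, 1/0, yes/no, case-insensitive).
# OpenHands' real parser may differ.
def env_flag(name: str, default: bool = False) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")

os.environ["DEBUG"] = "Yes"
os.environ["DEBUG_LLM"] = "0"
print(env_flag("DEBUG"), env_flag("DEBUG_LLM"))  # True False
```

Any unset variable falls back to the supplied default, mirroring how the tables above list per-variable defaults.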
+## Runtime-Specific Variables -## Supported IDEs +### Docker Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | -This guide applies to all JetBrains IDEs: +### Remote Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | +| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | -- IntelliJ IDEA -- PyCharm -- WebStorm -- GoLand -- Rider -- CLion -- PhpStorm -- RubyMine -- DataGrip -- And other JetBrains IDEs +### Local Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | +| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | +| `RUNTIME_ID` | string | `""` | Runtime identifier | +| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | -## Prerequisites +## Integration Variables -Before configuring JetBrains IDEs: +### GitHub Integration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` -3. **JetBrains IDE version 25.3 or later** -4. 
**JetBrains AI Assistant enabled** in your IDE +### Third-Party API Keys +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `OPENAI_API_KEY` | string | `""` | OpenAI API key | +| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | +| `GOOGLE_API_KEY` | string | `""` | Google API key | +| `AZURE_API_KEY` | string | `""` | Azure API key | +| `TAVILY_API_KEY` | string | `""` | Tavily search API key | - -JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE. - +## Server Configuration Variables -## Configuration +These are primarily used when running OpenHands as a server: -### Step 1: Create the ACP Configuration File +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `FRONTEND_PORT` | integer | `3000` | Frontend server port | +| `BACKEND_PORT` | integer | `8000` | Backend server port | +| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | +| `BACKEND_HOST` | string | `"localhost"` | Backend host address | +| `WEB_HOST` | string | `"localhost"` | Web server host | +| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | -Create or edit the file `$HOME/.jetbrains/acp.json`: +## Deprecated Variables - - - ```bash - mkdir -p ~/.jetbrains - nano ~/.jetbrains/acp.json - ``` - - - Create the file at `C:\Users\\.jetbrains\acp.json` - - +These variables are deprecated and should be replaced: -### Step 2: Add the Configuration +| Environment Variable | Replacement | Description | +|---------------------|-------------|-------------| +| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | -Add the following JSON: +## Usage 
Examples -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": {} - } - } -} +### Basic Setup with OpenAI +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-openai-api-key" +export DEBUG=true ``` -### Step 3: Use OpenHands in Your IDE - -Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE. +### Docker Deployment with Custom Volumes +```bash +export RUNTIME="docker" +export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" +export SANDBOX_TIMEOUT=300 +``` -## Advanced Configuration +### Remote Runtime Configuration +```bash +export RUNTIME="remote" +export SANDBOX_API_KEY="your-remote-api-key" +export SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" +``` -### LLM-Approve Mode +### Security-Enhanced Setup +```bash +export SECURITY_CONFIRMATION_MODE=true +export SECURITY_SECURITY_ANALYZER="llm" +export DEBUG_RUNTIME=true +``` -For automatic LLM-based approval: +## Notes -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp", "--llm-approve"], - "env": {} - } - } -} -``` +1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). -### Auto-Approve Mode +2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`. -For automatic approval of all actions (use with caution): +3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`. -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp", "--always-approve"], - "env": {} - } - } -} -``` +4. **Precedence**: Environment variables take precedence over TOML configuration files. -### Resume a Conversation +5. 
**Docker Usage**: When using Docker, pass environment variables with the `-e` flag: + ```bash + docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands + ``` -Resume a specific conversation: +6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults. -```json -{ - "agent_servers": { - "OpenHands (Resume)": { - "command": "openhands", - "args": ["acp", "--resume", "abc123def456"], - "env": {} - } - } -} -``` +### Good vs. Bad Instructions +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md -Resume the latest conversation: +The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. -```json -{ - "agent_servers": { - "OpenHands (Latest)": { - "command": "openhands", - "args": ["acp", "--resume", "--last"], - "env": {} - } - } -} -``` +## Concrete Examples of Good/Bad Prompts -### Multiple Configurations +### Bug Fixing Examples -Add multiple configurations for different use cases: +#### Bad Example -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": {} - }, - "OpenHands (Auto-Approve)": { - "command": "openhands", - "args": ["acp", "--always-approve"], - "env": {} - }, - "OpenHands (Resume Latest)": { - "command": "openhands", - "args": ["acp", "--resume", "--last"], - "env": {} - } - } -} +``` +Fix the bug in my code. ``` -### Environment Variables +**Why it's bad:** +- No information about what the bug is +- No indication of where to look +- No description of expected vs. 
actual behavior +- OpenHands would have to guess what's wrong -Pass environment variables to the agent: +#### Good Example -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": { - "LLM_API_KEY": "your-api-key" - } - } - } -} ``` +Fix the TypeError in src/api/users.py line 45. -## Troubleshooting +Error message: +TypeError: 'NoneType' object has no attribute 'get' -### "Agent not found" or "Command failed" +Expected behavior: The get_user_preferences() function should return +default preferences when the user has no saved preferences. -1. Verify OpenHands CLI is installed: - ```bash - openhands --version - ``` +Actual behavior: It crashes with the error above when user.preferences is None. -2. If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) +The fix should handle the None case gracefully and return DEFAULT_PREFERENCES. +``` -### "AI Assistant not available" +**Why it works:** +- Specific file and line number +- Exact error message +- Clear expected vs. actual behavior +- Suggested approach for the fix -1. Ensure you have JetBrains IDE version 25.3 or later -2. Enable AI Assistant: `Settings > Plugins > AI Assistant` -3. Restart the IDE after enabling +### Feature Development Examples -### Agent doesn't respond +#### Bad Example -1. Check your LLM settings: - ```bash - openhands - # Use /settings to configure - ``` +``` +Add user authentication to my app. +``` -2. Test ACP mode in terminal: - ```bash - openhands acp - # Should start without errors - ``` +**Why it's bad:** +- Scope is too large and undefined +- No details about authentication requirements +- No mention of existing code or patterns +- Could mean many different things -### Configuration not applied +#### Good Example -1. Verify the config file location: `~/.jetbrains/acp.json` -2. Validate JSON syntax (no trailing commas, proper quotes) -3. 
Restart your JetBrains IDE +``` +Add email/password login to our Express.js API. -### Finding Your Conversation ID +Requirements: +1. POST /api/auth/login endpoint +2. Accept email and password in request body +3. Validate against users in PostgreSQL database +4. Return JWT token on success, 401 on failure +5. Use bcrypt for password comparison (already in dependencies) -To resume conversations, first find the ID: +Follow the existing patterns in src/api/routes.js for route structure. +Use the existing db.query() helper in src/db/index.js for database access. -```bash -openhands --resume +Success criteria: I can call the endpoint with valid credentials +and receive a JWT token that works with our existing auth middleware. ``` -This displays recent conversations with their IDs: +**Why it works:** +- Specific, scoped feature +- Clear technical requirements +- Points to existing patterns to follow +- Defines what "done" looks like + +### Code Review Examples + +#### Bad Example ``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py --------------------------------------------------------------------------------- +Review my code. ``` -## See Also - -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +**Why it's bad:** +- No code provided or referenced +- No indication of what to look for +- No context about the code's purpose +- No criteria for the review +#### Good Example -# IDE Integration Overview -Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview +``` +Review this pull request for our payment processing module: - -IDE integration via ACP is experimental and may have limitations. 
Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). - +Focus areas: +1. Security - we're handling credit card data +2. Error handling - payments must never silently fail +3. Idempotency - duplicate requests should be safe - -**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. - +Context: +- This integrates with Stripe API +- It's called from our checkout flow +- We have ~10,000 transactions/day -## What is the Agent Client Protocol (ACP)? +Please flag any issues as Critical/Major/Minor with explanations. +``` -The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. +**Why it works:** +- Clear scope and focus areas +- Important context provided +- Business implications explained +- Requested output format specified -## Supported IDEs +### Refactoring Examples -| IDE | Support Level | Setup Guide | -|-----|---------------|-------------| -| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | -| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | -| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | -| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. | +#### Bad Example -## Prerequisites +``` +Make the code better. 
+``` -Before using OpenHands with any IDE, you must: +**Why it's bad:** +- "Better" is subjective and undefined +- No specific problems identified +- No goals for the refactoring +- No constraints or requirements -1. **Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) +#### Good Example -2. **Configure your LLM settings** using the `/settings` command: - ```bash - openhands - # Then use /settings to configure - ``` +``` +Refactor the UserService class in src/services/user.js: -The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. +Problems to address: +1. The class is 500+ lines - split into smaller, focused services +2. Database queries are mixed with business logic - separate them +3. There's code duplication in the validation methods -## How It Works +Constraints: +- Keep the public API unchanged (other code depends on it) +- Maintain test coverage (run npm test after changes) +- Follow our existing service patterns in src/services/ -```mermaid -graph LR - IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] - CLI -->|API Calls| LLM[LLM Provider] - CLI -->|Commands| Runtime[Sandbox Runtime] +Goal: Improve maintainability while keeping the same functionality. ``` -1. Your IDE launches `openhands acp` as a subprocess -2. Communication happens via JSON-RPC 2.0 over stdio -3. OpenHands uses your configured LLM and runtime settings -4. Results are displayed in your IDE's interface +**Why it works:** +- Specific problems identified +- Clear constraints and requirements +- Points to patterns to follow +- Measurable success criteria -## The ACP Command +## Key Principles for Effective Instructions -The `openhands acp` command starts OpenHands as an ACP server: +### Be Specific -```bash -# Basic ACP server -openhands acp +Vague instructions produce vague results. 
Be concrete about: -# With LLM-based approval -openhands acp --llm-approve +| Instead of... | Say... | +|---------------|--------| +| "Fix the error" | "Fix the TypeError on line 45 of api.py" | +| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | +| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | +| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | -# Resume a conversation -openhands acp --resume +### Provide Context -# Resume the latest conversation -openhands acp --resume --last +Help OpenHands understand the bigger picture: + +``` +Context to include: +- What does this code do? (purpose) +- Who uses it? (users/systems) +- Why does this matter? (business impact) +- What constraints exist? (performance, compatibility) +- What patterns should be followed? (existing conventions) ``` -### ACP Options +**Example with context:** -| Option | Description | -|--------|-------------| -| `--resume [ID]` | Resume a conversation by ID | -| `--last` | Resume the most recent conversation | -| `--always-approve` | Auto-approve all actions | -| `--llm-approve` | Use LLM-based security analyzer | -| `--streaming` | Enable token-by-token streaming | +``` +Add rate limiting to our public API endpoints. -## Confirmation Modes +Context: +- This is a REST API serving mobile apps and third-party integrations +- We've been seeing abuse from web scrapers hitting us 1000+ times/minute +- Our infrastructure can handle 100 req/sec per client sustainably +- We use Redis (already available in the project) +- Our API follows the controller pattern in src/controllers/ -OpenHands ACP supports three confirmation modes to control how agent actions are approved: +Requirement: Limit each API key to 100 requests per minute with +appropriate 429 responses and Retry-After headers. 
+``` -### Always Ask (Default) +### Set Clear Goals -The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. +Define what success looks like: -```bash -openhands acp # defaults to always-ask mode +``` +Success criteria checklist: +✓ What specific outcome do you want? +✓ How will you verify it worked? +✓ What tests should pass? +✓ What should the user experience be? ``` -### Always Approve +**Example with clear goals:** -The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. +``` +Implement password reset functionality. -```bash -openhands acp --always-approve +Success criteria: +1. User can request reset via POST /api/auth/forgot-password +2. System sends email with secure reset link +3. Link expires after 1 hour +4. User can set new password via POST /api/auth/reset-password +5. Old sessions are invalidated after password change +6. All edge cases return appropriate error messages +7. Existing tests still pass, new tests cover the feature ``` -### LLM-Based Approval +### Include Constraints -The agent uses an LLM-based security analyzer to evaluate each action. Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved. 
+Specify what you can't or won't change: -```bash -openhands acp --llm-approve +``` +Constraints to specify: +- API compatibility (can't break existing clients) +- Technology restrictions (must use existing stack) +- Performance requirements (must respond in <100ms) +- Security requirements (must not log PII) +- Time/scope limits (just this one file) ``` -### Changing Modes During a Session +## Common Pitfalls to Avoid -You can change the confirmation mode during an active session using slash commands: +### Vague Requirements -| Command | Description | -|---------|-------------| -| `/confirm always-ask` | Switch to always-ask mode | -| `/confirm always-approve` | Switch to always-approve mode | -| `/confirm llm-approve` | Switch to LLM-based approval mode | -| `/help` | Show all available slash commands | + + + ``` + Make the dashboard faster. + ``` + + + ``` + The dashboard takes 5 seconds to load. + + Profile it and optimize to load in under 1 second. + + Likely issues: + - N+1 queries in getWidgetData() + - Uncompressed images + - Missing database indexes + + Focus on the biggest wins first. + ``` + + - -The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session. - +### Missing Context -## Choosing an IDE + + + ``` + Add caching to the API. + ``` + + + ``` + Add caching to the product catalog API. + + Context: + - 95% of requests are for the same 1000 products + - Product data changes only via admin panel (rare) + - We already have Redis running for sessions + - Current response time is 200ms, target is <50ms + + Cache strategy: Cache product data in Redis with 5-minute TTL, + invalidate on product update. + ``` + + - - - High-performance editor with native ACP support. Best for speed and simplicity. - - - Universal terminal interface. Works with any terminal, consistent experience. - - - Popular editor with community extension. Great for VS Code users. 
- - - IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users. - - +### Unrealistic Expectations -## Resuming Conversations in IDEs + + + ``` + Rewrite our entire backend from PHP to Go. + ``` + + + ``` + Create a Go microservice for the image processing currently in + src/php/ImageProcessor.php. + + This is the first step in our gradual migration. + The Go service should: + 1. Expose the same API endpoints + 2. Be deployable alongside the existing PHP app + 3. Include a feature flag to route traffic + + Start with just the resize and crop functions. + ``` + + -You can resume previous conversations in ACP mode. Since ACP mode doesn't display an interactive list, first find your conversation ID: +### Incomplete Information -```bash -openhands --resume -``` + + + ``` + The login is broken, fix it. + ``` + + + ``` + Users can't log in since yesterday's deployment. + + Symptoms: + - Login form submits but returns 500 error + - Server logs show: "Redis connection refused" + - Redis was moved to a new host yesterday + + The issue is likely in src/config/redis.js which may + have the old host hardcoded. + + Expected: Login should work with the new Redis at redis.internal:6380 + ``` + + -This shows your recent conversations: +## Best Practices -``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py +### Structure Your Instructions + +Use clear structure for complex requests: - 2. xyz789ghi012 (yesterday) - Add unit tests for the user service --------------------------------------------------------------------------------- ``` +## Task +[One sentence describing what you want] -Then configure your IDE to use `--resume ` or `--resume --last`. See each IDE's documentation for specific configuration. +## Background +[Context and why this matters] -## See Also +## Requirements +1. [Specific requirement] +2. [Specific requirement] +3. 
[Specific requirement] -- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal -- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide +## Constraints +- [What you can't change] +- [What must be preserved] +## Success Criteria +- [How to verify it works] +``` -# Toad Terminal -Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad +### Provide Examples -[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). +Show what you want through examples: -The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. +``` +Add input validation to the user registration endpoint. -![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) +Example of what validation errors should look like: -## Why Toad? 
+{ + "error": "validation_failed", + "details": [ + {"field": "email", "message": "Invalid email format"}, + {"field": "password", "message": "Must be at least 8 characters"} + ] +} -Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: +Validate: +- email: valid format, not already registered +- password: min 8 chars, at least 1 number +- username: 3-20 chars, alphanumeric only +``` -- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything -- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs -- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP +### Define Success Criteria -OpenHands is included as a recommended agent in Toad's agent store. +Be explicit about what "done" means: -## Prerequisites +``` +This task is complete when: +1. All existing tests pass (npm test) +2. New tests cover the added functionality +3. The feature works as described in the acceptance criteria +4. Code follows our style guide (npm run lint passes) +5. Documentation is updated if needed +``` -Before using Toad with OpenHands: +### Iterate and Refine -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` +Build on previous work: -## Installation +``` +In our last session, you added the login endpoint. -Install Toad using [uv](https://docs.astral.sh/uv/): +Now add the logout functionality: +1. POST /api/auth/logout endpoint +2. Invalidate the current session token +3. Clear any server-side session data +4. Follow the same patterns used in login -```bash -uvx batrachian-toad +The login implementation is in src/api/auth/login.js for reference. 
``` -For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). +## Quick Reference -## Setup +| Element | Bad | Good | +|---------|-----|------| +| Location | "in the code" | "in src/api/users.py line 45" | +| Problem | "it's broken" | "TypeError when user.preferences is None" | +| Scope | "add authentication" | "add JWT-based login endpoint" | +| Behavior | "make it work" | "return 200 with user data on success" | +| Patterns | (none) | "follow patterns in src/services/" | +| Success | (none) | "all tests pass, endpoint returns correct data" | -### Using the Agent Store + +The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. + -The easiest way to set up OpenHands with Toad: +### OpenHands in Your SDLC +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md -1. Launch Toad: `uvx batrachian-toad` -2. Open Toad's agent store -3. Find **OpenHands** in the list of recommended agents -4. Click **Install** to set up OpenHands -5. Select OpenHands and start a conversation +OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. 
-The install process runs: -```bash -uv tool install openhands --python 3.12 && openhands login -``` +## Integration with Development Workflows -### Manual Configuration +### Planning Phase -You can also launch Toad directly with OpenHands: +Use OpenHands during planning to accelerate technical decisions: -```bash -toad acp "openhands acp" +**Technical specification assistance:** ``` +Create a technical specification for adding search functionality: -## Usage - -### Basic Usage +Requirements from product: +- Full-text search across products and articles +- Filter by category, price range, and date +- Sub-200ms response time at 1000 QPS -```bash -# Launch Toad with OpenHands -toad acp "openhands acp" +Provide: +1. Architecture options (Elasticsearch vs. PostgreSQL full-text) +2. Data model changes needed +3. API endpoint designs +4. Estimated implementation effort +5. Risks and mitigations ``` -### With Command Line Arguments - -Pass OpenHands CLI flags through Toad: +**Sprint planning support:** +``` +Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: -```bash -# Use LLM-based approval mode -toad acp "openhands acp --llm-approve" +Story 1: As a user, I can reset my password via email +Story 2: As an admin, I can view user activity logs -# Auto-approve all actions -toad acp "openhands acp --always-approve" +For each story, create: +- Technical subtasks +- Estimated effort (hours) +- Dependencies on other work +- Testing requirements ``` -### Resume a Conversation - -Resume a specific conversation by ID: +### Development Phase -```bash -toad acp "openhands acp --resume abc123def456" -``` +OpenHands excels during active development: -Resume the most recent conversation: +**Feature implementation:** +- Write new features with clear specifications +- Follow existing code patterns automatically +- Generate tests alongside code +- Create documentation as you go -```bash -toad acp 
"openhands acp --resume --last" -``` +**Bug fixing:** +- Analyze error logs and stack traces +- Identify root causes +- Implement fixes with regression tests +- Document the issue and solution - -Find your conversation IDs by running `openhands --resume` in a regular terminal. - +**Code improvement:** +- Refactor for clarity and maintainability +- Optimize performance bottlenecks +- Update deprecated APIs +- Improve error handling -## Advanced Configuration +### Testing Phase -### Combined Options +Automate test creation and improvement: -```bash -# Resume with LLM approval -toad acp "openhands acp --resume --last --llm-approve" ``` +Add comprehensive tests for the UserService module: -### Environment Variables +Current coverage: 45% +Target coverage: 85% -Pass environment variables to OpenHands: +1. Analyze uncovered code paths using the codecov module +2. Write unit tests for edge cases +3. Add integration tests for API endpoints +4. Create test data factories +5. Document test scenarios -```bash -LLM_API_KEY=your-key toad acp "openhands acp" +Each time you add new tests, re-run codecov to check the increased coverage. Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). ``` -## Troubleshooting +### Review Phase -### "openhands" command not found +Accelerate code reviews: -Ensure OpenHands is installed: -```bash -uv tool install openhands --python 3.12 ``` +Review this PR for our coding standards: -Verify it's in your PATH: -```bash -which openhands +Check for: +1. Security issues (SQL injection, XSS, etc.) +2. Performance concerns +3. Test coverage adequacy +4. Documentation completeness +5. Adherence to our style guide + +Provide actionable feedback with severity ratings. ``` -### Agent doesn't respond +### Deployment Phase -1. Check your LLM settings: `openhands` then `/settings` -2. Verify your API key is valid -3. 
Check network connectivity to your LLM provider +Assist with deployment preparation: -### Conversation not persisting +``` +Prepare for production deployment: -Conversations are stored in `~/.openhands/conversations`. Ensure this directory exists and is writable. +1. Review all changes since last release +2. Check for breaking API changes +3. Verify database migrations are reversible +4. Update the changelog +5. Create release notes +6. Identify rollback steps if needed +``` -## See Also +## CI/CD Integration -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. +### GitHub Actions Integration -# VS Code -Source: https://docs.openhands.dev/openhands/usage/cli/ide/vscode +The Software Agent SDK provides composite GitHub Actions for common workflows: -[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. +- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments +- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK - -VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. 
- +For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. -## Prerequisites +### What You Can Automate -Before configuring VS Code: +Using the SDK, you can create GitHub Actions workflows to: -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` -3. **VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) +1. **Automatic code review** when a PR is opened +2. **Automatically update docs** weekly when new functionality is added +3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements +4. **Manage TODO comments** and track technical debt +5. **Assign reviewers** based on code ownership patterns -## Installation +### Getting Started -### Step 1: Install the Extension +To integrate OpenHands into your CI/CD: -1. Open VS Code -2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) -3. Search for **"VSCode ACP"** -4. Click **Install** +1. Review the [SDK Getting Started guide](/sdk/getting-started) +2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) +3. Set up your `LLM_API_KEY` as a repository secret +4. Use the provided composite actions or build custom workflows -Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). +See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. -### Step 2: Connect to OpenHands +## Team Workflows -1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) -2. Click **Connect** to start a session -3. Select **OpenHands** from the agent dropdown -4. Start chatting with OpenHands! 
+### Solo Developer Workflows -## How It Works +For individual developers: -The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. +**Daily workflow:** +1. **Morning review**: Have OpenHands analyze overnight CI results +2. **Feature development**: Use OpenHands for implementation +3. **Pre-commit**: Request review before pushing +4. **Documentation**: Generate/update docs for changes -The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol. +**Best practices:** +- Set up automated reviews on all PRs +- Use OpenHands for boilerplate and repetitive tasks +- Keep AGENTS.md updated with project patterns -## Verification +### Small Team Workflows -Ensure OpenHands is discoverable: +For teams of 2-10 developers: -```bash -which openhands -# Should return a path like /Users/you/.local/bin/openhands +**Collaborative workflow:** ``` - -If the command is not found, install OpenHands CLI: -```bash -uv tool install openhands --python 3.12 +Team Member A: Creates feature branch, writes initial implementation +OpenHands: Reviews code, suggests improvements +Team Member B: Reviews OpenHands suggestions, approves or modifies +OpenHands: Updates documentation, adds missing tests +Team: Merges after final human review ``` -## Advanced Usage - -### Custom Arguments +**Communication integration:** +- Slack notifications for OpenHands findings +- Automatic issue creation for bugs found +- Weekly summary reports -The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`. +### Enterprise Team Workflows -### Resume Conversations +For larger organizations: -To resume a conversation, you may need to: - -1. Find your conversation ID: `openhands --resume` -2. Configure the extension to use custom arguments (if supported) -3. 
Or use the terminal directly: `openhands acp --resume ` +**Governance and oversight:** +- Configure approval requirements for OpenHands changes +- Set up audit logging for all AI-assisted changes +- Define scope limits for automated actions +- Establish human review requirements - -The VSCode ACP extension's feature set depends on the extension maintainer. Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities. - +**Scale patterns:** +``` +Central Platform Team: +├── Defines OpenHands policies +├── Manages integrations +└── Monitors usage and quality -## Troubleshooting +Feature Teams: +├── Use OpenHands within policies +├── Customize for team needs +└── Report issues to platform team +``` -### OpenHands Not Appearing in Dropdown +## Best Practices -1. Verify OpenHands is installed and in PATH: - ```bash - which openhands - openhands --version - ``` +### Code Review Integration -2. Restart VS Code after installing OpenHands +Set up effective automated reviews: -3. Check if the extension recognizes agents: - - Look for any error messages in the extension panel - - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`) +```yaml +# .openhands/review-config.yml +review: + focus_areas: + - security + - performance + - test_coverage + - documentation + + severity_levels: + block_merge: + - critical + - security + require_response: + - major + informational: + - minor + - suggestion + + ignore_patterns: + - "*.generated.*" + - "vendor/*" +``` -### Connection Failed +### Pull Request Automation -1. Ensure your LLM settings are configured: - ```bash - openhands - # Use /settings to configure - ``` +Automate common PR tasks: -2. 
Check that `openhands acp` works in terminal: - ```bash - openhands acp - # Should start without errors (Ctrl+C to exit) - ``` +| Trigger | Action | +|---------|--------| +| PR opened | Auto-review, label by type | +| Tests fail | Analyze failures, suggest fixes | +| Coverage drops | Identify missing tests | +| PR approved | Update changelog, check docs | -### Extension Not Working +### Quality Gates -1. Update to the latest version of the extension -2. Check for VS Code updates -3. Report issues on the [extension's GitHub](https://github.com/omercnet) +Define automated quality gates: -## Limitations +```yaml +quality_gates: + - name: test_coverage + threshold: 80% + action: block_merge + + - name: security_issues + threshold: 0 critical + action: block_merge + + - name: code_review_score + threshold: 7/10 + action: require_review + + - name: documentation + requirement: all_public_apis + action: warn +``` -Since this is a community extension: +### Automated Testing -- Feature availability may vary -- Support depends on the extension maintainer -- Not all OpenHands CLI flags may be accessible through the UI +Integrate OpenHands with your testing strategy: -For the most control over OpenHands, consider using: -- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage -- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support +**Test generation triggers:** +- New code without tests +- Coverage below threshold +- Bug fix without regression test +- API changes without contract tests -## See Also +**Example workflow:** +```yaml +on: + push: + branches: [main] -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [VSCode ACP Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal +jobs: + ensure-coverage: + steps: + - name: Check coverage + run: | + COVERAGE=$(npm test -- --coverage | 
grep "All files" | awk '{print $10}') + if [ "$COVERAGE" -lt "80" ]; then + openhands generate-tests --target 80 + fi +``` +## Common Integration Patterns -# Zed IDE -Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed +### Pre-Commit Hooks -[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. +Run OpenHands checks before commits: - +```bash +# .git/hooks/pre-commit +#!/bin/bash -## Prerequisites +# Quick code review +openhands review --quick --staged-only -Before configuring Zed, ensure you have: +if [ $? -ne 0 ]; then + echo "OpenHands found issues. Review and fix before committing." + exit 1 +fi +``` -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` -3. **Zed editor** - Download from [zed.dev](https://zed.dev/) +### Post-Commit Actions -## Configuration +Automate tasks after commits: -### Step 1: Open Agent Settings +```yaml +# .github/workflows/post-commit.yml +on: + push: + branches: [main] -1. Open Zed -2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette -3. Search for `agent: open settings` +jobs: + update-docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Update API docs + run: openhands update-docs --api + - name: Commit changes + run: | + git add docs/ + git commit -m "docs: auto-update API documentation" || true + git push +``` -![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) +### Scheduled Tasks -### Step 2: Add OpenHands as an Agent +Run regular maintenance: -1. On the right side, click `+ Add Agent` -2. 
Select `Add Custom Agent` +```yaml +# Weekly dependency check +on: + schedule: + - cron: '0 9 * * 1' # Monday 9am -![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) +jobs: + dependency-review: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Check dependencies + run: | + openhands check-dependencies --security --outdated + - name: Create issues + run: openhands create-issues --from-report deps.json +``` -### Step 3: Configure the Agent +### Event-Triggered Workflows -Add the following configuration to the `agent_servers` field: +You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. -```json -{ - "agent_servers": { - "OpenHands": { - "command": "uvx", - "args": [ - "openhands", - "acp" - ], - "env": {} - } - } -} -``` +For more event-driven automation patterns, see: +- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events +- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage -### Step 4: Save and Use +### When to Use OpenHands +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md -1. Save the settings file -2. You can now use OpenHands within Zed! +OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. This guide helps you identify the right tasks for OpenHands and set yourself up for success. -![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) +## Task Complexity Guidance -## Advanced Configuration +### Simple Tasks -### LLM-Approve Mode +**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. 
-For automatic LLM-based approval of actions: +- Adding a new function or method +- Writing unit tests for existing code +- Fixing simple bugs with clear error messages +- Code formatting and style fixes +- Adding documentation or comments +- Simple refactoring (rename, extract method) +- Configuration changes -```json -{ - "agent_servers": { - "OpenHands (LLM Approve)": { - "command": "uvx", - "args": [ - "openhands", - "acp", - "--llm-approve" - ], - "env": {} - } - } -} +**Example prompt:** +``` +Add a calculateDiscount() function to src/utils/pricing.js that takes +a price and discount percentage, returns the discounted price. +Add unit tests. ``` -### Resume a Specific Conversation +### Medium Complexity Tasks -To resume a previous conversation: +**Good for OpenHands** — These tasks may need more context and possibly some iteration. -```json -{ - "agent_servers": { - "OpenHands (Resume)": { - "command": "uvx", - "args": [ - "openhands", - "acp", - "--resume", - "abc123def456" - ], - "env": {} - } - } -} +- Implementing a new API endpoint +- Adding a feature to an existing module +- Debugging issues that span multiple files +- Migrating code to a new pattern +- Writing integration tests +- Performance optimization with clear metrics +- Setting up CI/CD workflows + +**Example prompt:** +``` +Add a user profile endpoint to our API: +- GET /api/users/:id/profile +- Return user data with their recent activity +- Follow patterns in existing controllers +- Add integration tests +- Handle not-found and unauthorized cases ``` -Replace `abc123def456` with your actual conversation ID. Find conversation IDs by running `openhands --resume` in your terminal. +### Complex Tasks -### Resume Latest Conversation +**May require iteration** — These benefit from breaking down into smaller pieces. 
-```json -{ - "agent_servers": { - "OpenHands (Latest)": { - "command": "uvx", - "args": [ - "openhands", - "acp", - "--resume", - "--last" - ], - "env": {} - } - } -} +- Large refactoring across many files +- Architectural changes +- Implementing complex business logic +- Multi-service integrations +- Performance optimization without clear cause +- Security audits +- Framework or major dependency upgrades + +**Recommended approach:** ``` +Break large tasks into phases: -### Multiple Configurations +Phase 1: "Analyze the current authentication system and document +all touch points that need to change for OAuth2 migration." -You can add multiple OpenHands configurations for different use cases: +Phase 2: "Implement the OAuth2 provider configuration and basic +token flow, keeping existing auth working in parallel." -```json -{ - "agent_servers": { - "OpenHands": { - "command": "uvx", - "args": ["openhands", "acp"], - "env": {} - }, - "OpenHands (Auto-Approve)": { - "command": "uvx", - "args": ["openhands", "acp", "--always-approve"], - "env": {} - }, - "OpenHands (Resume Latest)": { - "command": "uvx", - "args": ["openhands", "acp", "--resume", "--last"], - "env": {} - } - } -} +Phase 3: "Migrate the user login flow to use OAuth2, maintaining +backwards compatibility." ``` -## Troubleshooting - -### Accessing Debug Logs - -If you encounter issues: +## Best Use Cases -1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) -2. Type and select `acp debug log` -3. Review the logs for errors or warnings -4. 
Restart the conversation to reload connections after configuration changes +### Ideal Scenarios -### Common Issues +OpenHands is **most effective** when: -**"openhands" command not found** +| Scenario | Why It Works | +|----------|--------------| +| Clear requirements | OpenHands can work independently | +| Well-defined scope | Less ambiguity, fewer iterations | +| Existing patterns to follow | Consistency with codebase | +| Good test coverage | Easy to verify changes | +| Isolated changes | Lower risk of side effects | -Ensure OpenHands is installed and in your PATH: -```bash -which openhands -# Should return a path like /Users/you/.local/bin/openhands -``` +**Perfect use cases:** -If using `uvx`, ensure uv is installed: -```bash -uv --version -``` +- **Bug fixes with reproduction steps**: Clear problem, measurable solution +- **Test additions**: Existing code provides the specification +- **Documentation**: Code is the source of truth +- **Boilerplate generation**: Follows established patterns +- **Code review and analysis**: Read-only, analytical tasks -**Agent doesn't start** +### Good Fit Scenarios -1. Check that your LLM settings are configured: run `openhands` and verify `/settings` -2. Verify the configuration JSON syntax is valid -3. Check the ACP debug logs for detailed errors +OpenHands works **well with some guidance** for: -**Conversation doesn't persist** +- **Feature implementation**: When requirements are documented +- **Refactoring**: When goals and constraints are clear +- **Debugging**: When you can provide logs and context +- **Code modernization**: When patterns are established +- **API development**: When specs exist -Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. +**Tips for these scenarios:** - -After making configuration changes, restart the conversation in Zed to apply them. - +1. Provide clear acceptance criteria +2. Point to examples of similar work in the codebase +3. 
Specify constraints and non-goals +4. Be ready to iterate and clarify -## See Also +### Poor Fit Scenarios -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +**Consider alternatives** when: +| Scenario | Challenge | Alternative | +|----------|-----------|-------------| +| Vague requirements | Unclear what "done" means | Define requirements first | +| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | +| Highly sensitive code | Risk tolerance is zero | Human review essential | +| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | +| Visual design | Subjective aesthetic judgments | Use design tools | -# Installation -Source: https://docs.openhands.dev/openhands/usage/cli/installation +**Red flags that a task may not be suitable:** - -**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. - +- "Make it look better" (subjective) +- "Figure out what's wrong" (too vague) +- "Rewrite everything" (too large) +- "Do what makes sense" (unclear requirements) +- Changes to production infrastructure without review -## Installation Methods +## Limitations - - - Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. 
+### Current Limitations - **Install OpenHands:** - ```bash - uv tool install openhands --python 3.12 - ``` +Be aware of these constraints: - **Run OpenHands:** - ```bash - openhands - ``` +- **Long-running processes**: Sessions have time limits +- **Interactive debugging**: Can't set breakpoints interactively +- **Visual verification**: Can't see rendered UI easily +- **External system access**: May need credentials configured +- **Large codebase analysis**: Memory and time constraints - **Upgrade OpenHands:** - ```bash - uv tool upgrade openhands --python 3.12 - ``` - - - Install the OpenHands CLI binary with the install script: +### Technical Constraints - ```bash - curl -fsSL https://install.openhands.dev/install.sh | sh - ``` +| Constraint | Impact | Workaround | +|------------|--------|------------| +| Session duration | Very long tasks may timeout | Break into smaller tasks | +| Context window | Can't see entire large codebase at once | Focus on relevant files | +| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | +| Network access | Some external services may be blocked | Use local resources when possible | - Then run: - ```bash - openhands - ``` +### Scope Boundaries - - Your system may require you to allow permissions to run the executable. +OpenHands works within your codebase but has boundaries: - - When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple - cannot check it for malicious software." +**Can do:** +- Read and write files in the repository +- Run tests and commands +- Access configured services and APIs +- Browse documentation and reference material - 1. Open `System Settings`. - 2. Go to `Privacy & Security`. - 3. Scroll down to `Security` and click `Allow Anyway`. - 4. Rerun the OpenHands CLI. 
+**Cannot do:** +- Access your local environment outside the sandbox +- Make decisions requiring business context it doesn't have +- Replace human judgment for critical decisions +- Guarantee production-safe changes without review - ![mac-security](/openhands/static/img/cli-security-mac.png) +## Pre-Task Checklist - - - - - 1. Set the following environment variable in your terminal: - - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) +### Prerequisites - 2. Ensure you have configured your settings before starting: - - Set up `~/.openhands/settings.json` with your LLM configuration +Before starting a task, ensure: - 3. Run the following command: +- [ ] Clear description of what you want +- [ ] Expected outcome is defined +- [ ] Relevant files are identified +- [ ] Dependencies are available +- [ ] Tests can be run - ```bash - docker run -it \ - --pull=always \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -e SANDBOX_USER_ID=$(id -u) \ - -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/root/.openhands \ - --add-host host.docker.internal:host-gateway \ - --name openhands-cli-$(date +%Y%m%d%H%M%S) \ - python:3.12-slim \ - bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" - ``` +### Environment Setup - The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's - permissions. This prevents the agent from creating root-owned files in the mounted workspace. - - +Prepare your repository: -## First Run +```markdown +## AGENTS.md Checklist -The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved -for future sessions in `~/.openhands/settings.json`. 
+- [ ] Build commands documented +- [ ] Test commands documented +- [ ] Code style guidelines noted +- [ ] Architecture overview included +- [ ] Common patterns described +``` -The conversation history will be saved in `~/.openhands/conversations`. +See [Repository Setup](/openhands/usage/customization/repository) for details. - -If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the -configuration format has changed. - +### Repository Preparation -## Next Steps +Optimize for success: -- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +1. **Clean state**: Commit or stash uncommitted changes +2. **Working build**: Ensure the project builds +3. **Passing tests**: Start from a green state +4. **Updated dependencies**: Resolve any dependency issues +5. **Clear documentation**: Update AGENTS.md if needed +## Post-Task Review -# MCP Servers -Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers +### Quality Checks -## Overview +After OpenHands completes a task: -[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. +- [ ] Review all changed files +- [ ] Understand each change made +- [ ] Check for unintended modifications +- [ ] Verify code style consistency +- [ ] Look for hardcoded values or credentials -The CLI provides two ways to manage MCP servers: -1. **CLI commands** (`openhands mcp`) - Manage servers from the command line -2. **Interactive command** (`/mcp`) - View server status within a conversation +### Validation Steps - -If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON. - +1. 
**Run tests**: `npm test`, `pytest`, etc. +2. **Check linting**: Ensure style compliance +3. **Build the project**: Verify it still compiles +4. **Manual testing**: Test the feature yourself +5. **Edge cases**: Try unusual inputs -## MCP Commands +### Learning from Results -### List Servers +After each significant task: -View all configured MCP servers: +**What went well?** +- Note effective prompt patterns +- Document successful approaches +- Update AGENTS.md with learnings -```bash -openhands mcp list +**What could improve?** +- Identify unclear instructions +- Note missing context +- Plan better for next time + +**Update your repository:** +```markdown +## Things OpenHands Should Know (add to AGENTS.md) + +- When adding API endpoints, always add to routes/index.js +- Our date format is ISO 8601 everywhere +- All database queries go through the repository pattern ``` -### Get Server Details +## Decision Framework -View details for a specific server: +Use this framework to decide if a task is right for OpenHands: -```bash -openhands mcp get ``` +Is the task well-defined? +├── No → Define it better first +└── Yes → Continue -### Remove a Server +Do you have clear success criteria? +├── No → Define acceptance criteria +└── Yes → Continue -Remove a server configuration: +Is the scope manageable (< 100 LOC)? +├── No → Break into smaller tasks +└── Yes → Continue -```bash -openhands mcp remove +Do examples exist in the codebase? +├── No → Provide examples or patterns +└── Yes → Continue + +Can you verify the result? +├── No → Add tests or verification steps +└── Yes → ✅ Good candidate for OpenHands ``` -### Enable/Disable Servers +OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! -Control which servers are active: +But it can be particularly useful for certain types of tasks. 
For instance:
+
+- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define the task in a way that can be verified programmatically, such as making sure that all of the tests pass or that test coverage rises above a certain value. But even when you don't have something like that, you can provide a checklist of things that need to be done.
+- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. Good examples include code review, improving test coverage, and upgrading dependency libraries. In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies for how to perform these tasks, and improve the skills over time.
+- **Helping Answer Questions:** OpenHands agents are generally good at answering questions about code bases, so feel free to ask them when you don't understand how something works. They can explore the code base and understand it deeply before providing an answer.
+- **Checking the Correctness of Library/Backend Code:** When agents work, they can run code, which makes them particularly good at checking whether libraries or backend code work well.
+- **Reading Logs and Understanding Errors:** Agents can read logs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They're quite good at filtering through large amounts of data, especially if pushed in the correct direction.
+
+There are also some tasks where agents struggle a little more:
+
+- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons.
But they are a little bit less good at visual understanding of frontends at the moment and can sometimes make mistakes if they don't understand the workflow very well. +- **Implementing Code they Cannot Test Live:** If agents are not able to actually run and test the app, such as connecting to a live service that they do not have access to, often they will fail at performing tasks all the way to the end, unless they get some encouragement. -### HTTP/SSE Servers +### Tutorial Library +Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials.md -Add remote servers with HTTP or SSE transport: +Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development. Each tutorial includes example prompts, expected workflows, and tips for success. -```bash -openhands mcp add --transport http -``` +## Categories Overview -#### With Bearer Token Authentication +| Category | Best For | Complexity | +|----------|----------|------------| +| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | +| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | +| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | +| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | +| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | +| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | -```bash -openhands mcp add my-api --transport http \ - --header "Authorization: Bearer your-token" \ - https://api.example.com/mcp -``` + +For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. 
+ -#### With API Key Authentication +## Task Complexity Guidance -```bash -openhands mcp add weather-api --transport http \ - --header "X-API-Key: your-api-key" \ - https://weather.api.com -``` +Before starting, assess your task's complexity: -#### With Multiple Headers +**Simple tasks** (5-15 minutes): +- Single file changes +- Clear, well-defined requirements +- Existing patterns to follow -```bash -openhands mcp add secure-api --transport http \ - --header "Authorization: Bearer token123" \ - --header "X-Client-ID: client456" \ - https://api.example.com -``` +**Medium tasks** (15-45 minutes): +- Multiple file changes +- Some discovery required +- Integration with existing code -#### With OAuth Authentication +**Complex tasks** (45+ minutes): +- Architectural changes +- Multiple components +- Requires iteration -```bash -openhands mcp add notion-server --transport http \ - --auth oauth \ - https://mcp.notion.com/mcp -``` + +Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. + -### Stdio Servers +## Best Use Cases -Add local servers that communicate via stdio: +OpenHands excels at: -```bash -openhands mcp add --transport stdio -- [args...] 
-``` +- **Repetitive tasks**: Boilerplate code, test generation +- **Pattern application**: Following established conventions +- **Analysis**: Code review, debugging, documentation +- **Exploration**: Understanding new codebases -#### Basic Example +## Example Tutorials by Category -```bash -openhands mcp add local-server --transport stdio \ - python -- -m my_mcp_server -``` +### Testing -#### With Environment Variables +#### Tutorial: Add Unit Tests for a Module -```bash -openhands mcp add local-server --transport stdio \ - --env "API_KEY=secret123" \ - --env "DATABASE_URL=postgresql://localhost/mydb" \ - python -- -m my_mcp_server --config config.json +**Goal**: Achieve 80%+ test coverage for a service module + +**Prompt**: ``` +Add unit tests for the UserService class in src/services/user.js. -#### Add in Disabled State +Current coverage: 35% +Target coverage: 80% -```bash -openhands mcp add my-server --transport stdio --disabled \ - node -- my-server.js +Requirements: +1. Test all public methods +2. Cover edge cases (null inputs, empty arrays, etc.) +3. Mock external dependencies (database, API calls) +4. Follow our existing test patterns in tests/services/ +5. Use Jest as the testing framework + +Focus on these methods: +- createUser() +- updateUser() +- deleteUser() +- getUserById() ``` -### Command Reference +**What OpenHands does**: +1. Analyzes the UserService class +2. Identifies untested code paths +3. Creates test file with comprehensive tests +4. Mocks dependencies appropriately +5. Runs tests to verify they pass -```bash -openhands mcp add --transport [options] [-- args...] 
-``` +**Tips**: +- Provide existing test files as examples +- Specify the testing framework +- Mention any mocking conventions -| Option | Description | -|--------|-------------| -| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | -| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | -| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | -| `--auth` | Authentication method (e.g., `oauth`) | -| `--enabled` | Enable immediately (default) | -| `--disabled` | Add in disabled state | +--- -## Example: Web Search with Tavily +#### Tutorial: Add Integration Tests for an API -Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp): +**Goal**: Test API endpoints end-to-end -```bash -openhands mcp add tavily --transport stdio \ - npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=" +**Prompt**: ``` +Add integration tests for the /api/products endpoints. -## Manual Configuration +Endpoints to test: +- GET /api/products (list all) +- GET /api/products/:id (get one) +- POST /api/products (create) +- PUT /api/products/:id (update) +- DELETE /api/products/:id (delete) -You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`. +Requirements: +1. Use our test database (configured in jest.config.js) +2. Set up and tear down test data properly +3. Test success cases and error cases +4. Verify response bodies and status codes +5. 
Follow patterns in tests/integration/ +``` -### Configuration Format +--- -The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format): +### Data Analysis -```json -{ - "mcpServers": { - "server-name": { - "command": "command-to-run", - "args": ["arg1", "arg2"], - "env": { - "ENV_VAR": "value" - } - } - } -} -``` +#### Tutorial: Create a Data Processing Script -### Example Configuration +**Goal**: Process CSV data and generate a report -```json -{ - "mcpServers": { - "tavily-remote": { - "command": "npx", - "args": [ - "-y", - "mcp-remote", - "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key" - ] - }, - "local-tools": { - "command": "python", - "args": ["-m", "my_mcp_tools"], - "env": { - "DEBUG": "true" - } - } - } -} +**Prompt**: ``` +Create a Python script to analyze our sales data. -## Interactive `/mcp` Command +Input: sales_data.csv with columns: date, product, quantity, price, region -Within an OpenHands conversation, use `/mcp` to view server status: +Requirements: +1. Load and validate the CSV data +2. Calculate: + - Total revenue by product + - Monthly sales trends + - Top 5 products by quantity + - Revenue by region +3. Generate a summary report (Markdown format) +4. Create visualizations (bar chart for top products, line chart for trends) +5. Save results to reports/ directory -- **View active servers**: Shows which MCP servers are currently active in the conversation -- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts +Use pandas for data processing and matplotlib for charts. +``` - -The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations. - +**What OpenHands does**: +1. Creates a Python script with proper structure +2. Implements data loading with validation +3. Calculates requested metrics +4. Generates formatted report +5. Creates and saves visualizations -## Workflow +--- -1. 
**Add servers** using `openhands mcp add` -2. **Start a conversation** with `openhands` -3. **Check status** with `/mcp` inside the conversation -4. **Use the tools** provided by your MCP servers +#### Tutorial: Database Query Analysis -The agent will automatically have access to tools provided by enabled MCP servers. +**Goal**: Analyze and optimize slow database queries -## Troubleshooting +**Prompt**: +``` +Analyze our slow query log and identify optimization opportunities. -### Server Not Appearing +File: logs/slow_queries.log -1. Verify the server is enabled: - ```bash - openhands mcp list - ``` +For each slow query: +1. Explain why it's slow +2. Suggest index additions if helpful +3. Rewrite the query if it can be optimized +4. Estimate the improvement -2. Check the configuration: - ```bash - openhands mcp get - ``` +Create a report in reports/query_optimization.md with: +- Summary of findings +- Prioritized recommendations +- SQL for suggested changes +``` -3. Restart the conversation to load new configurations +--- -### Server Fails to Start +### Web Scraping -1. Test the command manually: - ```bash - # For stdio servers - python -m my_mcp_server - - # For HTTP servers, check the URL is reachable - curl https://api.example.com/mcp - ``` +#### Tutorial: Build a Web Scraper -2. Check environment variables and credentials +**Goal**: Extract product data from a website -3. Review error messages in the CLI output +**Prompt**: +``` +Create a web scraper to extract product information from our competitor's site. -### Configuration File Location +Target URL: https://example-store.com/products -The MCP configuration is stored at: -- **Config file**: `~/.openhands/mcp.json` +Extract for each product: +- Name +- Price +- Description +- Image URL +- SKU (if available) -## See Also +Requirements: +1. Use Python with BeautifulSoup or Scrapy +2. Handle pagination (site has 50 pages) +3. Respect rate limits (1 request/second) +4. Save results to products.json +5. 
Handle errors gracefully +6. Log progress to console -- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation -- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration -- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference +Include a README with usage instructions. +``` +**Tips**: +- Specify rate limiting requirements +- Mention error handling expectations +- Request logging for debugging -# Quick Start -Source: https://docs.openhands.dev/openhands/usage/cli/quick-start +--- + +### Code Review -**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details. +For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). -## Overview - -The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent: +#### Tutorial: Security-Focused Code Review -| Mode | Command | Best For | -|------|---------|----------| -| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development | -| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation | -| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI | -| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI | -| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains | +**Goal**: Identify security vulnerabilities in a PR - +**Prompt**: +``` +Review this pull request for security issues: -## Your First Conversation +Focus areas: +1. Input validation - check all user inputs are sanitized +2. Authentication - verify auth checks are in place +3. SQL injection - check for parameterized queries +4. XSS - verify output encoding +5. 
Sensitive data - ensure no secrets in code -**Set up your account** (first time only): +For each issue found, provide: +- File and line number +- Severity (Critical/High/Medium/Low) +- Description of the vulnerability +- Suggested fix with code example - - - ```bash - openhands login - ``` - This authenticates with OpenHands Cloud and fetches your settings. - - - The CLI will prompt you to configure your LLM provider and API key on first run. - - +Output format: Markdown suitable for PR comments +``` -1. **Start the CLI:** - ```bash - openhands - ``` +--- -2. **Enter a task:** - ``` - Create a Python script that prints "Hello, World!" - ``` +#### Tutorial: Performance Review -3. **Watch OpenHands work:** - The agent will create the file and show you the results. +**Goal**: Identify performance issues in code -## Controls +**Prompt**: +``` +Review the OrderService class for performance issues. -Once inside the CLI, use these controls: +File: src/services/order.js -| Control | Description | -|---------|-------------| -| `Ctrl+P` | Open command palette (access Settings, MCP status) | -| `Esc` | Pause the running agent | -| `Ctrl+Q` or `/exit` | Exit the CLI | +Check for: +1. N+1 database queries +2. Missing indexes (based on query patterns) +3. Inefficient loops or algorithms +4. Missing caching opportunities +5. Unnecessary data fetching -## Starting with a Task +For each issue: +- Explain the impact +- Show the problematic code +- Provide an optimized version +- Estimate the improvement +``` -You can start the CLI with an initial task: +--- -```bash -# Start with a task -openhands -t "Fix the bug in auth.py" +### Bug Fixing -# Start with a task from a file -openhands -f task.txt -``` + +For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. 
+ -## Resuming Conversations +#### Tutorial: Fix a Crash Bug -Resume a previous conversation: +**Goal**: Diagnose and fix an application crash -```bash -# List recent conversations and select one -openhands --resume +**Prompt**: +``` +Fix the crash in the checkout process. -# Resume the most recent conversation -openhands --resume --last +Error: +TypeError: Cannot read property 'price' of undefined + at calculateTotal (src/checkout/calculator.js:45) + at processOrder (src/checkout/processor.js:23) -# Resume a specific conversation by ID -openhands --resume abc123def456 -``` +Steps to reproduce: +1. Add item to cart +2. Apply discount code "SAVE20" +3. Click checkout +4. Crash occurs -For more details, see [Resume Conversations](/openhands/usage/cli/resume). +The bug was introduced in commit abc123 (yesterday's deployment). -## Next Steps +Requirements: +1. Identify the root cause +2. Fix the bug +3. Add a regression test +4. Verify the fix doesn't break other functionality +``` - - - Learn about the interactive terminal interface - - - Use OpenHands in Zed, VS Code, or JetBrains - - - Automate tasks with scripting - - - Add tools via Model Context Protocol - - +**What OpenHands does**: +1. Analyzes the stack trace +2. Reviews recent changes +3. Identifies the null reference issue +4. Implements a defensive fix +5. Creates test to prevent regression +--- -# Resume Conversations -Source: https://docs.openhands.dev/openhands/usage/cli/resume +#### Tutorial: Fix a Memory Leak -## Overview +**Goal**: Identify and fix a memory leak -OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off. +**Prompt**: +``` +Investigate and fix the memory leak in our Node.js application. 
-## Listing Previous Conversations +Symptoms: +- Memory usage grows 100MB/hour +- After 24 hours, app becomes unresponsive +- Restarting temporarily fixes the issue -To see a list of your recent conversations, run: +Suspected areas: +- Event listeners in src/events/ +- Cache implementation in src/cache/ +- WebSocket connections in src/ws/ -```bash -openhands --resume +Analyze these areas and: +1. Identify the leak source +2. Explain why it's leaking +3. Implement a fix +4. Add monitoring to detect future leaks ``` -This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message: +--- -``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py +### Feature Development - 2. xyz789ghi012 (yesterday) - Add unit tests for the user service +#### Tutorial: Add a REST API Endpoint - 3. mno345pqr678 (3 days ago) - Refactor the database connection module --------------------------------------------------------------------------------- -To resume a conversation, use: openhands --resume +**Goal**: Create a new API endpoint with full functionality + +**Prompt**: ``` +Add a user preferences API endpoint. -## Resuming a Specific Conversation +Endpoint: /api/users/:id/preferences -To resume a specific conversation, use the `--resume` flag with the conversation ID: +Operations: +- GET: Retrieve user preferences +- PUT: Update user preferences +- PATCH: Partially update preferences -```bash -openhands --resume +Preferences schema: +{ + theme: "light" | "dark", + notifications: { email: boolean, push: boolean }, + language: string, + timezone: string +} + +Requirements: +1. Follow patterns in src/api/routes/ +2. Add request validation with Joi +3. Use UserPreferencesService for business logic +4. Add appropriate error handling +5. Document the endpoint in OpenAPI format +6. 
Add unit and integration tests ``` -For example: +**What OpenHands does**: +1. Creates route handler following existing patterns +2. Implements validation middleware +3. Creates or updates the service layer +4. Adds error handling +5. Generates API documentation +6. Creates comprehensive tests -```bash -openhands --resume abc123def456 -``` +--- -## Resuming the Latest Conversation +#### Tutorial: Implement a Feature Flag System -To quickly resume your most recent conversation without looking up the ID, use the `--last` flag: +**Goal**: Add feature flags to the application -```bash -openhands --resume --last +**Prompt**: ``` +Implement a feature flag system for our application. -This automatically finds and resumes the most recent conversation. +Requirements: +1. Create a FeatureFlags service +2. Support these flag types: + - Boolean (on/off) + - Percentage (gradual rollout) + - User-based (specific user IDs) +3. Load flags from environment variables initially +4. Add a React hook: useFeatureFlag(flagName) +5. Add middleware for API routes -## How It Works +Initial flags to configure: +- new_checkout: boolean, default false +- dark_mode: percentage, default 10% +- beta_features: user-based -When you resume a conversation: +Include documentation and tests. +``` -1. OpenHands loads the full conversation history from disk -2. The agent has access to all previous context, including: - - Your previous messages and requests - - The agent's responses and actions - - Any files that were created or modified -3. You can continue the conversation as if you never left +--- - -The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost. - +## Contributing Tutorials -## Resuming in Different Modes +Have a great use case? Share it with the community! 
-### Terminal Mode +**What makes a good tutorial:** +- Solves a common problem +- Has clear, reproducible steps +- Includes example prompts +- Explains expected outcomes +- Provides tips for success -```bash -openhands --resume abc123def456 -openhands --resume --last -``` +**How to contribute:** +1. Create a detailed example following this format +2. Test it with OpenHands to verify it works +3. Submit via GitHub pull request to the docs repository +4. Include any prerequisites or setup required -### ACP Mode (IDEs) + +These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. + -```bash -openhands acp --resume abc123def456 -openhands acp --resume --last -``` +### Key Features +Source: https://docs.openhands.dev/openhands/usage/key-features.md -For IDE-specific configurations, see: -- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) -- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) -- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) + + + - Displays the conversation between the user and OpenHands. + - OpenHands explains its actions in this panel. -### With Confirmation Modes + ![overview](/openhands/static/img/chat-panel.png) + + + - Shows the file changes performed by OpenHands. -Combine `--resume` with confirmation mode flags: + ![overview](/openhands/static/img/changes-tab.png) + + + - Embedded VS Code for browsing and modifying files. + - Can also be used to upload and download files. -```bash -# Resume with LLM-based approval -openhands --resume abc123def456 --llm-approve + ![overview](/openhands/static/img/vs-tab.png) + + + - A space for OpenHands and users to run terminal commands. -# Resume with auto-approve -openhands --resume --last --always-approve -``` + ![overview](/openhands/static/img/terminal-tab.png) + + + - Displays the web server when OpenHands runs an application. + - Users can interact with the running application. 
-## Tips + ![overview](/openhands/static/img/app-tab.png) + + + - Used by OpenHands to browse websites. + - The browser is non-interactive. - -**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. - + ![overview](/openhands/static/img/browser-tab.png) + + - -**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. - +### Azure +Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms.md -## Storage Location +## Azure OpenAI Configuration -Conversations are stored in: +When running OpenHands, you'll need to set the following environment variable using `-e` in the +docker run command: ``` -~/.openhands/conversations/ -├── abc123def456/ -│ └── conversation.json -├── xyz789ghi012/ -│ └── conversation.json -└── ... +LLM_API_VERSION="" # e.g. "2023-05-15" ``` -## See Also +Example: +```bash +docker run -it --pull=always \ + -e LLM_API_VERSION="2023-05-15" + ... +``` -- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage -- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs -- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference +Then in the OpenHands UI Settings under the `LLM` tab: + +You will need your ChatGPT deployment name which can be found on the deployments page in Azure. This is referenced as +<deployment-name> below. + -# Terminal (CLI) -Source: https://docs.openhands.dev/openhands/usage/cli/terminal +1. Enable `Advanced` options. +2. Set the following: + - `Custom Model` to azure/<deployment-name> + - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`) + - `API Key` to your Azure API key -## Overview +### Azure OpenAI Configuration -The Command Line Interface (CLI) is the default mode when you run `openhands`. 
It provides a rich, interactive experience directly in your terminal. +When running OpenHands, set the following environment variable using `-e` in the +docker run command: -```bash -openhands +``` +LLM_API_VERSION="" # e.g. "2024-02-15-preview" ``` -## Features +### Custom LLM Configurations +Source: https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md -- **Real-time interaction**: Type natural language tasks and receive instant feedback -- **Live status monitoring**: Watch the agent's progress as it works -- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more +## How It Works -## Command Palette +Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. For example: -Press `Ctrl+P` to open the command palette, then select from the dropdown options: +```toml +# Default LLM configuration +[llm] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.0 -| Option | Description | -|--------|-------------| -| **Settings** | Open the settings configuration menu | -| **MCP** | View MCP server status | +# Custom LLM configuration for a cheaper model +[llm.gpt3] +model = "gpt-3.5-turbo" +api_key = "your-api-key" +temperature = 0.2 -## Controls +# Another custom configuration with different parameters +[llm.high-creativity] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.8 +top_p = 0.9 +``` -| Control | Action | -|---------|--------| -| `Ctrl+P` | Open command palette | -| `Esc` | Pause the running agent | -| `Ctrl+Q` or `/exit` | Exit the CLI | +Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed. 
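The inheritance rule described above can be sketched as a plain dictionary merge: the named section starts from every default `[llm]` setting, and its own keys win. This is an illustrative sketch of the override semantics only, not OpenHands' actual configuration loader (the function and variable names are made up for the example):

```python
# Illustrative sketch: how a named section such as [llm.gpt3] inherits
# from the default [llm] section. Not OpenHands' actual loader.

def resolve_llm_config(defaults: dict, overrides: dict) -> dict:
    """Start from the default [llm] settings, then apply the named section."""
    merged = dict(defaults)   # copy every default setting
    merged.update(overrides)  # keys in the named section take precedence
    return merged

default_llm = {"model": "gpt-4", "api_key": "your-api-key", "temperature": 0.0}
gpt3_section = {"model": "gpt-3.5-turbo", "temperature": 0.2}

gpt3 = resolve_llm_config(default_llm, gpt3_section)
print(gpt3["model"])    # overridden by [llm.gpt3] → "gpt-3.5-turbo"
print(gpt3["api_key"])  # inherited from [llm] → "your-api-key"
```

The same merge applies to every named section independently, which is why each one only needs to list the settings it changes.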
-## Starting with a Task +## Using Custom Configurations -Start a conversation with an initial task: +### With Agents -```bash -# Provide a task directly -openhands -t "Create a REST API for user management" +You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: -# Load task from a file -openhands -f requirements.txt +```toml +[agent.RepoExplorerAgent] +# Use the cheaper GPT-3 configuration for this agent +llm_config = 'gpt3' + +[agent.CodeWriterAgent] +# Use the high creativity configuration for this agent +llm_config = 'high-creativity' ``` -## Confirmation Modes +### Configuration Options -Control how the agent requests approval for actions: +Each named LLM configuration supports all the same options as the default LLM configuration. These include: -```bash -# Default: Always ask for confirmation -openhands +- Model selection (`model`) +- API configuration (`api_key`, `base_url`, etc.) +- Model parameters (`temperature`, `top_p`, etc.) +- Retry settings (`num_retries`, `retry_multiplier`, etc.) +- Token limits (`max_input_tokens`, `max_output_tokens`) +- And all other LLM configuration options -# Auto-approve all actions (use with caution) -openhands --always-approve +For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. -# Use LLM-based security analyzer -openhands --llm-approve -``` +## Use Cases -## Resuming Conversations +Custom LLM configurations are particularly useful in several scenarios: -Resume previous conversations: +- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations. +- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism. 
+- **Different Providers**: Use different LLM providers or API endpoints for different tasks. +- **Testing and Development**: Easily switch between different model configurations during development and testing. -```bash -# List recent conversations -openhands --resume +## Example: Cost Optimization -# Resume the most recent -openhands --resume --last +A practical example of using custom LLM configurations to optimize costs: -# Resume a specific conversation -openhands --resume abc123def456 -``` +```toml +# Default configuration using GPT-4 for high-quality responses +[llm] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.0 -For more details, see [Resume Conversations](/openhands/usage/cli/resume). +# Cheaper configuration for repository exploration +[llm.repo-explorer] +model = "gpt-3.5-turbo" +temperature = 0.2 -## Tips +# Configuration for code generation +[llm.code-gen] +model = "gpt-4" +temperature = 0.0 +max_output_tokens = 2000 - -Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. - +[agent.RepoExplorerAgent] +llm_config = 'repo-explorer' - -Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. - +[agent.CodeWriterAgent] +llm_config = 'code-gen' +``` -## See Also +In this example: +- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code +- Code generation uses GPT-4 with a higher token limit for generating larger code blocks +- The default configuration remains available for other tasks -- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers -- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation +# Custom Configurations with Reserved Names +OpenHands can use custom LLM configurations named with reserved names, for specific use cases. 
If you specify the model and other settings under the reserved names, then OpenHands will load and use them for that specific purpose. As of now, one such configuration is implemented: the draft editor.

-# Web Interface
-Source: https://docs.openhands.dev/openhands/usage/cli/web-interface
+## Draft Editor Configuration

-## Overview
+The `draft_editor` configuration is a group of settings that specifies the model to use for the preliminary drafting of code edits, in tasks that involve editing and refining code. Provide it under the `[llm.draft_editor]` section.

-The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to:
-- Access the CLI remotely
-- Share your terminal session
-- Use the CLI on devices without a full terminal
+For example, you can define a draft editor in `config.toml` like this:

-```bash
-openhands web
+```toml
+[llm.draft_editor]
+model = "gpt-4"
+temperature = 0.2
+top_p = 0.95
+presence_penalty = 0.0
+frequency_penalty = 0.0
```

+This configuration:
+- Uses GPT-4 for high-quality edits and suggestions
+- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility
+- Uses a high top_p value (0.95) to consider a wide range of token options
+- Disables presence and frequency penalties to maintain focus on the specific edits needed
+
+Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to:
+- Review and suggest code improvements
+- Refine existing content while maintaining its core meaning
+- Make precise, focused changes to code or text
+

-This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser.
+Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`.
When running via `docker run`, please use the standard configuration options. -## Basic Usage +### Google Gemini/Vertex +Source: https://docs.openhands.dev/openhands/usage/llms/google-llms.md -```bash -# Start on default port (12000) -openhands web +## Gemini - Google AI Studio Configs -# Access at http://localhost:12000 -``` +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Gemini` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). +- `API Key` to your Gemini API key -## Options +## VertexAI - Google Cloud Platform Configs -| Option | Default | Description | -|--------|---------|-------------| -| `--host` | `0.0.0.0` | Host address to bind to | -| `--port` | `12000` | Port number to use | -| `--debug` | `false` | Enable debug mode | +To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment +variables using `-e` in the docker run command: -## Examples +``` +GOOGLE_APPLICATION_CREDENTIALS="" +VERTEXAI_PROJECT="" +VERTEXAI_LOCATION="" +``` -```bash -# Custom port -openhands web --port 8080 +Then set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `VertexAI` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. vertex_ai/<model-name>). 
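A missing credential typically only surfaces once the agent first calls the model, so it can help to fail fast before launching. The snippet below is an illustrative pre-flight check using the three variable names above; the check itself is not part of OpenHands:

```python
import os

# The three environment variables required for Vertex AI, per the section above.
REQUIRED_VERTEX_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "VERTEXAI_PROJECT",
    "VERTEXAI_LOCATION",
)

def missing_vertex_vars(env=None) -> list:
    """Return the names of any required Vertex AI variables that are unset."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VERTEX_VARS if not env.get(name)]

# Example: only the project is set, so two variables are reported missing.
missing = missing_vertex_vars({"VERTEXAI_PROJECT": "my-project"})
print(missing)  # → ['GOOGLE_APPLICATION_CREDENTIALS', 'VERTEXAI_LOCATION']
```

Running this (or an equivalent shell check) before `docker run` avoids a confusing authentication failure mid-conversation.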
-# Bind to localhost only (more secure) -openhands web --host 127.0.0.1 +### Groq +Source: https://docs.openhands.dev/openhands/usage/llms/groq.md -# Enable debug mode -openhands web --debug +## Configuration -# Full example with custom host and port -openhands web --host 0.0.0.0 --port 3000 -``` +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Groq` +- `LLM Model` to the model you will be using. [Visit here to see the list of +models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, +enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). +- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). -## Remote Access +## Using Groq as an OpenAI-Compatible Endpoint -To access the web interface from another machine: +The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you +would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. `openai/llama3-70b-8192`) + - `Base URL` to `https://api.groq.com/openai/v1` + - `API Key` to your Groq API key -1. Start with `--host 0.0.0.0` to bind to all interfaces: - ```bash - openhands web --host 0.0.0.0 --port 12000 - ``` +### LiteLLM Proxy +Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md -2. Access from another machine using the host's IP: - ``` - http://:12000 - ``` +## Configuration - -When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities. 
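Under the hood, requests routed through the proxy are ordinary OpenAI-style chat completions. The sketch below shows roughly what such a request looks like; the URL, key, and model name are placeholders (the exact path may also vary with your proxy setup), and OpenHands builds the real request for you from the settings above:

```python
# Illustrative sketch of the OpenAI-compatible request that reaches a
# LiteLLM proxy. Placeholder values only; not OpenHands' actual client code.

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Return (url, headers, payload) for a chat completion via the proxy."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # The model name is the part after the "litellm_proxy/" prefix.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload

url, headers, payload = build_chat_request(
    "https://your-litellm-proxy.com", "sk-placeholder",
    "anthropic.claude-3-5-sonnet-20241022-v2:0", "Hello",
)
print(url)  # → https://your-litellm-proxy.com/chat/completions
```

Sending a small request like this by hand is a quick way to confirm the proxy URL, key, and model name before pointing OpenHands at them.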
- +To use LiteLLM proxy with OpenHands, you need to: -## Use Cases +1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) +2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: + * Enable `Advanced` options + * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) + * `Base URL` to your LiteLLM proxy URL (e.g. `https://your-litellm-proxy.com`) + * `API Key` to your LiteLLM proxy API key -### Development on Remote Servers +## Supported Models -Access OpenHands on a remote development server through your local browser: +The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy +is configured to handle. -```bash -# On remote server -openhands web --host 0.0.0.0 --port 12000 +Refer to your LiteLLM proxy configuration for the list of available models and their names. -# On local machine, use SSH tunnel -ssh -L 12000:localhost:12000 user@remote-server +### Overview +Source: https://docs.openhands.dev/openhands/usage/llms/llms.md -# Access at http://localhost:12000 -``` + +This section is for users who want to connect OpenHands to different LLMs. + -### Sharing Sessions + +OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this +page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation +for the canonical list of supported parameters. + -Run the web interface on a shared server for team access: +## Model Recommendations -```bash -openhands web --host 0.0.0.0 --port 8080 -``` +Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some +recommendations for model selection. 
Our latest benchmarking results can be found in +[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). -## Comparison: Web Interface vs GUI Server +Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: -| Feature | `openhands web` | `openhands serve` | -|---------|-----------------|-------------------| -| Interface | Terminal UI in browser | Full web GUI | -| Dependencies | None | Docker required | -| Resources | Lightweight | Full container | -| Best for | Quick access | Rich GUI experience | +### Cloud / API-Based Models -## See Also +- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) +- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) +- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) +- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) +- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) +- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) -- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage -- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker -- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options +If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process +to help others using the same provider! +For a full list of the providers and models available, please consult the +[litellm documentation](https://docs.litellm.ai/docs/providers). -# Bitbucket Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation + +OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending +limits and monitor usage. 
+ -## Prerequisites +### Local / Self-Hosted Models -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud). +- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) +- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) -## Adding Bitbucket Repository Access +### Known Issues -Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories. + +Most current local and open source models are not as powerful. When using such models, you may see long +wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the +models driving it. However, if you do find ones that work, please add them to the verified list above. + -## Working With Bitbucket Repos in Openhands Cloud +## LLM Configuration -After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and -branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! +The following can be set in the OpenHands UI through the Settings. Each option is serialized into the +`LLM.load_from_env()` schema before being passed to the Agent SDK: -![Connect Repo](/openhands/static/img/connect-repo.png) +- `LLM Provider` +- `LLM Model` +- `API Key` +- `Base URL` (through `Advanced` settings) -## IP Whitelisting +There are some settings that may be necessary for certain providers that cannot be set directly through the UI. 
Set them +as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup: -If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow -OpenHands to access your repositories: +- `LLM_API_VERSION` +- `LLM_EMBEDDING_MODEL` +- `LLM_EMBEDDING_DEPLOYMENT_NAME` +- `LLM_DROP_PARAMS` +- `LLM_DISABLE_VISION` +- `LLM_CACHING_PROMPT` -### Core App IP -``` -34.68.58.200 -``` +## LLM Provider Guides -### Runtime IPs -``` -34.10.175.217 -34.136.162.246 -34.45.0.142 -34.28.69.126 -35.224.240.213 -34.70.174.52 -34.42.4.87 -35.222.133.153 -34.29.175.97 -34.60.55.59 -``` +We have a few guides for running OpenHands with specific model providers: -## Next Steps +- [Azure](/openhands/usage/llms/azure-llms) +- [Google](/openhands/usage/llms/google-llms) +- [Groq](/openhands/usage/llms/groq) +- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms) +- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy) +- [Moonshot AI](/openhands/usage/llms/moonshot) +- [OpenAI](/openhands/usage/llms/openai-llms) +- [OpenHands](/openhands/usage/llms/openhands-llms) +- [OpenRouter](/openhands/usage/llms/openrouter) -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +These pages remain the authoritative provider references for both the Agent SDK +and the OpenHands interfaces. +## Model Customization -# Cloud API -Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api +LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: -For the available API endpoints, refer to the -[OpenHands API Reference](https://docs.openhands.dev/api-reference). +- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. +- **Native Tool Calling**: Toggle native function/tool calling capabilities. 
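Several of the options listed above (for example `LLM_DROP_PARAMS` and `LLM_DISABLE_VISION`) are boolean toggles read from the environment. The sketch below shows one plausible way such flags could be interpreted at startup; it is an assumption for illustration, not the Agent SDK's actual parsing code:

```python
import os

# Illustrative only: interpreting boolean environment toggles such as
# LLM_DROP_PARAMS or LLM_DISABLE_VISION. Not the Agent SDK's real parser.

TRUTHY = {"1", "true", "yes", "on"}

def env_flag(name: str, default: bool = False, env=None) -> bool:
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY

example_env = {"LLM_DROP_PARAMS": "true", "LLM_DISABLE_VISION": "0"}
print(env_flag("LLM_DROP_PARAMS", env=example_env))    # → True
print(env_flag("LLM_DISABLE_VISION", env=example_env)) # → False
```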
-## Obtaining an API Key +For detailed information about model customization, see +[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). -To use the OpenHands Cloud API, you'll need to generate an API key: +### API retries and rate limits -1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. -2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. -3. Click `Create API Key`. -4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. -5. Copy the generated API key and store it securely. It will only be shown once. +LLM providers typically have rate limits, sometimes very low, and may require retries. OpenHands will automatically +retry requests if it receives a Rate Limit Error (429 error code). -## API Usage Example (V1) +You can customize these options as you need for the provider you're using. Check their documentation, and set the +following environment variables to control the number of retries and the time between retries: -### Starting a New Conversation +- `LLM_NUM_RETRIES` (Default of 4 times) +- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) +- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) +- `LLM_RETRY_MULTIPLIER` (Default of 2) -To start a new conversation with OpenHands to perform a task, -make a POST request to the V1 app-conversations endpoint. 
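A common way these four settings combine is exponential backoff: each wait grows by the multiplier, starting at the minimum and capped at the maximum. The sketch below illustrates that scheme with the default values; the exact formula is an assumption for illustration, not OpenHands' actual retry code:

```python
# Illustrative exponential-backoff schedule using the default retry
# settings above (4 retries, 5 s min, 30 s max, multiplier 2).
# Not OpenHands' actual retry implementation.

def retry_wait(attempt: int, min_wait: float = 5, max_wait: float = 30,
               multiplier: float = 2) -> float:
    """Seconds to wait before retry number `attempt` (starting at 1)."""
    wait = min_wait * (multiplier ** (attempt - 1))
    return min(wait, max_wait)

print([retry_wait(n) for n in range(1, 5)])  # → [5, 10, 20, 30]
```

With the defaults, the waits double from 5 seconds until they hit the 30-second cap, so four retries take roughly a minute in total.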
+If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: - - - ```bash - curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_message": { - "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] - }, - "selected_repository": "yourusername/your-repo" - }' - ``` - - - ```python - import requests +```toml +[llm] +num_retries = 4 +retry_min_wait = 5 +retry_max_wait = 30 +retry_multiplier = 2 +``` - api_key = "YOUR_API_KEY" - url = "https://app.all-hands.dev/api/v1/app-conversations" +### Local LLMs +Source: https://docs.openhands.dev/openhands/usage/llms/local-llms.md - headers = { - "Authorization": f"Bearer {api_key}", - "Content-Type": "application/json" - } +## News - data = { - "initial_message": { - "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] - }, - "selected_repository": "yourusername/your-repo" - } +- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! 
- response = requests.post(url, headers=headers, json=data) - result = response.json() +## Quickstart: Running OpenHands with a Local LLM using LM Studio - # The response contains a start task with the conversation ID - conversation_id = result.get("app_conversation_id") or result.get("id") - print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") - print(f"Status: {result['status']}") - ``` - - - ```typescript - const apiKey = "YOUR_API_KEY"; - const url = "https://app.all-hands.dev/api/v1/app-conversations"; +This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. - const headers = { - "Authorization": `Bearer ${apiKey}`, - "Content-Type": "application/json" - }; +We recommend: +- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. +- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. - const data = { - initial_message: { - content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." }] - }, - selected_repository: "yourusername/your-repo" - }; +### Hardware Requirements - async function startConversation() { - try { - const response = await fetch(url, { - method: "POST", - headers: headers, - body: JSON.stringify(data) - }); +Running Qwen3-Coder-30B-A3B-Instruct requires: +- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or +- A Mac with Apple Silicon with at least 32GB of RAM - const result = await response.json(); +### 1. 
Install LM Studio - // The response contains a start task with the conversation ID - const conversationId = result.app_conversation_id || result.id; - console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); - console.log(`Status: ${result.status}`); +Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/). - return result; - } catch (error) { - console.error("Error starting conversation:", error); - } - } +### 2. Download the Model - startConversation(); - ``` - - +1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window. +2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. -#### Response +![image](./screenshots/01_lm_studio_open_model_hub.png) -The API will return a JSON object with details about the conversation start task: +3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. -```json -{ - "id": "550e8400-e29b-41d4-a716-446655440000", - "status": "WORKING", - "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", - "sandbox_id": "sandbox-abc123", - "created_at": "2025-01-15T10:30:00Z" -} -``` +![image](./screenshots/02_lm_studio_download_devstral.png) -The `status` field indicates the current state of the conversation startup process: -- `WORKING` - Initial processing -- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready -- `PREPARING_REPOSITORY` - Cloning and setting up the repository -- `READY` - Conversation is ready to use -- `ERROR` - An error occurred during startup +4. Wait for the download to finish. -You may receive an authentication error if: +### 3. Load the Model -- You provided an invalid API key. -- You provided the wrong repository name. -- You don't have access to the repository. +1. 
Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. +2. Click the "Select a model to load" dropdown at the top of the application window. -### Streaming Conversation Start (Optional) +![image](./screenshots/03_lm_studio_open_load_model.png) -For real-time updates during conversation startup, you can use the streaming endpoint: +3. Enable the "Manually choose model load parameters" switch. +4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. -```bash -curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_message": { - "content": [{"type": "text", "text": "Your task description here"}] - }, - "selected_repository": "yourusername/your-repo" - }' -``` +![image](./screenshots/04_lm_studio_setup_devstral_part_1.png) -#### Streaming Response +5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. +6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. +7. Click "Load Model" to start loading the model. -The endpoint streams a JSON array incrementally. Each element represents a status update: +![image](./screenshots/05_lm_studio_setup_devstral_part_2.png) -```json -[ - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} -] -``` +### 4. 
Start the LLM server -Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. +1. Enable the switch next to "Status" at the top-left of the Window. +2. Take note of the Model API Identifier shown on the sidebar on the right. -## Rate Limits +![image](./screenshots/06_lm_studio_start_server.png) -If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. -If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). +### 5. Start OpenHands ---- +1. Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: -## Migrating from V0 to V1 API +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` - - The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. - Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. - +2. Wait until the server is running (see log below): +``` +Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f +Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 +Starting OpenHands... +Running OpenHands as root +14:22:13 - openhands:INFO: server_config.py:50 - Using config class None +INFO: Started server process [8] +INFO: Waiting for application startup. +INFO: Application startup complete. 
+INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) +``` -### Key Differences +3. Visit `http://localhost:3000` in your browser. -| Feature | V0 API | V1 API | -|---------|--------|--------| -| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | -| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | -| Repository field | `repository` | `selected_repository` | -| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | +### 6. Configure OpenHands to use the LLM server -### Migration Steps +Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. -1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` +When started for the first time, OpenHands will prompt you to set up the LLM provider. -2. **Update the request body**: - - Change `repository` to `selected_repository` - - Change `initial_user_msg` (string) to `initial_message` (object with content array): - ```json - // V0 format - { "initial_user_msg": "Your message here" } +1. Click "see advanced settings" to open the LLM Settings page. - // V1 format - { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } - ``` +![image](./screenshots/07_openhands_open_advanced_settings.png) -3. **Update response handling**: The V1 API returns a start task object. The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. +2. Enable the "Advanced" switch at the top of the page to show all the available settings. ---- +3. 
Set the following values: + - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") + - **Base URL**: `http://host.docker.internal:1234/v1` + - **API Key**: `local-llm` -## Legacy API (V0) - Deprecated +4. Click "Save Settings" to save the configuration. - - The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. - New integrations should use the V1 API documented above. - +![image](./screenshots/08_openhands_configure_local_llm_parameters.png) -### Starting a New Conversation (V0) +That's it! You can now start using OpenHands with the local LLM server. - - - ```bash - curl -X POST "https://app.all-hands.dev/api/conversations" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - "repository": "yourusername/your-repo" - }' - ``` - - - ```python - import requests +If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack). - api_key = "YOUR_API_KEY" - url = "https://app.all-hands.dev/api/conversations" +## Advanced: Alternative LLM Backends - headers = { - "Authorization": f"Bearer {api_key}", - "Content-Type": "application/json" - } +This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. - data = { - "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - "repository": "yourusername/your-repo" - } +### Create an OpenAI-Compatible Endpoint with Ollama - response = requests.post(url, headers=headers, json=data) - conversation = response.json() +- Install Ollama following [the official documentation](https://ollama.com/download). 
+- Example launch command for Qwen3-Coder-30B-A3B-Instruct: - print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") - print(f"Status: {conversation['status']}") - ``` - - - ```typescript - const apiKey = "YOUR_API_KEY"; - const url = "https://app.all-hands.dev/api/conversations"; +```bash +# ⚠️ WARNING: OpenHands requires a large context size to work properly. +# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. +# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. +OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & +ollama pull qwen3-coder:30b +``` - const headers = { - "Authorization": `Bearer ${apiKey}`, - "Content-Type": "application/json" - }; +### Create an OpenAI-Compatible Endpoint with vLLM or SGLang - const data = { - initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - repository: "yourusername/your-repo" - }; +First, download the model checkpoint: - async function startConversation() { - try { - const response = await fetch(url, { - method: "POST", - headers: headers, - body: JSON.stringify(data) - }); +```bash +huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct +``` - const conversation = await response.json(); +#### Serving the model using SGLang - console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); - console.log(`Status: ${conversation.status}`); +- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). 
+- Example launch command (with at least 2 GPUs): - return conversation; - } catch (error) { - console.error("Error starting conversation:", error); - } - } +```bash +SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ + --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --port 8000 \ + --tp 2 --dp 1 \ + --host 0.0.0.0 \ + --api-key mykey --context-length 131072 +``` - startConversation(); - ``` - - +#### Serving the model using vLLM -#### Response (V0) +- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). +- Example launch command (with at least 2 GPUs): -```json -{ - "status": "ok", - "conversation_id": "abc1234" -} +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --enable-prefix-caching ``` +If you are interested in further improved inference speed, you can also try Snowflake's version +of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/), +which can achieve up to 2x speedup in some cases. -# Cloud UI -Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-ui - -## Landing Page - -The landing page is where you can: +1. Install the Arctic Inference library that automatically patches vLLM: -- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), - [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or - [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. -- Launch an empty conversation using `New Conversation`. -- See `Suggested Tasks` for repositories that OpenHands has access to. -- See your `Recent Conversations`. 
+```bash +pip install git+https://github.com/snowflakedb/ArcticInference.git +``` -## Settings +2. Run the launch command with speculative decoding enabled: -Settings are divided across tabs, with each tab focusing on a specific area of configuration. +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --speculative-config '{"method": "suffix"}' +``` -- `User` - - Change your email address. -- `Integrations` - - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. - - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). -- `Application` - - Set your preferred language, notifications and other preferences. - - Toggle task suggestions on GitHub. - - Toggle Solvability Analysis. - - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation). - - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings). -- `LLM` - - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings). -- `Billing` - - Add credits for using the OpenHands provider. -- `Secrets` - - [Manage secrets](/openhands/usage/settings/secrets-settings). -- `API Keys` - - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api). -- `MCP` - - [Setup an MCP server](/openhands/usage/settings/mcp-settings) +### Run OpenHands (Alternative Backends) -## Key Features +#### Using Docker -For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features) -section of the documentation. 
+Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). -## Next Steps +#### Using Development Mode -- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). -- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. +Start OpenHands using `make run`. -# GitHub Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation +### Configure OpenHands (Alternative Backends) -## Prerequisites +Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud). +1. Click **"see advanced settings"** to access the full configuration panel. +2. Enable the **Advanced** toggle at the top of the page. +3. Set the following parameters, if you followed the examples above: + - **Custom Model**: `openai/` + - For **Ollama**: `openai/qwen3-coder:30b` + - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` + - **Base URL**: `http://host.docker.internal:/v1` + Use port `11434` for Ollama, or `8000` for SGLang and vLLM. + - **API Key**: + - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) + - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) -## Adding GitHub Repository Access +### Moonshot AI +Source: https://docs.openhands.dev/openhands/usage/llms/moonshot.md -You can grant OpenHands access to specific GitHub repositories: +## Using Moonshot AI with OpenHands -1. Click on `+ Add GitHub Repos` in the repository selection dropdown. -2. 
Select your organization and choose the specific repositories to grant OpenHands access to. - - - OpenHands requests short-lived tokens (8-hour expiration) with these permissions: - - Actions: Read and write - - Commit statuses: Read and write - - Contents: Read and write - - Issues: Read and write - - Metadata: Read-only - - Pull requests: Read and write - - Webhooks: Read and write - - Workflows: Read and write - - Repository access for a user is granted based on: - - Permission granted for the repository - - User's GitHub permissions (owner/collaborator) - +[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. -3. Click `Install & Authorize`. +### Setup -## Modifying Repository Access +1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) +2. Generate an API key from your account settings +3. Configure OpenHands to use Moonshot AI: -You can modify GitHub repository access at any time by: -- Selecting `+ Add GitHub Repos` in the repository selection dropdown or -- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories` +| Setting | Value | +| --- | --- | +| LLM Provider | `moonshot` | +| LLM Model | `kimi-k2-0711-preview` | +| API Key | Your Moonshot API key | -## Working With GitHub Repos in Openhands Cloud +### Recommended Models -Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the -`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click -on `Launch` to start the conversation! +- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. 
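Since Kimi-K2 supports function calling, it accepts OpenAI-style tool definitions in its chat-completion requests. The sketch below shows what such a request body looks like; the `get_weather` tool is a hypothetical example for illustration, not part of the Moonshot API.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format,
# which function-calling models such as Kimi-K2 consume.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Chat-completion request body carrying the tool definitions.
request_body = {
    "model": "kimi-k2-0711-preview",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
}

print(json.dumps(request_body, indent=2))
```

OpenHands handles tool definitions for you; this only illustrates the wire format the model sees.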
-![Connect Repo](/openhands/static/img/connect-repo.png) +### OpenAI +Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms.md -## Working on GitHub Issues and Pull Requests Using Openhands +## Configuration -To allow OpenHands to work directly from GitHub directly, you must -[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is -given, you can use OpenHands by labeling the issue or by tagging `@openhands`. +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenAI` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). +* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). -### Working with Issues +## Using OpenAI-Compatible Endpoints -On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: -1. Comment on the issue to let you know it is working on it. - - You can click on the link to track the progress on OpenHands Cloud. -2. Open a pull request if it determines that the issue has been successfully resolved. -3. Comment on the issue with a summary of the performed tasks and a link to the PR. +Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). 
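To make "OpenAI-compatible" concrete, the sketch below assembles the request such an endpoint expects at `/chat/completions`. The base URL, API key, and model name are the example values from the LM Studio walkthrough earlier in this document, not fixed requirements.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble the URL, headers, and JSON body for an OpenAI-compatible
    /chat/completions call. Any OpenAI-compatible server (LM Studio, vLLM,
    a proxy, ...) accepts this request shape."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Example values taken from the LM Studio setup earlier in this document.
url, headers, body = build_chat_request(
    "http://host.docker.internal:1234/v1",
    "local-llm",
    "qwen/qwen3-coder-30b-a3b-instruct",
    "Hello!",
)
print(url)  # http://host.docker.internal:1234/v1/chat/completions
```

In OpenHands you never send this request yourself; LiteLLM builds it from the `Custom Model`, `Base URL`, and `API Key` settings described above.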
-### Working with Pull Requests +## Using an OpenAI Proxy -To get OpenHands to work on pull requests, mention `@openhands` in the comments to: -- Ask questions -- Request updates -- Get code explanations +If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) + - `Base URL` to the URL of your OpenAI proxy + - `API Key` to your OpenAI API key - -The `@openhands` mention functionality in pull requests only works if the pull request is both -*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate -permissions to access both repositories. - +### OpenHands +Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md +## Obtain Your OpenHands LLM API Key -## Next Steps +1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) +## Configuration -# GitLab Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `OpenHands` +- `LLM Model` to the model you will be using (e.g. claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) +- `API Key` to your OpenHands LLM API key copied from above -## Prerequisites +## Using OpenHands LLM Provider in the CLI -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud). +1. 
[Run OpenHands CLI](/openhands/usage/cli/quick-start). +2. To select OpenHands as the LLM provider: + - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use. + - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then + choose `openhands` and finally the model. -## Adding GitLab Repository Access +![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png) -Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories. -## Working With GitLab Repos in Openhands Cloud + +When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy + -After signing in with a Gitlab account, use the `Open Repository` section to select the appropriate repository and -branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! +## Using OpenHands LLM Provider with the SDK -![Connect Repo](/openhands/static/img/connect-repo.png) +You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines. -## Using Tokens with Reduced Scopes +### Configuration -OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent. -To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`, -which will override the default token assigned to the agent. While the high-permission API token is still requested -and used for other components of the application (e.g. opening merge requests), the agent will not have access to it. +The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. 
Simply set two environment variables: -## Working on GitLab Issues and Merge Requests Using Openhands +```bash +export LLM_API_KEY="your-openhands-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-20250514" +``` - -This feature works for personal projects and is available for group projects with a -[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks). +### Example -A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into -OpenHands Cloud. +```python +from openhands.sdk import LLM - +# The openhands/ prefix auto-configures the base URL +llm = LLM.load_from_env() -Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly. +# Or configure directly +llm = LLM( + model="openhands/claude-sonnet-4-20250514", + api_key="your-openhands-api-key", +) +``` -### Working with Issues +The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. -On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: +### Available Models -1. Comment on the issue to let you know it is working on it. - - You can click on the link to track the progress on OpenHands Cloud. -2. Open a merge request if it determines that the issue has been successfully resolved. -3. Comment on the issue with a summary of the performed tasks and a link to the PR. +When using the SDK, prefix any model from the pricing table below with `openhands/`: +- `openhands/claude-sonnet-4-20250514` +- `openhands/claude-sonnet-4-5-20250929` +- `openhands/claude-opus-4-20250514` +- `openhands/gpt-5-2025-08-07` +- etc. -### Working with Merge Requests + +If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. 
+ -To get OpenHands to work on merge requests, mention `@openhands` in the comments to: +## Pricing -- Ask questions -- Request updates -- Get code explanations +Pricing follows official API provider rates. Below are the current pricing details for OpenHands models: -## Managing GitLab Webhooks -The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page. +| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | +|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| +| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | +| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | +| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | +| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | +| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | +| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | +| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | +| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | +| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | +| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | +| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | -### Accessing Webhook Management +**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s model price database and provider pricing pages. 
Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. -The webhook management table is available on the Integrations page when: +### OpenRouter +Source: https://docs.openhands.dev/openhands/usage/llms/openrouter.md -- You are signed in to OpenHands Cloud with a GitLab account -- Your GitLab token is connected +## Configuration -To access it: +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenRouter` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models). +If the model is not in the list, enable `Advanced` options, and enter it in +`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`). +* `API Key` to your OpenRouter API key. -1. Navigate to the `Settings > Integrations` page -2. Find the GitLab section -3. If your GitLab token is connected, you'll see the webhook management table below the connection status +### OpenHands GitHub Action +Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md -### Viewing Webhook Status +## Using the Action in the OpenHands Repository -The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands. +To use the OpenHands GitHub Action in a repository, you can: -- **Resource**: The name and full path of the project or group -- **Type**: Whether it's a "project" or "group" -- **Status**: The current webhook installation status: - - **Installed**: The webhook is active and working - - **Not Installed**: No webhook is currently installed - - **Failed**: A previous installation attempt failed (error details are shown below the status) +1. Create an issue in the repository. +2. 
Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`. -### Reinstalling Webhooks +The action will automatically trigger and attempt to resolve the issue. -If a webhook is not installed or has failed, you can reinstall it: +## Installing the Action in a New Repository -1. Find the resource in the webhook management table -2. Click the `Reinstall` button in the Action column -3. The button will show `Reinstalling...` while the operation is in progress -4. Once complete, the status will update to reflect the result +To install the OpenHands GitHub Action in your own repository, follow +the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). - - To reinstall an existing webhook, you must first delete the current webhook - from the GitLab UI before using the Reinstall button in OpenHands Cloud. - +## Usage Tips -**Important behaviors:** +### Iterative resolution -- The Reinstall button is disabled if the webhook is already installed -- Only one reinstall operation can run at a time -- After a successful reinstall, the button remains disabled to prevent duplicate installations -- If a reinstall fails, the error message is displayed below the status badge -- The resources list automatically refreshes after a reinstall completes +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. +3. Review the attempt to resolve the issue by checking the pull request. +4. Follow up with feedback through general comments, review comments, or inline thread comments. +5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. 
-### Constraints and Limitations +### Label versus Macro -- The webhook management table only displays resources that are accessible with your connected GitLab token -- Webhook installation requires Admin or Owner permissions on the GitLab project or group +- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. +- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. -## Next Steps +## Advanced Settings -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +### Add custom repository settings +You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). -# Getting Started -Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud +### Custom configurations -## Accessing OpenHands Cloud +GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. +The customization options you can set are: -OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud, -visit [app.all-hands.dev](https://app.all-hands.dev). 
+| **Attribute name** | **Type** | **Purpose** | **Example** | +| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` | +| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` | +| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` | +| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` | +| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` | +| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` | -You'll be prompted to connect with your GitHub, GitLab or Bitbucket account: +### Configure +Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md -1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`. -2. Review the permissions requested by OpenHands and authorize the application. - - OpenHands will require certain permissions from your account. To read more about these permissions, - you can click the `Learn more` link on the authorization page. -3. Review and accept the `terms of service` and select `Continue`. +## Prerequisites -## Next Steps +- [OpenHands is running](/openhands/usage/run-openhands/local-setup) -Once you've connected your account, you can: +## Launching the GUI Server -- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). -- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). 
-- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). +### Using the CLI Command +You can launch the OpenHands GUI server directly from the command line using the `serve` command: -# Jira Data Center Integration (Coming soon...) -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration + +**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR have `uv` +installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker +directly (see the [Docker section](#using-docker-directly) below). + -# Jira Data Center Integration +```bash +openhands serve +``` -## Platform Configuration +This command will: +- Check that Docker is installed and running +- Pull the required Docker images +- Launch the OpenHands GUI server at http://localhost:3000 +- Use the same configuration directory (`~/.openhands`) as the CLI mode -### Step 1: Create Service Account +#### Mounting Your Current Directory -1. **Access User Management** - - Log in to Jira Data Center as administrator - - Go to **Administration** > **User Management** +To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: -2. **Create User** - - Click **Create User** - - Username: `openhands-agent` - - Full Name: `OpenHands Agent` - - Email: `openhands@yourcompany.com` (replace with your preferred service account email) - - Password: Set a secure password - - Click **Create** +```bash +openhands serve --mount-cwd +``` -3. **Assign Permissions** - - Add user to appropriate groups - - Ensure access to relevant projects - - Grant necessary project permissions +This is useful when you want to work on files in your current directory through the GUI. 
The directory will be mounted at `/workspace` inside the container. -### Step 2: Generate API Token +#### Using GPU Support -1. **Personal Access Tokens** - - Log in as the service account - - Go to **Profile** > **Personal Access Tokens** - - Click **Create token** - - Name: `OpenHands Cloud Integration` - - Expiry: Set appropriate expiration (recommend 1 year) - - Click **Create** - - **Important**: Copy and store the token securely +If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: -### Step 3: Configure Webhook +```bash +openhands serve --gpu +``` -1. **Create Webhook** - - Go to **Administration** > **System** > **WebHooks** - - Click **Create a WebHook** - - **Name**: `OpenHands Cloud Integration` - - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` - - Set a suitable webhook secret - - **Issue related events**: Select the following: - - Issue updated - - Comment created - - **JQL Filter**: Leave empty (or customize as needed) - - Click **Create** - - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: ---- +```bash +openhands serve --gpu --mount-cwd +``` -## Workspace Integration +**Prerequisites for GPU support:** +- NVIDIA GPU drivers must be installed on your host system +- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured -### Step 1: Log in to OpenHands Cloud +#### Requirements -1. **Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. 
+Before using the `openhands serve` command, ensure that: +- Docker is installed and running on your system +- You have internet access to pull the required Docker images +- Port 3000 is available on your system -### Step 2: Configure Jira Data Center Integration +The CLI will automatically check these requirements and provide helpful error messages if anything is missing. -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Jira Data Center** section +### Using Docker Directly -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The personal access token from Step 2 above - - Ensure **Active** toggle is enabled +Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. - -Workspace name is the host name of your Jira Data Center instance. +## Overview -Eg: http://jira.all-hands.dev/projects/OH/issues/OH-77 +### Initial Setup -Here the workspace name is **jira.all-hands.dev**. - +1. Upon first launch, you'll see a settings popup. +2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list, + select `see advanced settings`. Then toggle `Advanced` options and enter it with the correct prefix in the + `Custom Model` text box. +3. Enter the corresponding `API Key` for your chosen provider. +4. Click `Save Changes` to apply the settings. -3. **Complete OAuth Flow** - - You'll be redirected to Jira Data Center to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the correct one that you initially provided - - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI +### Settings -### Managing Your Integration +You can use the Settings page at any time to: -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view +- [Setup the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings). +- [Setup the search engine](/openhands/usage/advanced/search-engine-setup). +- [Configure MCP servers](/openhands/usage/settings/mcp-settings). +- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup), + [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup) + and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup). +- Set application settings like your preferred language, notifications and other preferences. +- [Manage custom secrets](/openhands/usage/settings/secrets-settings). -**Unlink Workspace:** -- In the edit view, click **Unlink** next to the workspace name -- This will deactivate your workspace link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. +### Key Features -### Screenshots +For an overview of the key features available inside a conversation, please refer to the +[Key Features](/openhands/usage/key-features) section of the documentation. 
- - -![workspace-link.png](/openhands/static/img/jira-dc-user-link.png) - +## Other Ways to Run Openhands +- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) +- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal) - -![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png) - +### Setup +Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md - -![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png) - +## Recommended Methods for Running Openhands on Your Local System - -![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png) - - +### System Requirements +- MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements) +- Linux +- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements) -# Jira Cloud Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration +A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands. -# Jira Cloud Integration +### Prerequisites -## Platform Configuration + -### Step 1: Create Service Account + -1. **Navigate to User Management** - - Go to [Atlassian Admin](https://admin.atlassian.com/) - - Select your organization - - Go to **Directory** > **Users** + **Docker Desktop** -2. **Create OpenHands Service Account** - - Click **Service accounts** - - Click **Create a service account** - - Name: `OpenHands Agent` - - Click **Next** - - Select **User** role for Jira app - - Click **Create** + 1. [Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install). + 2. Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled. + -### Step 2: Generate API Token + -1. 
**Access Service Account Configuration** - - Locate the created service account from above step and click on it - - Click **Create API token** - - Set the expiry to 365 days (maximum allowed value) - - Click **Next** - - In **Select token scopes** screen, filter by following values - - App: Jira - - Scope type: Classic - - Scope actions: Write, Read - - Select `read:me`, `read:jira-work`, and `write:jira-work` scopes - - Click **Next** - - Review and create API token - - **Important**: Copy and securely store the token immediately + + Tested with Ubuntu 22.04. + -### Step 3: Configure Webhook + **Docker Desktop** -1. **Navigate to Webhook Settings** - - Go to **Jira Settings** > **System** > **WebHooks** - - Click **Create a WebHook** + 1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/). -2. **Configure Webhook** - - **Name**: `OpenHands Cloud Integration` - - **Status**: Enabled - - **URL**: `https://app.all-hands.dev/integration/jira/events` - - **Issue related events**: Select the following: - - Issue updated - - Comment created - - **JQL Filter**: Leave empty (or customize as needed) - - Click **Create** - - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + ---- + -## Workspace Integration + **WSL** -### Step 1: Log in to OpenHands Cloud + 1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install). + 2. Run `wsl --version` in powershell and confirm `Default Version: 2`. -1. **Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + **Ubuntu (Linux Distribution)** -### Step 2: Configure Jira Integration + 1. Install Ubuntu: `wsl --install -d Ubuntu` in PowerShell as Administrator. + 2. 
Restart computer when prompted. + 3. Open Ubuntu from Start menu to complete setup. + 4. Verify installation: `wsl --list` should show Ubuntu. -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Jira Cloud** section + **Docker Desktop** -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The API token from Step 2 above - - Ensure **Active** toggle is enabled + 1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install). + 2. Open Docker Desktop, go to `Settings` and confirm the following: + - General: `Use the WSL 2 based engine` is enabled. + - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled. - -Workspace name is the host name when accessing a resource in Jira Cloud. + + The docker command below to start the app must be run inside the WSL terminal. Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal. + -Eg: https://all-hands.atlassian.net/browse/OH-55 + -Here the workspace name is **all-hands**. - + -3. **Complete OAuth Flow** - - You'll be redirected to Jira Cloud to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. 
- - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI +### Start the App -### Managing Your Integration +#### Option 1: Using the CLI Launcher with uv (Recommended) -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view +We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). -**Unlink Workspace:** -- In the edit view, click **Unlink** next to the workspace name -- This will deactivate your workspace link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. +**Install uv** (if you haven't already): -### Screenshots +See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. 
- - -![workspace-link.png](/openhands/static/img/jira-user-link.png) - +**Install OpenHands**: +```bash +uv tool install openhands --python 3.12 +``` - -![workspace-link.png](/openhands/static/img/jira-admin-configure.png) - +**Launch OpenHands**: +```bash +# Launch the GUI server +openhands serve - -![workspace-link.png](/openhands/static/img/jira-user-unlink.png) - +# Or with GPU support (requires nvidia-docker) +openhands serve --gpu - -![workspace-link.png](/openhands/static/img/jira-admin-edit.png) - - +# Or with current directory mounted +openhands serve --mount-cwd +``` +This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. -# Linear Integration (Coming soon...) -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration +**Upgrade OpenHands**: +```bash +uv tool upgrade openhands --python 3.12 +``` -# Linear Integration + -## Platform Configuration +If you prefer to use pip and have Python 3.12+ installed: -### Step 1: Create Service Account +```bash +# Install OpenHands +pip install openhands -1. **Access Team Settings** - - Log in to Linear as a team admin - - Go to **Settings** > **Members** +# Launch the GUI server +openhands serve +``` -2. **Invite Service Account** - - Click **Invite members** - - Email: `openhands@yourcompany.com` (replace with your preferred service account email) - - Role: **Member** (with appropriate team access) - - Send invitation +Note that you'll still need `uv` installed for the default MCP servers to work properly. -3. **Complete Setup** - - Accept invitation from the service account email - - Complete profile setup - - Ensure access to relevant teams/workspaces + -### Step 2: Generate API Key +#### Option 2: Using Docker Directly -1. 
**Access API Settings** - - Log in as the service account - - Go to **Settings** > **Security & access** + -2. **Create Personal API Key** - - Click **Create new key** - - Name: `OpenHands Cloud Integration` - - Scopes: Select the following: - - `Read` - Read access to issues and comments - - `Create comments` - Ability to create or update comments - - Select the teams you want to provide access to, or allow access for all teams you have permissions for - - Click **Create** - - **Important**: Copy and store the API key securely +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` -### Step 3: Configure Webhook + -1. **Access Webhook Settings** - - Go to **Settings** > **API** > **Webhooks** - - Click **New webhook** +> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. -2. **Configure Webhook** - - **Label**: `OpenHands Cloud Integration` - - **URL**: `https://app.all-hands.dev/integration/linear/events` - - **Resource types**: Select: - - `Comment` - For comment events - - `Issue` - For issue updates (label changes) - - Select the teams you want to provide access to, or allow access for all public teams - - Click **Create webhook** - - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +You'll find OpenHands running at http://localhost:3000! ---- +### Setup -## Workspace Integration +After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`. 
+This can be done during the initial settings popup or by selecting the `Settings` +button (gear icon) in the UI. -### Step 1: Log in to OpenHands Cloud +If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options +and manually enter it with the correct prefix in the `Custom Model` text box. +The `Advanced` options also allow you to specify a `Base URL` if required. -1. **Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. +#### Getting an API Key -### Step 2: Configure Linear Integration +OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers: -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Linear** section + -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The API key from Step 2 above - - Ensure **Active** toggle is enabled + - -Workspace name is the identifier after the host name when accessing a resource in Linear. +1. [Log in to OpenHands Cloud](https://app.all-hands.dev). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. -Eg: https://linear.app/allhands/issue/OH-37 +OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms). 
-Here the workspace name is **allhands**. - + -3. **Complete OAuth Flow** - - You'll be redirected to Linear to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided - - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI + -### Managing Your Integration +1. [Create an Anthropic account](https://console.anthropic.com/). +2. [Generate an API key](https://console.anthropic.com/settings/keys). +3. [Set up billing](https://console.anthropic.com/settings/billing). -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view + -**Unlink Workspace:** -- In the edit view, click **Unlink** next to the workspace name -- This will deactivate your workspace link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. + -### Screenshots +1. [Create an OpenAI account](https://platform.openai.com/). +2. [Generate an API key](https://platform.openai.com/api-keys). +3. [Set up billing](https://platform.openai.com/account/billing/overview). - - -![workspace-link.png](/openhands/static/img/linear-user-link.png) - -![workspace-link.png](/openhands/static/img/linear-admin-configure.png) - + - -![workspace-link.png](/openhands/static/img/linear-admin-edit.png) - +1. Create a Google account if you don't already have one. +2. [Generate an API key](https://aistudio.google.com/apikey). +3. 
[Set up billing](https://aistudio.google.com/usage?tab=billing). - -![workspace-link.png](/openhands/static/img/linear-admin-edit.png) - + -# Project Management Tool Integrations (Coming soon...) -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview +If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. -# Project Management Tool Integrations + -## Overview + -OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by: -- Adding `@openhands` in ticket comments -- Adding the `openhands` label to tickets +Consider setting usage limits to control costs. -## Prerequisites +#### Using a Local LLM -Integration requires two levels of setup: -1. **Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) -2. **Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace + +Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. + -### Platform-Specific Setup Guides: -- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) -- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) -- [Linear Integration (Coming soon...)](./linear-integration.md) +To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. 
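Putting the local-LLM notes above into practice, here is a minimal environment sketch for a self-hosted deployment pointing at a local OpenAI-compatible server. `LLM_MODEL` and `LLM_API_KEY` are the V1 environment variables documented under Configuration Options; the model name below is a placeholder, and the key can be any non-empty value when the server has no authentication proxy:

```shell
# Sketch only: "openai/my-local-model" is a placeholder model name.
export LLM_MODEL="openai/my-local-model"
# Any value works when no authentication proxy sits in front:
export LLM_API_KEY="local-key"
```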
-## Usage +#### Setting Up Search Engine -Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: +OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. -### Method 1: Comment Mention -Add a comment to any issue with `@openhands` followed by your task description: -``` -@openhands Please implement the user authentication feature described in this ticket -``` +To enable search functionality in OpenHands: -### Method 2: Label-based Delegation -Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. +1. Get a Tavily API key from [tavily.com](https://tavily.com/). +2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)` -### Git Repository Detection +For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide. -The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: +### Versions -#### Specifying the Target Repository +The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well: +- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with the version number. +For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release. +- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with `main`. +This version is unstable and is recommended for testing or development purposes only. 
-**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. +## Next Steps -**Supported Repository Formats:** -- Full HTTPS URL: `https://github.com/owner/repository.git` -- GitHub URL without .git: `https://github.com/owner/repository` -- Owner/repository format: `owner/repository` +- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories +- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) +- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) +- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) -#### Platform-Specific Behavior +### Docker Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker.md -**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. Manual specification is not required in this configuration. +The **Docker sandbox** runs the agent server inside a Docker container. This is +the default and recommended option for most users. -**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection. + + In some self-hosted deployments, the sandbox provider is controlled via the + legacy RUNTIME environment variable. Docker is the default. + -## Troubleshooting -### Platform Configuration Issues -- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated) -- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. 
If your current API token is expired, make sure to update it in the respective integration settings -- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions +## Why Docker? -### Workspace Integration Issues -- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. Contact your platform administrator that you want to integrate with (eg: Jira, Linear) -- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first -- **OAuth flow fails**: Make sure that you're authorizing with the correct account with proper workspace access +- Isolation: reduces risk when the agent runs commands. +- Reproducibility: consistent environment across machines. -### General Issues -- **Agent not responding**: Check webhook logs in your platform settings and verify service account status -- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access -- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on -- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed +## Mounting your code into the sandbox -### Getting Help -For additional support, contact OpenHands Cloud support with: -- Your integration platform (Linear, Jira Cloud, or Jira Data Center) -- Workspace name -- Error logs from webhook/integration attempts -- Screenshots of configuration settings (without sensitive credentials) +If you want OpenHands to work directly on a local repository, mount it into the +sandbox. 
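Both approaches below boil down to a bind-mount spec of the form `host_path:container_path[:mode]`. A small sketch of how such a spec splits (illustrative only, not OpenHands' actual parsing code):

```shell
# Split host_path:container_path[:mode]; mode defaults to rw.
# Sketch only -- does not handle Windows drive letters like C:\path.
parse_volume() {
  spec=$1
  host=${spec%%:*}
  rest=${spec#*:}
  container=${rest%%:*}
  mode=${rest#"$container"}; mode=${mode#:}
  echo "host=${host} container=${container} mode=${mode:-rw}"
}

parse_volume "/home/me/project:/workspace:rw"
parse_volume "/home/me/project:/workspace"   # mode falls back to rw
```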
+### Recommended: CLI launcher -# Slack Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation +If you start OpenHands via: - +```bash +openhands serve --mount-cwd +``` - -OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete. -While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to -validate critical information independently. - +your current directory will be mounted into the sandbox workspace. -## Prerequisites +### Using SANDBOX_VOLUMES -- Access to OpenHands Cloud. +You can also configure mounts via the SANDBOX_VOLUMES environment +variable (format: host_path:container_path[:mode]): -## Installation Steps +```bash +export SANDBOX_VOLUMES=$PWD:/workspace:rw +``` - - + + Anything mounted read-write into /workspace can be modified by the + agent. + - **This step is for Slack admins/owners** +## Custom sandbox images - 1. Make sure you have permissions to install Apps to your workspace. - 2. Click the button below to install OpenHands Slack App Add to Slack - 3. In the top right corner, select the workspace to install the OpenHands Slack app. - 4. Review permissions and click allow. +To customize the container image (extra tools, system deps, etc.), see +[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). - +### Overview +Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview.md - +A **sandbox** is the environment where OpenHands runs commands, edits files, and +starts servers while working on your task. - **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.** +In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. - Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this: - 1. 
Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud. - 2. Click `Install OpenHands Slack App`. - 3. In the top right corner, select the workspace to install the OpenHands Slack app. - 4. Review permissions and click allow. +## Sandbox providers - Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App. +OpenHands supports multiple sandbox “providers”, with different tradeoffs: - +- **Docker sandbox (recommended)** + - Runs the agent server inside a Docker container. + - Good isolation from your host machine. - +- **Process sandbox (unsafe, but fast)** + - Runs the agent server as a regular process on your machine. + - No container isolation. +- **Remote sandbox** + - Runs the agent server in a remote environment. + - Used by managed deployments and some hosted setups. -## Working With the Slack App +## Selecting a provider (current behavior) -To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel. +In some deployments, the provider selection is still controlled via the legacy +RUNTIME environment variable: -Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands. +- RUNTIME=docker (default) +- RUNTIME=process (aka legacy RUNTIME=local) +- RUNTIME=remote -To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. -You must be the user who started the conversation. + + The user-facing terminology in V1 is sandbox, but the configuration knob + may still be called RUNTIME while the migration is in progress. + -## Example conversation +## Terminology note (V0 vs V1) -### Start a new conversation, and select repo +Older documentation refers to these environments as **runtimes**. +Those legacy docs are now in the Legacy (V0) section of the Web tab. -Conversation is started by mentioning `@openhands`. 
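The provider selection described above amounts to a tiny mapping. This sketch mirrors the documented values, including the legacy `local` alias, and is not OpenHands' actual implementation:

```shell
# Map a RUNTIME value to a V1 sandbox provider name.
# 'local' is the legacy alias for 'process'; unset defaults to docker.
normalize_runtime() {
  provider="${1:-docker}"
  case "$provider" in
    local) provider="process" ;;
    docker|process|remote) ;;
    *) echo "unknown sandbox provider: $provider" >&2; return 1 ;;
  esac
  echo "$provider"
}

normalize_runtime local    # prints process
normalize_runtime          # prints docker
```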
+### Process Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/process.md -![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png) +The **Process sandbox** runs the agent server directly on your machine as a +regular process. -### See agent response and send follow up messages + + This mode provides **no sandbox isolation**. -Initial request is followed up by mentioning `@openhands` in a thread reply. + The agent can read/write files your user account can access and execute + commands on your host system. -![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png) + Only use this in controlled environments. + -## Pro tip +## When to use it -You can mention a repo name when starting a new conversation in the following formats +- Local development when Docker is unavailable +- Some CI environments +- Debugging issues that only reproduce outside containers -1. "My-Repo" repo (e.g `@openhands in the openhands repo ...`) -2. "OpenHands/OpenHands" (e.g `@openhands in OpenHands/OpenHands ...`) +## Choosing process mode -The repo match is case insensitive. If a repo name match is made, it will kick off the conversation. -If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list. +In some deployments, this is selected via the legacy RUNTIME +environment variable: -![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png) +```bash +export RUNTIME=process +# (legacy alias) +# export RUNTIME=local +``` +If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker). -# Repository Customization -Source: https://docs.openhands.dev/openhands/usage/customization/repository +### Remote Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/remote.md -## Skills (formerly Microagents) +A **remote sandbox** runs the agent server in a remote execution environment +instead of on your local machine. 
-Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands -should function. See [Skills Overview](/overview/skills) for more information. +This is typically used by managed deployments (e.g., OpenHands Cloud) and +advanced self-hosted setups. +## Selecting remote mode -## Setup Script -You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository. -This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks. +In some self-hosted deployments, remote sandboxes are selected via the legacy +RUNTIME environment variable: -For example: ```bash -#!/bin/bash -export MY_ENV_VAR="my value" -sudo apt-get update -sudo apt-get install -y lsof -cd frontend && npm install ; cd .. +export RUNTIME=remote ``` -## Pre-commit Script -You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit. -This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits. +Remote sandboxes require additional configuration (API URL + API key). The exact +variable names depend on your deployment, but you may see legacy names like: -For example: -```bash -#!/bin/bash -# Run linting checks -cd frontend && npm run lint -if [ $? -ne 0 ]; then - echo "Frontend linting failed. Please fix the issues before committing." - exit 1 -fi +- SANDBOX_REMOTE_RUNTIME_API_URL +- SANDBOX_API_KEY -# Run tests -cd backend && pytest tests/unit -if [ $? -ne 0 ]; then - echo "Backend tests failed. Please fix the issues before committing." - exit 1 -fi +## Notes -exit 0 -``` +- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports) + depending on the provider. +- Configuration and credentials vary by deployment. +If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui). 
-# Debugging -Source: https://docs.openhands.dev/openhands/usage/developers/debugging +### API Keys Settings +Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md -The following is intended as a primer on debugging OpenHands for Development purposes. + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + -## Server / VSCode +## Overview -The following `launch.json` will allow debugging the agent, controller and server elements, but not the sandbox (Which runs inside docker). It will ignore any changes inside the `workspace/` directory: +Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to +OpenHands Cloud -``` -{ - "version": "0.2.0", - "configurations": [ - { - "name": "OpenHands CLI", - "type": "debugpy", - "request": "launch", - "module": "openhands.cli.main", - "justMyCode": false - }, - { - "name": "OpenHands WebApp", - "type": "debugpy", - "request": "launch", - "module": "uvicorn", - "args": [ - "openhands.server.listen:app", - "--reload", - "--reload-exclude", - "${workspaceFolder}/workspace", - "--port", - "3000" - ], - "justMyCode": false - } - ] -} -``` +## OpenHands LLM Key -More specific debugging configurations which include more parameters may be specified: + +You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud. + -``` - ... - { - "name": "Debug CodeAct", - "type": "debugpy", - "request": "launch", - "module": "openhands.core.main", - "args": [ - "-t", - "Ask me what your task is.", - "-d", - "${workspaceFolder}/workspace", - "-c", - "CodeActAgent", - "-l", - "llm.o1", - "-n", - "prompts" - ], - "justMyCode": false - } - ... 
-```

+You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start),
+[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even other AI coding agents. This will
+use credits from your OpenHands Cloud account. If you need to refresh it at any time, click the `Refresh API Key` button.

-Values in the snippet above can be updated such that:

+## OpenHands API Key

- * *t*: the task
- * *d*: the openhands workspace directory
- * *c*: the agent
- * *l*: the LLM config (pre-defined in config.toml)
- * *n*: session name (e.g. eventstream name)

+These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the
+[OpenHands Cloud API](/openhands/usage/cloud/cloud-api).

+### Create API Key

-# Development Overview
-Source: https://docs.openhands.dev/openhands/usage/developers/development-overview

+1. Navigate to the `Settings > API Keys` page.
+2. Click `Create API Key`.
+3. Give your API key a name and click `Create`.

-## Core Documentation

+### Delete API Key

-### Project Fundamentals
-- **Main Project Overview** (`/README.md`) - The primary entry point for understanding OpenHands, including features and basic setup instructions.

+1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove.
+2. Click `Delete` to confirm removal.

-- **Development Guide** (`/Development.md`) - Guide for developers working on OpenHands, including setup, requirements, and development workflows.

+### Application Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/application-settings.md

-- **Contributing Guidelines** (`/CONTRIBUTING.md`) - Essential information for contributors, covering code style, PR process, and contribution workflows.
+## Overview

-### Component Documentation

+The Application settings page allows you to customize various application-level behaviors in OpenHands, including
+language preferences, notification settings, custom Git author configuration, and more.

-#### Frontend
-- **Frontend Application** (`/frontend/README.md`) - Complete guide for setting up and developing the React-based frontend application.

+## Setting Maximum Budget Per Conversation

-#### Backend
-- **Backend Implementation** (`/openhands/README.md`) - Detailed documentation of the Python backend implementation and architecture.

+To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD)
+in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but
+you can choose to continue the conversation by sending another prompt.

-- **Server Documentation** (`/openhands/server/README.md`) - Server implementation details, API documentation, and service architecture.

+## Git Author Settings

-- **Runtime Environment** (`/openhands/runtime/README.md`) - Documentation covering the runtime environment, execution model, and runtime configurations.

+OpenHands provides the ability to customize the Git author information used when making commits and creating
+pull requests on your behalf.

-#### Infrastructure
-- **Container Documentation** (`/containers/README.md`) - Information about Docker containers, deployment strategies, and container management.

+By default, OpenHands uses the following Git author information for all commits and pull requests:

-### Testing and Evaluation
-- **Unit Testing Guide** (`/tests/unit/README.md`) - Instructions for writing, running, and maintaining unit tests.

+- **Username**: `openhands`
+- **Email**: `openhands@all-hands.dev`

-- **Evaluation Framework** (`/evaluation/README.md`) - Documentation for the evaluation framework, benchmarks, and performance testing.
+To override the defaults:

-### Advanced Features
-- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`) - Detailed information about the skills architecture, implementation, and usage.

+1. Navigate to the `Settings > Application` page.
+2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`.
+3. Click `Save Changes`.

-### Documentation Standards
-- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) - Standards and guidelines for writing and maintaining project documentation.

+ 
+  When you configure a custom Git author, OpenHands will use your specified username and email as the primary author
+  for commits and pull requests. OpenHands will remain as a co-author.
+ 

-## Getting Started with Development

+### Integrations Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md

-If you're new to developing with OpenHands, we recommend following this sequence:

+## Overview

-1. Start with the main `README.md` to understand the project's purpose and features
-2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute
-3. Follow the setup instructions in `Development.md`
-4. Dive into specific component documentation based on your area of interest:
-   - Frontend developers should focus on `/frontend/README.md`
-   - Backend developers should start with `/openhands/README.md`
-   - Infrastructure work should begin with `/containers/README.md`

+OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some
+integrations, like Slack, are only available in OpenHands Cloud. Configuration may also vary depending on whether
+you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or
+[running OpenHands on your own](/openhands/usage/run-openhands/local-setup).

-## Documentation Updates

-When making changes to the codebase, please ensure that:
-1.
Relevant documentation is updated to reflect your changes -2. New features are documented in the appropriate README files -3. Any API changes are reflected in the server documentation -4. Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + +### GitHub Settings -# Evaluation Harness -Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness +- `Configure GitHub Repositories` - Allows you to +[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. -This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework. +### Slack Settings -## Setup Environment and LLM Configuration +- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in + your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. -Please follow instructions [here](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to setup your local development environment. -OpenHands in development mode uses `config.toml` to keep track of most configurations. +## Running on Your Own Integrations Settings -Here's an example configuration file you can use to define and use multiple LLMs: + + These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). 
+ 

-```toml
-[llm]
-# IMPORTANT: add your API key here, and set the model to the one you want to evaluate
-model = "claude-3-5-sonnet-20241022"
-api_key = "sk-XXX"

+### Version Control Integrations

-[llm.eval_gpt4_1106_preview_llm]
-model = "gpt-4-1106-preview"
-api_key = "XXX"
-temperature = 0.0

+#### GitHub Setup

-[llm.eval_some_openai_compatible_model_llm]
-model = "openai/MODEL_NAME"
-base_url = "https://OPENAI_COMPATIBLE_URL/v1"
-api_key = "XXX"
-temperature = 0.0
-```

+OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided:
+ 
+ 

-## How to use OpenHands in the command line

+ 1. **Generate a Personal Access Token (PAT)**:
+    - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`.
+    - **Tokens (classic)**
+      - Required scopes:
+        - `repo` (Full control of private repositories)
+    - **Fine-grained tokens**
+      - All Repositories (You can select specific repositories, but this will impact what is returned in repo search)
+      - Minimal Permissions (Select `Meta Data = Read-only` for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation)
+ 2. **Enter token in OpenHands**:
+    - Navigate to the `Settings > Integrations` page.
+    - Paste your token in the `GitHub Token` field.
+    - Click `Save Changes` to apply the changes.

-OpenHands can be run from the command line using the following format:

+ If you're working with organizational repositories, additional setup may be required:

-```bash
-poetry run python ./openhands/core/main.py \
-    -i <max_iterations> \
-    -t "<task_description>" \
-    -c <agent_class> \
-    -l <llm_config>
-```

+ 1. **Check organization requirements**:
+    - Organization admins may enforce specific token policies.
+    - Some organizations require tokens to be created with SSO enabled.
+    - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization).
+ 2.
**Verify organization access**: + - Go to your token settings on GitHub. + - Look for the organization under `Organization access`. + - If required, click `Enable SSO` next to your organization. + - Complete the SSO authorization process. + -For example: + + - **Token Not Recognized**: + - Check that the token hasn't expired. + - Verify the token has the required scopes. + - Try regenerating the token. -```bash -poetry run python ./openhands/core/main.py \ - -i 10 \ - -t "Write me a bash script that prints hello world." \ - -c CodeActAgent \ - -l llm -``` + - **Organization Access Denied**: + - Check if SSO is required but not enabled. + - Verify organization membership. + - Contact organization admin if token policies are blocking access. + + -This command runs OpenHands with: -- A maximum of 10 iterations -- The specified task description -- Using the CodeActAgent -- With the LLM configuration defined in the `llm` section of your `config.toml` file +#### GitLab Setup -## How does OpenHands work +OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided: -The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works: + + + 1. **Generate a Personal Access Token (PAT)**: + - On GitLab, go to `User Settings > Access Tokens`. + - Create a new token with the following scopes: + - `api` (API access) + - `read_user` (Read user information) + - `read_repository` (Read repository) + - `write_repository` (Write repository) + - Set an expiration date or leave it blank for a non-expiring token. + 2. **Enter token in OpenHands**: + - Navigate to the `Settings > Integrations` page. + - Paste your token in the `GitLab Token` field. + - Click `Save Changes` to apply the changes. -1. Parse command-line arguments and load the configuration -2. Create a runtime environment using `create_runtime()` -3. Initialize the specified agent -4. 
Run the controller using `run_controller()`, which:
   - Attaches the runtime to the agent
   - Executes the agent's task
   - Returns a final state when complete

+ 3. **(Optional): Restrict agent permissions**
+    - Create another PAT using Step 1 and exclude the `api` scope.
+    - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower scope token.
+    - OpenHands will use the higher scope token, and the agent will use the lower scope token.
+ 

-The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing.

+ 
+  - **Token Not Recognized**:
+    - Check that the token hasn't expired.
+    - Verify the token has the required scopes.
+  - **Access Denied**:
+    - Verify project access permissions.
+    - Check if the token has the necessary scopes.
+    - For group/organization repositories, ensure you have proper access.
+ 
+ 

-## Easiest way to get started: Exploring Existing Benchmarks

+#### BitBucket Setup
+
+ 
+1. **Generate an App password**:
+   - On Bitbucket, go to `Account Settings > App Password`.
+   - Create a new password with the following scopes:
+     - `account`: `read`
+     - `repository`: `write`
+     - `pull requests`: `write`
+     - `issues`: `write`
+   - App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
+ 2. **Enter token in OpenHands**:
+   - Navigate to the `Settings > Integrations` page.
+   - Paste your token in the `BitBucket Token` field.
+   - Click `Save Changes` to apply the changes.
+ 

-We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository.

+ 
+  - **Token Not Recognized**:
+    - Check that the token hasn't expired.
+    - Verify the token has the required scopes.
+ -To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements. + -## How to create an evaluation workflow +### Language Model (LLM) Settings +Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings.md +## Overview -To create an evaluation workflow for your benchmark, follow these steps: +The LLM settings allows you to bring your own LLM and API key to use with OpenHands. This can be any model that is +supported by litellm, but it requires a powerful model to work properly. +[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some +additional LLM settings on this page. -1. Import relevant OpenHands utilities: - ```python - import openhands.agenthub - from evaluation.utils.shared import ( - EvalMetadata, - EvalOutput, - make_metadata, - prepare_dataset, - reset_logger_for_multiprocessing, - run_evaluation, - ) - from openhands.controller.state.state import State - from openhands.core.config import ( - AppConfig, - SandboxConfig, - get_llm_config_arg, - parse_arguments, - ) - from openhands.core.logger import openhands_logger as logger - from openhands.core.main import create_runtime, run_controller - from openhands.events.action import CmdRunAction - from openhands.events.observation import CmdOutputObservation, ErrorObservation - from openhands.runtime.runtime import Runtime - ``` +## Basic LLM Settings -2. 
Create a configuration: - ```python - def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig: - config = AppConfig( - default_agent=metadata.agent_class, - runtime='docker', - max_iterations=metadata.max_iterations, - sandbox=SandboxConfig( - base_container_image='your_container_image', - enable_auto_lint=True, - timeout=300, - ), - ) - config.set_llm_config(metadata.llm_config) - return config - ``` +The most popular providers and models are available in the basic settings. Some of the providers have been verified to +work with OpenHands such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI and +Mistral AI. -3. Initialize the runtime and set up the evaluation environment: - ```python - def initialize_runtime(runtime: Runtime, instance: pd.Series): - # Set up your evaluation environment here - # For example, setting environment variables, preparing files, etc. - pass - ``` +1. Choose your preferred provider using the `LLM Provider` dropdown. +2. Choose your favorite model using the `LLM Model` dropdown. +3. Set the `API Key` for your chosen provider and model and click `Save Changes`. -4. Create a function to process each instance: - ```python - from openhands.utils.async_utils import call_async_from_sync - def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput: - config = get_config(instance, metadata) - runtime = create_runtime(config) - call_async_from_sync(runtime.connect) - initialize_runtime(runtime, instance) +This will set the LLM for all new conversations. If you want to use this new LLM for older conversations, you must first +restart older conversations. - instruction = get_instruction(instance, metadata) +## Advanced LLM Settings - state = run_controller( - config=config, - task_str=instruction, - runtime=runtime, - fake_user_response_fn=your_user_response_function, - ) +Toggling the `Advanced` settings, allows you to set custom models as well as some additional LLM settings. 
You can use +this when your preferred provider or model does not exist in the basic settings dropdowns. - # Evaluate the agent's actions - evaluation_result = await evaluate_agent_actions(runtime, instance) +1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the + custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have + [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides). +2. `Base URL`: If your provider has a specific base URL, specify it here. +3. `API Key`: Set the API key for your custom model. +4. Click `Save Changes` - return EvalOutput( - instance_id=instance.instance_id, - instruction=instruction, - test_result=evaluation_result, - metadata=metadata, - history=compatibility_for_eval_history_pairs(state.history), - metrics=state.metrics.get() if state.metrics else None, - error=state.last_error if state and state.last_error else None, - ) - ``` +### Memory Condensation -5. Run the evaluation: - ```python - metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir) - output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl') - instances = prepare_dataset(your_dataset, output_file, eval_n_limit) +The memory condenser manages the language model's context by ensuring only the most important and relevant information +is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running +conversations. - await run_evaluation( - instances, - metadata, - output_file, - num_workers, - process_instance - ) - ``` +- `Enable memory condensation` - Turn on this setting to activate this feature. +- `Memory condenser max history size` - The condenser will summarize the history after this many events. 
-This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. +### Model Context Protocol (MCP) +Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md -Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. +## Overview -By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. +Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. These +servers can provide additional functionality to the agent, such as specialized data processing, external API access, +or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). +## Supported MCPs -## Understanding the `user_response_fn` +OpenHands supports the following MCP transport protocols: -The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. 
+* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) +* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) +* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) +## How MCP Works -### Workflow and Interaction +When OpenHands starts, it: -The correct workflow for handling actions and the `user_response_fn` is as follows: +1. Reads the MCP configuration. +2. Connects to any configured SSE and SHTTP servers. +3. Starts any configured stdio servers. +4. Registers the tools provided by these servers with the agent. -1. Agent receives a task and starts processing -2. Agent emits an Action -3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): - - The Runtime processes the Action - - Runtime returns an Observation -4. If the Action is not executable (typically a MessageAction): - - The `user_response_fn` is called - - It returns a simulated user response -5. The agent receives either the Observation or the simulated response -6. Steps 2-5 repeat until the task is completed or max iterations are reached +The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: -Here's a more accurate visual representation: +1. OpenHands routes the call to the appropriate MCP server. +2. The server processes the request and returns a response. +3. OpenHands converts the response to an observation and presents it to the agent. -``` - [Agent] - | - v - [Emit Action] - | - v - [Is Action Executable?] - / \ - Yes No - | | - v v - [Runtime] [user_response_fn] - | | - v v - [Return Observation] [Simulated Response] - \ / - \ / - v v - [Agent receives feedback] - | - v - [Continue or Complete Task] -``` +## Configuration -In this workflow: +MCP configuration can be defined in: +* The OpenHands UI in the `Settings > MCP` page. 
+* The `config.toml` file under the `[mcp]` section if not using the UI. -- Executable actions (like running commands or executing code) are handled directly by the Runtime -- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` -- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` +### Configuration Options -This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. + + + SSE servers are configured using either a string URL or an object with the following properties: -### Example Implementation + - `url` (required) + - Type: `str` + - Description: The URL of the SSE server. -Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + + SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: -```python -def codeact_user_response(state: State | None) -> str: - msg = ( - 'Please continue working on the task on whatever approach you think is suitable.\n' - 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' - 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' - ) + - `url` (required) + - Type: `str` + - Description: The URL of the SHTTP server. 
- if state and state.history: - # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up - user_msgs = [ - event - for event in state.history - if isinstance(event, MessageAction) and event.source == 'user' - ] - if len(user_msgs) >= 2: - # let the agent know that it can give up when it has tried 3 times - return ( - msg - + 'If you want to give up, run: exit .\n' - ) - return msg -``` + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. -This function does the following: + - `timeout` (optional) + - Type: `int` + - Default: `60` + - Range: `1-3600` seconds (1 hour maximum) + - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. + - **Use Cases:** + - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. + - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. + - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. + + This timeout only applies to individual tool calls, not server connection establishment. + + + + + While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for + better reliability and performance. + -1. Provides a standard message encouraging the agent to continue working -2. Checks how many times the agent has attempted to communicate with the user -3. If the agent has made multiple attempts, it provides an option to give up + Stdio servers are configured using an object with the following properties: -By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. + - `name` (required) + - Type: `str` + - Description: A unique name for the server. 
+ - `command` (required) + - Type: `str` + - Description: The command to run the server. -# WebSocket Connection -Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection + - `args` (optional) + - Type: `list of str` + - Default: `[]` + - Description: Command-line arguments to pass to the server. -This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. + - `env` (optional) + - Type: `dict of str to str` + - Default: `{}` + - Description: Environment variables to set for the server process. + + -## Overview +#### When to Use Direct Stdio -OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: +Direct stdio connections may still be appropriate in these scenarios: +- **Development and testing**: Quick prototyping of MCP servers. +- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access. +- **Local-only environments**: When you don't want to manage additional proxy processes. -1. Receive real-time events from the agent -2. Send user actions to the agent -3. Maintain a persistent connection for ongoing conversations +### Configuration Examples -## Connecting to the WebSocket + + + For stdio-based MCP servers, we recommend using MCP proxy tools like + [`supergateway`](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections. + [SuperGateway](https://github.com/supercorp-ai/supergateway) is a popular MCP proxy that converts stdio MCP servers to + HTTP/SSE endpoints. 
-### Connection Parameters + Start the proxy servers separately: + ```bash + # Terminal 1: Filesystem server proxy + supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080 -When connecting to the WebSocket, you need to provide the following query parameters: + # Terminal 2: Fetch server proxy + supergateway --stdio "uvx mcp-server-fetch" --port 8081 + ``` -- `conversation_id`: The ID of the conversation you want to join -- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) -- `providers_set`: (Optional) A comma-separated list of provider types + Then configure OpenHands to use the HTTP endpoint: -### Connection Example + ```toml + [mcp] + # SSE Servers - Recommended approach using proxy tools + sse_servers = [ + # Basic SSE server with just a URL + "http://example.com:8080/mcp", -Here's a basic example of connecting to the WebSocket using JavaScript: + # SuperGateway proxy for fetch server + "http://localhost:8081/sse", -```javascript -import { io } from "socket.io-client"; + # External MCP service with authentication + {url="https://api.example.com/mcp/sse", api_key="your-api-key"} + ] -const socket = io("http://localhost:3000", { - transports: ["websocket"], - query: { - conversation_id: "your-conversation-id", - latest_event_id: -1, - providers_set: "github,gitlab" // Optional - } -}); + # SHTTP Servers - Modern streamable HTTP transport (recommended) + shttp_servers = [ + # Basic SHTTP server with default 60s timeout + "https://api.example.com/mcp/shttp", -socket.on("connect", () => { - console.log("Connected to OpenHands WebSocket"); -}); + # Server with custom timeout for heavy operations + { + url = "https://files.example.com/mcp/shttp", + api_key = "your-api-key", + timeout = 1800 # 30 minutes for large file processing + } + ] + ``` + + + + This setup is not Recommended for production. 
+ + ```toml + [mcp] + # Direct stdio servers - use only for development/testing + stdio_servers = [ + # Basic stdio server + {name="fetch", command="uvx", args=["mcp-server-fetch"]}, -socket.on("oh_event", (event) => { - console.log("Received event:", event); -}); + # Stdio server with environment variables + { + name="filesystem", + command="npx", + args=["@modelcontextprotocol/server-filesystem", "/"], + env={ + "DEBUG": "true" + } + } + ] + ``` -socket.on("connect_error", (error) => { - console.error("Connection error:", error); -}); + For production use, we recommend using proxy tools like SuperGateway. + + -socket.on("disconnect", (reason) => { - console.log("Disconnected:", reason); -}); -``` +Other options include: -## Sending Actions to the Agent +- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. +- **Docker-based proxies**: Containerized solutions for better isolation. +- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. -To send an action to the agent, use the `oh_user_action` event: +### Secrets Management +Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md -```javascript -// Send a user message to the agent -socket.emit("oh_user_action", { - type: "message", - source: "user", - message: "Hello, can you help me with my project?" -}); -``` +## Overview -## Receiving Events from the Agent +OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be +accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment +variables in the agent's runtime environment. -The server emits events using the `oh_event` event type. 
Here are some common event types you might receive: +## Accessing the Secrets Manager -- User messages (`source: "user", type: "message"`) -- Agent messages (`source: "agent", type: "message"`) -- File edits (`action: "edit"`) -- File writes (`action: "write"`) -- Command executions (`action: "run"`) +Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. -Example event handler: +## Adding a New Secret +1. Click `Add a new secret`. +2. Fill in the following fields: + - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. + - **Value**: The sensitive information you want to store. + - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. +3. Click `Add secret` to save. -```javascript -socket.on("oh_event", (event) => { - if (event.source === "agent" && event.type === "message") { - console.log("Agent says:", event.message); - } else if (event.action === "run") { - console.log("Command executed:", event.args.command); - console.log("Result:", event.result); - } -}); -``` +## Editing a Secret -## Using Websocat for Testing +1. Click the `Edit` button next to the secret you want to modify. +2. You can update the name and description of the secret. + + For security reasons, you cannot view or edit the value of an existing secret. If you need to change the + value, delete the secret and create a new one. + -[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application. +## Deleting a Secret -### Installation +1. Click the `Delete` button next to the secret you want to remove. +2. Select `Confirm` to delete the secret. 
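Because every custom secret is exported as an environment variable in the agent's runtime (see the Overview above), code running in a conversation can read one with ordinary environment-variable access. A minimal Python sketch; the secret name `AWS_ACCESS_KEY` is just the earlier example, not something OpenHands predefines:

```python
import os
from typing import Optional

def read_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Custom secrets surface inside the agent runtime as plain environment
    variables, so standard os.environ access is all that's needed."""
    return os.environ.get(name, default)

# A secret that was never configured simply reads as None (or the default).
key = read_secret("AWS_ACCESS_KEY")
print("secret configured:", key is not None)
```

Reading through a small helper like this also gives you one place to add logging or a fallback when a secret is missing, without ever printing the value itself.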
-```bash -# On macOS -brew install websocat +## Using Secrets in the Agent + - All custom secrets are automatically exported as environment variables in the agent's runtime environment. + - You can access them in your code using standard environment variable access methods. For example, if you create a + secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or + `os.environ['OPENAI_API_KEY']` in Python. -# On Linux -curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat -chmod +x websocat -sudo mv websocat /usr/local/bin/ -``` +### Prompting Best Practices +Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md -### Connecting to the WebSocket +## Characteristics of Good Prompts -```bash -# Connect to the WebSocket and print all received messages -echo "40{}" | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +Good prompts are: -### Sending a Message +- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. +- **Location-specific**: Specify the locations in the codebase that should be modified, if known. +- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. -```bash -# Send a message to the agent -echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +## Examples -### Complete Example with Websocat +### Good Prompt Examples -Here's a complete example of connecting to the WebSocket, sending a message, and receiving events: +- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average. 
+- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined. +- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission. -```bash -# Start a persistent connection -websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +### Bad Prompt Examples -# In another terminal, send a message -echo '42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +- Make the code better. (Too vague, not concrete) +- Rewrite the entire backend to use a different framework. (Not appropriately scoped) +- There's a bug somewhere in the user authentication. Can you find and fix it? (Lacks specificity and location information) -## Event Structure +## Tips for Effective Prompting -Events sent and received through the WebSocket follow a specific structure: +- Be as specific as possible about the desired outcome or the problem to be solved. +- Provide context, including relevant file paths and line numbers if available. +- Break large tasks into smaller, manageable prompts. +- Include relevant error messages or logs. +- Specify the programming language or framework, if not obvious. 
-```typescript -interface OpenHandsEvent { - id: string; // Unique event ID - source: string; // "user" or "agent" - timestamp: string; // ISO timestamp - message?: string; // For message events - type?: string; // Event type (e.g., "message") - action?: string; // Action type (e.g., "run", "edit", "write") - args?: any; // Action arguments - result?: any; // Action result -} +The more precise and informative your prompt, the better OpenHands can assist you. + +See [First Projects](/overview/first-projects) for more examples of helpful prompts. + +### Troubleshooting +Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md + + +OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal. + + +### Launch docker client failed + +**Description** + +When running OpenHands, the following error is seen: +``` +Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon. ``` -## Best Practices +**Resolution** -1. **Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions. -2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events. -3. **Error Handling**: Implement proper error handling for connection errors and failed actions. -4. **Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server. +Try these in order: +* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully. +* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled. +* Depending on your configuration you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop. +* Reinstall Docker Desktop. 
-## Troubleshooting

+### Permission Error

-### Connection Issues

+**Description**

-- Verify that the OpenHands server is running and accessible
-- Check that you're providing the correct conversation ID
-- Ensure your WebSocket URL is correctly formatted

+On initial prompt, an error is seen with `Permission Denied` or `PermissionError`.

-### Authentication Issues

+**Resolution**

-- Make sure you have the necessary authentication cookies if required
-- Verify that you have permission to access the specified conversation

+* Check if the `~/.openhands` directory is owned by `root`. If so, you can:
+  * Change the directory's ownership: `sudo chown <user>:<group> ~/.openhands`.
+  * Or update permissions on the directory: `sudo chmod 777 ~/.openhands`
+  * Or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
+* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
+  OpenHands.

-### Event Handling Issues

+### On Linux, Getting ConnectTimeout Error

-- Check that you're correctly parsing the event data
-- Verify that your event handlers are properly registered

+**Description**
+When running on Linux, you might run into the error `ERROR:root:: timed out`.

-# Environment Variables Reference
-Source: https://docs.openhands.dev/openhands/usage/environment-variables

+**Resolution**

-This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments.

+If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that
+these packages can sometimes be outdated or include changes that cause compatibility issues. 
Try reinstalling Docker
+[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version.

-## Environment Variable Naming Convention

+If that does not solve the issue, try incrementally adding the following parameters to the docker run command:
+* `--network host`
+* `-e SANDBOX_USE_HOST_NETWORK=true`
+* `-e DOCKER_HOST_ADDR=127.0.0.1`

-OpenHands follows a consistent naming pattern for environment variables:

+### Internal Server Error. Ports are not available

-- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`)
-- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`)
-- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`)
-- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`)
-- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`)

+**Description**

-## Core Configuration Variables

+When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP
+...: bind: An attempt was made to access a socket in a
+way forbidden by its access permissions.")` is encountered. 
-These variables correspond to the `[core]` section in `config.toml`: +**Resolution** -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | -| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | -| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | -| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | -| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | -| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | -| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) | -| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | -| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | -| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | -| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | -| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | -| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) 
| -| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | -| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | -| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | -| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | +* Run the following command in PowerShell, as Administrator to reset the NAT service and release the ports: +``` +Restart-Service -Name "winnat" +``` -## LLM Configuration Variables +### Unable to access VS Code tab via local IP -These variables correspond to the `[llm]` section in `config.toml`: +**Description** -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | -| `LLM_API_KEY` | string | `""` | API key for the LLM provider | -| `LLM_BASE_URL` | string | `""` | Custom API base URL | -| `LLM_API_VERSION` | string | `""` | API version to use | -| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | -| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | -| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | -| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | -| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | -| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | -| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | -| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | -| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | -| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | -| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | -| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | -| 
`LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | -| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | -| `LLM_OLLAMA_BASE_URL` | string | `""` | Base URL for Ollama API | -| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | -| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | -| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | - -### AWS Configuration -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID | -| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key | -| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name | - -## Agent Configuration Variables - -These variables correspond to the `[agent]` section in `config.toml`: - -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use | -| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling | -| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate | -| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor | -| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration | -| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation | -| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable skills (formerly known as microagents) (prompt extensions) | -| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable | +When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden" +error, while other parts of the UI work fine. 
-## Sandbox Configuration Variables +**Resolution** -These variables correspond to the `[sandbox]` section in `config.toml`: +This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines. +To fix this: -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | -| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | -| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | -| `SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | -| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | -| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | -| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | -| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | -| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | -| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | -| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | -| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | -| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | -| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | -| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | -| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | -| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | -| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | -| 
`SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | -| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | +1. Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: + ```bash + docker run -it --rm \ + -e SANDBOX_VSCODE_PORT=41234 \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + -p 41234:41234 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:latest + ``` -### Sandbox Environment Variables -Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment: + > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. -| Environment Variable | Description | -|---------------------|-------------| -| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | +2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. +3. 
If running with the development workflow, you can set this in your `config.toml` file: + ```toml + [sandbox] + vscode_port = 41234 + ``` -## Security Configuration Variables +### GitHub Organization Rename Issues -These variables correspond to the `[security]` section in `config.toml`: +**Description** -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | -| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | -| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | +After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. -## Debug and Logging Variables +**Resolution** -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `DEBUG` | boolean | `false` | Enable general debug logging | -| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | -| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | -| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when DEBUG=true) | +* Update your git remote URL: + ```bash + # Check current remote + git remote get-url origin + + # Update SSH remote + git remote set-url origin git@github.com:OpenHands/OpenHands.git + + # Or update HTTPS remote + git remote set-url origin https://github.com/OpenHands/OpenHands.git + ``` +* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` +* Find and update any hardcoded references: + ```bash + git grep -i "all-hands-ai" + git grep -i "ghcr.io/all-hands-ai" + ``` -## Runtime-Specific Variables +### COBOL Modernization +Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md -### Docker Runtime -| Environment Variable | Type | 
Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | +Legacy COBOL systems power critical business operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. -### Remote Runtime -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | -| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | + +This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring). + -### Local Runtime -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | -| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | -| `RUNTIME_ID` | string | `""` | Runtime identifier | -| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | +## The COBOL Modernization Challenge -## Integration Variables +[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions. 
-### GitHub Integration -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | +The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, with COBOL's specialized nature requiring a unique skill set that makes it difficult for human teams alone. -### Third-Party API Keys -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `OPENAI_API_KEY` | string | `""` | OpenAI API key | -| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | -| `GOOGLE_API_KEY` | string | `""` | Google API key | -| `AZURE_API_KEY` | string | `""` | Azure API key | -| `TAVILY_API_KEY` | string | `""` | Tavily search API key | +## Overview -## Server Configuration Variables +COBOL modernization is a complex undertaking. Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in. 
-These are primarily used when running OpenHands as a server: +OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process: -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `FRONTEND_PORT` | integer | `3000` | Frontend server port | -| `BACKEND_PORT` | integer | `8000` | Backend server port | -| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | -| `BACKEND_HOST` | string | `"localhost"` | Backend host address | -| `WEB_HOST` | string | `"localhost"` | Web server host | -| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | +1. **Understanding**: Analyze and document existing COBOL code +2. **Translation**: Convert COBOL to modern languages like Java, Python, or C# +3. **Validation**: Ensure the modernized code behaves identically to the original -## Deprecated Variables +In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. -These variables are deprecated and should be replaced: +## Understanding -| Environment Variable | Replacement | Description | -|---------------------|-------------|-------------| -| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +A significant challenge in modernization is understanding the business function of the code. 
Developers have practice determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty then comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle. -## Usage Examples +Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. Your COBOL source might already have some structure or comments that make this link clear, but if not OpenHands can help. If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: -### Basic Setup with OpenAI -```bash -export LLM_MODEL="gpt-4o" -export LLM_API_KEY="your-openai-api-key" -export DEBUG=true ``` +For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. 
-### Docker Deployment with Custom Volumes -```bash -export RUNTIME="docker" -export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" -export SANDBOX_TIMEOUT=300 -``` +Save the results in `business_functions.json` in the following format: -### Remote Runtime Configuration -```bash -export RUNTIME="remote" -export SANDBOX_API_KEY="your-remote-api-key" -export SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" +{ + ..., + "COBIL00C.cbl": { + "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", + "references": [ + "docs/billing.md#bill-payment", + "docs/transactions.md#transaction-action" + ], + }, + ... +} ``` -### Security-Enhanced Setup -```bash -export SECURITY_CONFIRMATION_MODE=true -export SECURITY_SECURITY_ANALYZER="llm" -export DEBUG_RUNTIME=true -``` +OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information. -## Notes +## Translation -1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). +With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed. -2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`. +One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. 
For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases, but including such _known problems_ in the translation prompt can help prevent these errors from being introduced at all.

-2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`.

+An example prompt is below:

-3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`.

+```
+Convert the COBOL files in `/src` to Java in `/src/java`.

-4. **Precedence**: Environment variables take precedence over TOML configuration files.

+Requirements:
+1. Create a Java class for each COBOL program
+2. Preserve the business logic and data structures (see `business_functions.json`)
+3. Use appropriate Java naming conventions (camelCase for methods, PascalCase for classes)
+4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types)
+5. Implement proper error handling with try-catch blocks
+6. Add JavaDoc comments explaining the purpose of each class and method
+7. In JavaDoc comments, include traceability to the original COBOL source using
+   the format: @source <file>:<lines> (e.g., @source CBACT01C.cbl:73-77)
+8. Create a clean, maintainable object-oriented design
+9. Each Java file should be compilable and follow Java best practices
+```

-5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag:
-   ```bash
-   docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands
-   ```

-6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults.

+Note the rule that introduces traceability comments to the resulting Java. 
These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. +## Validation -# Good vs. Bad Instructions -Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions +Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. -The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. +Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. 
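One way to make golden file testing concrete: capture each COBOL run's output to a text file, then diff the translated program's output against it, comparing numeric lines as decimals rather than raw strings so fixed-point formatting differences don't drown out real precision drift. A sketch — the golden-file layout and helper name are illustrative, not a fixed convention:

```python
from decimal import Decimal, InvalidOperation
from pathlib import Path

def compare_to_golden(golden_file: Path, actual_output: str) -> list:
    """Return a list of mismatch descriptions (empty means the outputs agree).

    Lines that parse as numbers are compared as Decimals, so a pure
    formatting difference like "12.50" vs "12.500" is not flagged, while a
    genuine precision drift like "12.50" vs "12.49" is.
    """
    golden_lines = golden_file.read_text().splitlines()
    actual_lines = actual_output.splitlines()
    mismatches = []
    if len(golden_lines) != len(actual_lines):
        mismatches.append(
            f"line count: expected {len(golden_lines)}, got {len(actual_lines)}"
        )
    for lineno, (expected, actual) in enumerate(zip(golden_lines, actual_lines), 1):
        if expected == actual:
            continue
        try:
            if Decimal(expected.strip()) == Decimal(actual.strip()):
                continue  # numerically identical; only the formatting differs
        except InvalidOperation:
            pass  # not numeric lines -- report as a plain text mismatch
        mismatches.append(f"line {lineno}: expected {expected!r}, got {actual!r}")
    return mismatches
```

A harness like this slots naturally into the critic's rubric as well: an empty mismatch list is strong evidence the business logic survived the translation.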
-## Concrete Examples of Good/Bad Prompts
+## Scaling Up

### Bug Fixing Examples
+The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. One way to address this is by tying translation and validation together in an iterative refinement loop.

#### Bad Example
+The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk):

-```
-Fix the bug in my code.
+```python
+while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
+    # Migrating agent converts COBOL to Java
+    migration_conversation.send_message(migration_prompt)
+    migration_conversation.run()
+    
+    # Critiquing agent evaluates the conversion
+    critique_conversation.send_message(critique_prompt)
+    critique_conversation.run()
+    
+    # Parse the score and decide whether to continue
+    current_score = parse_critique_score(critique_file)
+    iteration += 1
```

-**Why it's bad:**
-- No information about what the bug is
-- No indication of where to look
-- No description of expected vs. actual behavior
-- OpenHands would have to guess what's wrong

-#### Good Example
+By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. The following prompt can be easily modified to support a wide range of requirements:

```
-Fix the TypeError in src/api/users.py line 45. 
+
-Error message:
-TypeError: 'NoneType' object has no attribute 'get'
+Evaluate the quality of the COBOL to Java migration in `/src`.

-Expected behavior: The get_user_preferences() function should return
-default preferences when the user has no saved preferences.
+For each Java file, assess using the following criteria:
+1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)?
+2. Code Quality: Is the code clean, readable, and following Java 17 conventions?
+3. Completeness: Are all COBOL features properly converted?
+4. Best Practices: Does it use proper OOP, error handling, and documentation?

-Actual behavior: It crashes with the error above when user.preferences is None.
+For each instance of a criterion not met, deduct a point.

-The fix should handle the None case gracefully and return DEFAULT_PREFERENCES.
+Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score.

+Save the results in `critique.json` in the following format:
+
+{
+  "total_score": -12,
+  "files": [
+    {
+      "cobol": "COBIL00C.cbl",
+      "java": "bill_payment.java",
+      "scores": {
+        "correctness": 0,
+        "code_quality": 0,
+        "completeness": -1,
+        "best_practices": -2
+      },
+      "feedback": [
+        "Rename single-letter variables to meaningful names.",
+        "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing."
+      ]
+    },
+    ...
+  ]
+}
```

-**Why it works:**
-- Specific file and line number
-- Exact error message
-- Clear expected vs. actual behavior
-- Suggested approach for the fix
+In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback.

-### Feature Development Examples
+This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. 
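The `parse_critique_score` helper used in the refinement loop is left undefined in the example above. A minimal sketch that reads the `critique.json` format shown here might look like the following; treating a missing or malformed report as the worst score is an assumption that keeps the loop iterating rather than exiting early:

```python
import json

def parse_critique_score(critique_file: str) -> int:
    """Return the critic's total score from critique.json.
    A missing or malformed report counts as the worst score so the
    refinement loop continues instead of terminating on a read error."""
    try:
        with open(critique_file) as f:
            report = json.load(f)
    except (OSError, json.JSONDecodeError):
        return -(10**9)
    return int(report.get("total_score", -(10**9)))
```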
For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. -#### Bad Example +## Try It Yourself -``` -Add user authentication to my app. +The full iterative refinement example is available in the OpenHands SDK: + +```bash +export LLM_API_KEY="your-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/31_iterative_refinement.py ``` -**Why it's bad:** -- Scope is too large and undefined -- No details about authentication requirements -- No mention of existing code or patterns -- Could mean many different things +For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. -#### Good Example -``` -Add email/password login to our Express.js API. +## Related Resources -Requirements: -1. POST /api/auth/login endpoint -2. Accept email and password in request body -3. Validate against users in PostgreSQL database -4. Return JWT token on success, 401 on failure -5. 
Use bcrypt for password comparison (already in dependencies) +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -Follow the existing patterns in src/api/routes.js for route structure. -Use the existing db.query() helper in src/db/index.js for database access. +### Automated Code Review +Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review.md -Success criteria: I can call the endpoint with valid credentials -and receive a JWT token that works with our existing auth middleware. -``` +Automated code review helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs. -**Why it works:** -- Specific, scoped feature -- Clear technical requirements -- Points to existing patterns to follow -- Defines what "done" looks like +## Overview -### Code Review Examples +The OpenHands PR Review workflow is a GitHub Actions workflow that: -#### Bad Example +- **Triggers automatically** when PRs are opened or when you request a review +- **Analyzes code changes** in the context of your entire repository +- **Posts inline comments** directly on specific lines of code in the PR +- **Provides fast feedback** - typically within 2-3 minutes -``` -Review my code. 
-``` +## How It Works -**Why it's bad:** -- No code provided or referenced -- No indication of what to look for -- No context about the code's purpose -- No criteria for the review +The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: -#### Good Example +1. **Trigger**: The workflow runs when: + - A new non-draft PR is opened + - A draft PR is marked as ready for review + - The `review-this` label is added to a PR + - `openhands-agent` is requested as a reviewer -``` -Review this pull request for our payment processing module: +2. **Analysis**: The agent receives the complete PR diff and uses two skills: + - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices + - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API -Focus areas: -1. Security - we're handling credit card data -2. Error handling - payments must never silently fail -3. Idempotency - duplicate requests should be safe +3. **Output**: Review comments are posted directly on the PR with: + - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) + - Specific line references + - Actionable suggestions with code examples -Context: -- This integrates with Stripe API -- It's called from our checkout flow -- We have ~10,000 transactions/day +### Review Styles -Please flag any issues as Critical/Major/Minor with explanations. 
-``` +Choose between two review styles: -**Why it works:** -- Clear scope and focus areas -- Important context provided -- Business implications explained -- Requested output format specified +| Style | Description | Best For | +|-------|-------------|----------| +| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | +| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | -### Refactoring Examples +## Quick Start -#### Bad Example + + + Create `.github/workflows/pr-review-by-openhands.yml` in your repository: -``` -Make the code better. -``` + ```yaml + name: PR Review by OpenHands -**Why it's bad:** -- "Better" is subjective and undefined -- No specific problems identified -- No goals for the refactoring -- No constraints or requirements + on: + pull_request_target: + types: [opened, ready_for_review, labeled, review_requested] -#### Good Example + permissions: + contents: read + pull-requests: write + issues: write -``` -Refactor the UserService class in src/services/user.js: + jobs: + pr-review: + if: | + (github.event.action == 'opened' && github.event.pull_request.draft == false) || + github.event.action == 'ready_for_review' || + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + llm-model: anthropic/claude-sonnet-4-5-20250929 + review-style: standard + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} + ``` + -Problems to address: -1. 
The class is 500+ lines - split into smaller, focused services -2. Database queries are mixed with business logic - separate them -3. There's code duplication in the validation methods + + Go to your repository's **Settings → Secrets and variables → Actions** and add: + - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) + -Constraints: -- Keep the public API unchanged (other code depends on it) -- Maintain test coverage (run npm test after changes) -- Follow our existing service patterns in src/services/ + + Create a `review-this` label in your repository: + 1. Go to **Issues → Labels** + 2. Click **New label** + 3. Name: `review-this` + 4. Description: `Trigger OpenHands PR review` + -Goal: Improve maintainability while keeping the same functionality. -``` + + Open a PR and either: + - Add the `review-this` label, OR + - Request `openhands-agent` as a reviewer + + -**Why it works:** -- Specific problems identified -- Clear constraints and requirements -- Points to patterns to follow -- Measurable success criteria +## Composite Action -## Key Principles for Effective Instructions +The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: -### Be Specific +- Checking out the SDK at the specified version +- Setting up Python and dependencies +- Running the PR review agent +- Uploading logs as artifacts -Vague instructions produce vague results. Be concrete about: +### Action Inputs -| Instead of... | Say... 
| -|---------------|--------| -| "Fix the error" | "Fix the TypeError on line 45 of api.py" | -| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | -| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | -| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | +| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | -### Provide Context + +Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. + -Help OpenHands understand the bigger picture: +## Customization -``` -Context to include: -- What does this code do? (purpose) -- Who uses it? (users/systems) -- Why does this matter? (business impact) -- What constraints exist? (performance, compatibility) -- What patterns should be followed? (existing conventions) -``` +### Repository-Specific Review Guidelines -**Example with context:** +Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: -``` -Add rate limiting to our public API endpoints. 
+```markdown +--- +name: code-review +description: Custom code review guidelines for this repository +triggers: +- /codereview +--- -Context: -- This is a REST API serving mobile apps and third-party integrations -- We've been seeing abuse from web scrapers hitting us 1000+ times/minute -- Our infrastructure can handle 100 req/sec per client sustainably -- We use Redis (already available in the project) -- Our API follows the controller pattern in src/controllers/ +# Repository Code Review Guidelines -Requirement: Limit each API key to 100 requests per minute with -appropriate 429 responses and Retry-After headers. -``` +You are reviewing code for [Your Project Name]. Follow these guidelines: -### Set Clear Goals +## Review Decisions -Define what success looks like: +### When to APPROVE +- Configuration changes following existing patterns +- Documentation-only changes +- Test-only changes without production code changes +- Simple additions following established conventions -``` -Success criteria checklist: -✓ What specific outcome do you want? -✓ How will you verify it worked? -✓ What tests should pass? -✓ What should the user experience be? -``` +### When to COMMENT +- Issues that need attention (bugs, security concerns) +- Suggestions for improvement +- Questions about design decisions -**Example with clear goals:** +## Core Principles -``` -Implement password reset functionality. +1. **[Your Principle 1]**: Description +2. **[Your Principle 2]**: Description -Success criteria: -1. User can request reset via POST /api/auth/forgot-password -2. System sends email with secure reset link -3. Link expires after 1 hour -4. User can set new password via POST /api/auth/reset-password -5. Old sessions are invalidated after password change -6. All edge cases return appropriate error messages -7. 
Existing tests still pass, new tests cover the feature -``` +## What to Check -### Include Constraints +- **[Category 1]**: What to look for +- **[Category 2]**: What to look for -Specify what you can't or won't change: +## Repository Conventions -``` -Constraints to specify: -- API compatibility (can't break existing clients) -- Technology restrictions (must use existing stack) -- Performance requirements (must respond in <100ms) -- Security requirements (must not log PII) -- Time/scope limits (just this one file) +- Use [your linter] for style checking +- Follow [your style guide] +- Tests should be in [your test directory] ``` -## Common Pitfalls to Avoid + +The skill file must use `/codereview` as the trigger to override the default review behavior. See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. + -### Vague Requirements +### Workflow Configuration - - - ``` - Make the dashboard faster. - ``` - - - ``` - The dashboard takes 5 seconds to load. - - Profile it and optimize to load in under 1 second. - - Likely issues: - - N+1 queries in getWidgetData() - - Uncompressed images - - Missing database indexes - - Focus on the biggest wins first. - ``` - - +Customize the workflow by modifying the action inputs: -### Missing Context +```yaml +- name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + # Change the LLM model + llm-model: anthropic/claude-sonnet-4-5-20250929 + # Use a custom LLM endpoint + llm-base-url: https://your-llm-proxy.example.com + # Switch to "roasted" style for brutally honest reviews + review-style: roasted + # Pin to a specific SDK version for stability + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` - - - ``` - Add caching to the API. - ``` - - - ``` - Add caching to the product catalog API. 
- - Context: - - 95% of requests are for the same 1000 products - - Product data changes only via admin panel (rare) - - We already have Redis running for sessions - - Current response time is 200ms, target is <50ms - - Cache strategy: Cache product data in Redis with 5-minute TTL, - invalidate on product update. - ``` - - +### Trigger Customization -### Unrealistic Expectations +Modify when reviews are triggered by editing the workflow conditions: - - - ``` - Rewrite our entire backend from PHP to Go. - ``` - - - ``` - Create a Go microservice for the image processing currently in - src/php/ImageProcessor.php. - - This is the first step in our gradual migration. - The Go service should: - 1. Expose the same API endpoints - 2. Be deployable alongside the existing PHP app - 3. Include a feature flag to route traffic - - Start with just the resize and crop functions. - ``` - - +```yaml +# Only trigger on label (disable auto-review on PR open) +if: github.event.label.name == 'review-this' -### Incomplete Information +# Only trigger when specific reviewer is requested +if: github.event.requested_reviewer.login == 'openhands-agent' - - - ``` - The login is broken, fix it. - ``` - - - ``` - Users can't log in since yesterday's deployment. - - Symptoms: - - Login form submits but returns 500 error - - Server logs show: "Redis connection refused" - - Redis was moved to a new host yesterday - - The issue is likely in src/config/redis.js which may - have the old host hardcoded. - - Expected: Login should work with the new Redis at redis.internal:6380 - ``` - - +# Trigger on all PRs (including drafts) +if: | + github.event.action == 'opened' || + github.event.action == 'synchronize' +``` -## Best Practices +## Security Considerations -### Structure Your Instructions +The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. 
-Use clear structure for complex requests: + +**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. -``` -## Task -[One sentence describing what you want] +To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. + -## Background -[Context and why this matters] +## Example Reviews -## Requirements -1. [Specific requirement] -2. [Specific requirement] -3. [Specific requirement] +See real automated reviews in action on the OpenHands Software Agent SDK repository: -## Constraints -- [What you can't change] -- [What must be preserved] +| PR | Description | Review Highlights | +|----|-------------|-------------------| +| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | +| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | +| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED review highlighting key strengths | +| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | -## Success Criteria -- [How to verify it works] -``` +## Troubleshooting -### Provide Examples + + + - Ensure the `LLM_API_KEY` secret is set correctly + - Check that the label name matches exactly (`review-this`) + - Verify the workflow file is in `.github/workflows/` + - Check the Actions tab for 
workflow run errors + + + + - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission + - Check the workflow logs for API errors + - Verify the PR is not from a fork with restricted permissions + + + + - Large PRs may take longer to analyze + - Consider splitting large PRs into smaller ones + - Check if the LLM API is experiencing delays + + -Show what you want through examples: +## Related Resources -``` -Add input validation to the user registration endpoint. +- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews +- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows +- [GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud +- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills -Example of what validation errors should look like: +### Dependency Upgrades +Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md -{ - "error": "validation_failed", - "details": [ - {"field": "email", "message": "Invalid email format"}, - {"field": "password", "message": "Must be at least 8 characters"} - ] -} +Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. 
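As a small illustration of what identifying outdated dependencies involves, the gap between a pinned version and the latest release can be classified by the highest semantic-version component that changed. This helper is a sketch that assumes plain numeric `x.y.z` versions (no pre-release tags); it is not an OpenHands API:

```python
def versions_behind(current: str, latest: str) -> str:
    """Classify the upgrade gap between two semantic versions as the
    highest component that increased: 'major', 'minor', 'patch', or 'none'."""
    cur = [int(part) for part in current.split(".")]
    new = [int(part) for part in latest.split(".")]
    for name, c, n in zip(("major", "minor", "patch"), cur, new):
        if n > c:
            return name
        if n < c:
            return "none"  # already at or ahead of the published version
    return "none"
```

For example, `versions_behind("16.8.0", "18.2.0")` reports a `major` gap, the kind of update that deserves a migration plan rather than a mechanical version bump.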
-Validate: -- email: valid format, not already registered -- password: min 8 chars, at least 1 number -- username: 3-20 chars, alphanumeric only -``` +## Overview -### Define Success Criteria +OpenHands helps with dependency management by: -Be explicit about what "done" means: +- **Analyzing dependencies**: Identifying outdated packages and their versions +- **Planning upgrades**: Creating upgrade strategies and migration guides +- **Implementing changes**: Updating code to handle breaking changes +- **Validating results**: Running tests and verifying functionality -``` -This task is complete when: -1. All existing tests pass (npm test) -2. New tests cover the added functionality -3. The feature works as described in the acceptance criteria -4. Code follows our style guide (npm run lint passes) -5. Documentation is updated if needed -``` +## Dependency Analysis Examples -### Iterate and Refine +### Identifying Outdated Dependencies -Build on previous work: +Start by understanding your current dependency state: ``` -In our last session, you added the login endpoint. - -Now add the logout functionality: -1. POST /api/auth/logout endpoint -2. Invalidate the current session token -3. Clear any server-side session data -4. Follow the same patterns used in login +Analyze the dependencies in this project and create a report: -The login implementation is in src/api/auth/login.js for reference. +1. List all direct dependencies with current and latest versions +2. Identify dependencies more than 2 major versions behind +3. Flag any dependencies with known security vulnerabilities +4. Highlight dependencies that are deprecated or unmaintained +5. 
Prioritize which updates are most important ``` -## Quick Reference - -| Element | Bad | Good | -|---------|-----|------| -| Location | "in the code" | "in src/api/users.py line 45" | -| Problem | "it's broken" | "TypeError when user.preferences is None" | -| Scope | "add authentication" | "add JWT-based login endpoint" | -| Behavior | "make it work" | "return 200 with user data on success" | -| Patterns | (none) | "follow patterns in src/services/" | -| Success | (none) | "all tests pass, endpoint returns correct data" | +**Example output:** - -The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. - +| Package | Current | Latest | Risk | Priority | +|---------|---------|--------|------|----------| +| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | +| react | 16.8.0 | 18.2.0 | Outdated | Medium | +| express | 4.17.1 | 4.18.2 | Minor update | Low | +| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | +### Security-Related Dependency Upgrades -# OpenHands in Your SDLC -Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration +Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: -OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. 
+- Automating vulnerability detection and remediation +- Integrating with security scanners (Snyk, Dependabot, CodeQL) +- Building automated pipelines for security fixes +- Using OpenHands agents to create pull requests automatically -## Integration with Development Workflows +### Compatibility Checking -### Planning Phase +Check for compatibility issues before upgrading: -Use OpenHands during planning to accelerate technical decisions: +``` +Check compatibility for upgrading React from 16 to 18: -**Technical specification assistance:** +1. Review our codebase for deprecated React patterns +2. List all components using lifecycle methods +3. Identify usage of string refs or findDOMNode +4. Check third-party library compatibility with React 18 +5. Estimate the effort required for migration ``` -Create a technical specification for adding search functionality: -Requirements from product: -- Full-text search across products and articles -- Filter by category, price range, and date -- Sub-200ms response time at 1000 QPS +**Compatibility matrix:** -Provide: -1. Architecture options (Elasticsearch vs. PostgreSQL full-text) -2. Data model changes needed -3. API endpoint designs -4. Estimated implementation effort -5. 
Risks and mitigations -``` +| Dependency | React 16 | React 17 | React 18 | Action Needed | +|------------|----------|----------|----------|---------------| +| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | +| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | +| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | -**Sprint planning support:** -``` -Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: +## Automated Upgrade Examples -Story 1: As a user, I can reset my password via email -Story 2: As an admin, I can view user activity logs +### Version Updates -For each story, create: -- Technical subtasks -- Estimated effort (hours) -- Dependencies on other work -- Testing requirements -``` +Perform straightforward version updates: -### Development Phase + + + ``` + Update all patch and minor versions in package.json: + + 1. Review each update for changelog notes + 2. Update package.json with new versions + 3. Update package-lock.json + 4. Run the test suite + 5. List any deprecation warnings + ``` + + + ``` + Update dependencies in requirements.txt: + + 1. Check each package for updates + 2. Update requirements.txt with compatible versions + 3. Update requirements-dev.txt similarly + 4. Run tests and verify functionality + 5. Note any deprecation warnings + ``` + + + ``` + Update dependencies in pom.xml: + + 1. Check for newer versions of each dependency + 2. Update version numbers in pom.xml + 3. Run mvn dependency:tree to check conflicts + 4. Run the test suite + 5. 
Document any API changes encountered + ``` + + -OpenHands excels during active development: +### Breaking Change Handling -**Feature implementation:** -- Write new features with clear specifications -- Follow existing code patterns automatically -- Generate tests alongside code -- Create documentation as you go +When major versions introduce breaking changes: -**Bug fixing:** -- Analyze error logs and stack traces -- Identify root causes -- Implement fixes with regression tests -- Document the issue and solution +``` +Upgrade axios from v0.x to v1.x and handle breaking changes: -**Code improvement:** -- Refactor for clarity and maintainability -- Optimize performance bottlenecks -- Update deprecated APIs -- Improve error handling +1. List all breaking changes in axios 1.0 changelog +2. Find all axios usages in our codebase +3. For each breaking change: + - Show current code + - Show updated code + - Explain the change +4. Create a git commit for each logical change +5. Verify all tests pass +``` -### Testing Phase +**Example transformation:** -Automate test creation and improvement: +```javascript +// Before (axios 0.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const response = await axios.get('/users', { + cancelToken: source.token +}); +// After (axios 1.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const controller = new AbortController(); +const response = await axios.get('/users', { + signal: controller.signal +}); ``` -Add comprehensive tests for the UserService module: -Current coverage: 45% -Target coverage: 85% +### Code Adaptation -1. Analyze uncovered code paths using the codecov module -2. Write unit tests for edge cases -3. Add integration tests for API endpoints -4. Create test data factories -5. Document test scenarios +Adapt code to new API patterns: -Each time you add new tests, re-run codecov to check the increased coverage. 
Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). ``` +Migrate our codebase from moment.js to date-fns: -### Review Phase - -Accelerate code reviews: - +1. List all moment.js usages in our code +2. Map moment methods to date-fns equivalents +3. Update imports throughout the codebase +4. Handle any edge cases where APIs differ +5. Remove moment.js from dependencies +6. Verify all date handling still works correctly ``` -Review this PR for our coding standards: -Check for: -1. Security issues (SQL injection, XSS, etc.) -2. Performance concerns -3. Test coverage adequacy -4. Documentation completeness -5. Adherence to our style guide +**Migration map:** -Provide actionable feedback with severity ratings. -``` +| moment.js | date-fns | Notes | +|-----------|----------|-------| +| `moment()` | `new Date()` | Different return type | +| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | +| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | +| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | -### Deployment Phase +## Testing and Validation Examples -Assist with deployment preparation: +### Automated Test Execution + +Run comprehensive tests after upgrades: ``` -Prepare for production deployment: +After the dependency upgrades, validate the application: -1. Review all changes since last release -2. Check for breaking API changes -3. Verify database migrations are reversible -4. Update the changelog -5. Create release notes -6. Identify rollback steps if needed +1. Run the full test suite (unit, integration, e2e) +2. Check test coverage hasn't decreased +3. Run type checking (if applicable) +4. Run linting with new lint rule versions +5. Build the application for production +6. 
Report any failures with analysis ``` -## CI/CD Integration +### Integration Testing -OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. +Verify integrations still work: -### GitHub Actions Integration +``` +Test our integrations after upgrading the AWS SDK: -The Software Agent SDK provides composite GitHub Actions for common workflows: +1. Test S3 operations (upload, download, list) +2. Test DynamoDB operations (CRUD) +3. Test Lambda invocations +4. Test SQS send/receive +5. Compare behavior to before the upgrade +6. Note any subtle differences +``` -- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments -- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK +### Regression Detection -For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. +Detect regressions from upgrades: -### What You Can Automate +``` +Check for regressions after upgrading the ORM: -Using the SDK, you can create GitHub Actions workflows to: +1. Run database operation benchmarks +2. Compare query performance before and after +3. Verify all migrations still work +4. Check for any N+1 queries introduced +5. Validate data integrity in test database +6. Document any behavioral changes +``` -1. **Automatic code review** when a PR is opened -2. **Automatically update docs** weekly when new functionality is added -3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements -4. **Manage TODO comments** and track technical debt -5. 
**Assign reviewers** based on code ownership patterns +## Additional Examples -### Getting Started +### Security-Driven Upgrade -To integrate OpenHands into your CI/CD: +``` +We have a critical security vulnerability in jsonwebtoken. -1. Review the [SDK Getting Started guide](/sdk/getting-started) -2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) -3. Set up your `LLM_API_KEY` as a repository secret -4. Use the provided composite actions or build custom workflows +Current: jsonwebtoken@8.5.1 +Required: jsonwebtoken@9.0.0 -See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. +Perform the upgrade: +1. Check for breaking changes in v9 +2. Find all usages of jsonwebtoken in our code +3. Update any deprecated methods +4. Update the package version +5. Verify all JWT operations work +6. Run security tests +``` -## Team Workflows +### Framework Major Upgrade -### Solo Developer Workflows +``` +Upgrade our Next.js application from 12 to 14: -For individual developers: +Key areas to address: +1. App Router migration (pages -> app) +2. New metadata API +3. Server Components by default +4. New Image component +5. Route handlers replacing API routes -**Daily workflow:** -1. **Morning review**: Have OpenHands analyze overnight CI results -2. **Feature development**: Use OpenHands for implementation -3. **Pre-commit**: Request review before pushing -4. 
**Documentation**: Generate/update docs for changes +For each area: +- Show current implementation +- Show new implementation +- Test the changes +``` -**Best practices:** -- Set up automated reviews on all PRs -- Use OpenHands for boilerplate and repetitive tasks -- Keep AGENTS.md updated with project patterns +### Multi-Package Coordinated Upgrade -### Small Team Workflows +``` +Upgrade our React ecosystem packages together: -For teams of 2-10 developers: +Current: +- react: 17.0.2 +- react-dom: 17.0.2 +- react-router-dom: 5.3.0 +- @testing-library/react: 12.1.2 -**Collaborative workflow:** -``` -Team Member A: Creates feature branch, writes initial implementation -OpenHands: Reviews code, suggests improvements -Team Member B: Reviews OpenHands suggestions, approves or modifies -OpenHands: Updates documentation, adds missing tests -Team: Merges after final human review +Target: +- react: 18.2.0 +- react-dom: 18.2.0 +- react-router-dom: 6.x +- @testing-library/react: 14.x + +Create an upgrade plan that handles all these together, +addressing breaking changes in the correct order. 
``` -**Communication integration:** -- Slack notifications for OpenHands findings -- Automatic issue creation for bugs found -- Weekly summary reports +## Related Resources -### Enterprise Team Workflows +- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities +- [Security Guide](/sdk/guides/security) - Security best practices for AI agents +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -For larger organizations: +### Incident Triage +Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md -**Governance and oversight:** -- Configure approval requirements for OpenHands changes -- Set up audit logging for all AI-assisted changes -- Define scope limits for automated actions -- Establish human review requirements +When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). -**Scale patterns:** -``` -Central Platform Team: -├── Defines OpenHands policies -├── Manages integrations -└── Monitors usage and quality + +This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). + -Feature Teams: -├── Use OpenHands within policies -├── Customize for team needs -└── Report issues to platform team -``` +## Overview -## Best Practices +Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. 
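The first pass over a flood of production errors is largely mechanical: deduplicate and rank. As a rough, hypothetical sketch (not part of the OpenHands tooling), grouping raw log lines by error signature illustrates the kind of triage step an agent automates before it ever opens the code:

```javascript
// Hypothetical first-pass triage helper: group raw log lines by error
// signature so the most frequent failures surface first. Purely
// illustrative; the actual workflow queries Datadog's error tracking instead.
function groupBySignature(logLines) {
  const counts = new Map();
  for (const line of logLines) {
    // Take the first "SomethingError:" / "SomethingException:" token as the signature.
    const match = line.match(/\b(\w+(?:Error|Exception)):/);
    const signature = match ? match[1] : 'Unknown';
    counts.set(signature, (counts.get(signature) || 0) + 1);
  }
  // Most frequent signatures first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Example log lines (timestamps and messages are made up).
const lines = [
  '2026-02-24T14:32:01Z JSONDecodeError: Expecting value: line 1 column 1',
  '2026-02-24T14:32:05Z TimeoutError: request to /api/orders timed out',
  '2026-02-24T14:32:09Z JSONDecodeError: Expecting value: line 1 column 1',
];
console.log(groupBySignature(lines)); // JSONDecodeError appears twice, TimeoutError once
```

An agent doing triage performs this kind of aggregation first, then digs into the most frequent signature with full code context.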
-### Code Review Integration +What if AI agents could handle the initial investigation automatically? This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. -Set up effective automated reviews: +OpenHands accelerates incident response by: -```yaml -# .openhands/review-config.yml -review: - focus_areas: - - security - - performance - - test_coverage - - documentation - - severity_levels: - block_merge: - - critical - - security - require_response: - - major - informational: - - minor - - suggestion - - ignore_patterns: - - "*.generated.*" - - "vendor/*" -``` +- **Automated error analysis**: AI agents investigate errors and provide detailed reports +- **Root cause identification**: Connect symptoms to underlying issues in your codebase +- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues +- **Integration with monitoring tools**: Work directly with platforms like Datadog -### Pull Request Automation +## Automated Datadog Error Analysis -Automate common PR tasks: +The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. -| Trigger | Action | -|---------|--------| -| PR opened | Auto-review, label by type | -| Tests fail | Analyze failures, suggest fixes | -| Coverage drops | Identify missing tests | -| PR approved | Update changelog, check docs | +### How It Works -### Quality Gates +[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. 
It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production. -Define automated quality gates: +[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports. -```yaml -quality_gates: - - name: test_coverage - threshold: 80% - action: block_merge - - - name: security_issues - threshold: 0 critical - action: block_merge - - - name: code_review_score - threshold: 7/10 - action: require_review - - - name: documentation - requirement: all_public_apis - action: warn -``` +### Triggering Automated Debugging -### Automated Testing +The GitHub Actions workflow can be triggered in two ways: -Integrate OpenHands with your testing strategy: +1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors. -**Test generation triggers:** -- New code without tests -- Coverage below threshold -- Bug fix without regression test -- API changes without contract tests +2. **Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from DataDog's error tracking UI using the "Actions" button. -**Example workflow:** -```yaml -on: - push: - branches: [main] +### Automated Investigation Process -jobs: - ensure-coverage: - steps: - - name: Check coverage - run: | - COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}') - if [ "$COVERAGE" -lt "80" ]; then - openhands generate-tests --target 80 - fi -``` +When the workflow runs, it automatically performs the following steps: -## Common Integration Patterns +1. Get detailed info from the DataDog API +2. Create or find an existing GitHub issue to track the error +3. 
Clone all relevant repositories to get full code context +4. Run an OpenHands agent to analyze the error and investigate the code +5. Post the findings as a comment on the GitHub issue -### Pre-Commit Hooks +The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes. -Run OpenHands checks before commits: + +The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`. + -```bash -# .git/hooks/pre-commit -#!/bin/bash +## Setting Up the Workflow -# Quick code review -openhands review --quick --staged-only +To set up automated Datadog debugging in your own repository: -if [ $? -ne 0 ]; then - echo "OpenHands found issues. Review and fix before committing." - exit 1 -fi -``` +1. Copy the workflow file to `.github/workflows/` in your repository +2. Configure the required secrets (Datadog API keys, LLM API key) +3. Customize the default queries and repository lists for your needs +4. Run the workflow manually or set up scheduled runs -### Post-Commit Actions +The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog. -Automate tasks after commits: +Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template. 
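Because the agent's instructions are plain text, customizing the default queries can be as simple as templating the prompt before each run. A hypothetical sketch (the function, field names, and template wording are illustrative, not the workflow's actual code):

```javascript
// Hypothetical prompt builder mirroring the two trigger modes described
// earlier: a search query, or a specific Datadog error tracking ID.
function buildDebugPrompt({ query, errorId, repos }) {
  const target = errorId
    ? `the Datadog error with tracking ID ${errorId}`
    : `recent Datadog errors matching the query "${query}"`;
  return [
    `Investigate ${target}.`,
    `Relevant repositories: ${repos.join(', ')}.`,
    'Identify the file and line where the error originates,',
    'determine the root cause, and recommend a fix.',
  ].join('\n');
}

// Repository names here are placeholders.
console.log(buildDebugPrompt({
  query: 'JSONDecodeError',
  errorId: null,
  repos: ['my-org/api-service', 'my-org/shared-libs'],
}));
```

The same idea extends to the repository list and analysis focus: keep them as workflow inputs and interpolate them into the prompt template.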
-```yaml -# .github/workflows/post-commit.yml -on: - push: - branches: [main] +## Manual Incident Investigation -jobs: - update-docs: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - name: Update API docs - run: openhands update-docs --api - - name: Commit changes - run: | - git add docs/ - git commit -m "docs: auto-update API documentation" || true - git push -``` +You can also use OpenHands directly to investigate incidents without the automated workflow. -### Scheduled Tasks +### Log Analysis -Run regular maintenance: +OpenHands can analyze logs to identify patterns and anomalies: -```yaml -# Weekly dependency check -on: - schedule: - - cron: '0 9 * * 1' # Monday 9am +``` +Analyze these application logs for the incident that occurred at 14:32 UTC: -jobs: - dependency-review: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - name: Check dependencies - run: | - openhands check-dependencies --security --outdated - - name: Create issues - run: openhands create-issues --from-report deps.json +1. Identify the first error or warning that appeared +2. Trace the sequence of events leading to the failure +3. Find any correlated errors across services +4. Identify the user or request that triggered the issue +5. Summarize the timeline of events ``` -### Event-Triggered Workflows +**Log analysis capabilities:** -You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. 
+| Log Type | Analysis Capabilities | +|----------|----------------------| +| Application logs | Error patterns, exception traces, timing anomalies | +| Access logs | Traffic patterns, slow requests, error responses | +| System logs | Resource exhaustion, process crashes, system errors | +| Database logs | Slow queries, deadlocks, connection issues | -For more event-driven automation patterns, see: -- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events -- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage +### Stack Trace Analysis +Deep dive into stack traces: -# When to Use OpenHands -Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands +``` +Analyze this stack trace from our production error: -OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. This guide helps you identify the right tasks for OpenHands and set yourself up for success. +[paste full stack trace] -## Task Complexity Guidance +1. Identify the exception type and message +2. Trace back to our code (not framework code) +3. Identify the likely cause +4. Check if this code path has changed recently +5. Suggest a fix +``` -### Simple Tasks +**Multi-language support:** -**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. + + + ``` + Analyze this Java exception: + + java.lang.OutOfMemoryError: Java heap space + at java.util.Arrays.copyOf(Arrays.java:3210) + at java.util.ArrayList.grow(ArrayList.java:265) + at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) + + Identify: + 1. What operation is consuming memory? + 2. Is there a memory leak or just too much data? + 3. What's the fix? 
+ ``` + + + ``` + Analyze this Python traceback: + + Traceback (most recent call last): + File "app/api/orders.py", line 45, in create_order + order = OrderService.create(data) + File "app/services/order.py", line 89, in create + inventory.reserve(item_id, quantity) + AttributeError: 'NoneType' object has no attribute 'reserve' + + What's None and why? + ``` + + + ``` + Analyze this Node.js error: + + TypeError: Cannot read property 'map' of undefined + at processItems (/app/src/handlers/items.js:23:15) + at async handleRequest (/app/src/api/router.js:45:12) + + What's undefined and how should we handle it? + ``` + + -- Adding a new function or method -- Writing unit tests for existing code -- Fixing simple bugs with clear error messages -- Code formatting and style fixes -- Adding documentation or comments -- Simple refactoring (rename, extract method) -- Configuration changes +### Root Cause Analysis -**Example prompt:** -``` -Add a calculateDiscount() function to src/utils/pricing.js that takes -a price and discount percentage, returns the discounted price. -Add unit tests. -``` +Identify the underlying cause of an incident: -### Medium Complexity Tasks +``` +Perform root cause analysis for this incident: -**Good for OpenHands** — These tasks may need more context and possibly some iteration. 
+Symptoms: +- API response times increased 5x at 14:00 +- Error rate jumped from 0.1% to 15% +- Database CPU spiked to 100% -- Implementing a new API endpoint -- Adding a feature to an existing module -- Debugging issues that span multiple files -- Migrating code to a new pattern -- Writing integration tests -- Performance optimization with clear metrics -- Setting up CI/CD workflows +Available data: +- Application metrics (Grafana dashboard attached) +- Recent deployments: v2.3.1 deployed at 13:45 +- Database slow query log (attached) -**Example prompt:** -``` -Add a user profile endpoint to our API: -- GET /api/users/:id/profile -- Return user data with their recent activity -- Follow patterns in existing controllers -- Add integration tests -- Handle not-found and unauthorized cases +Identify the root cause using the 5 Whys technique. ``` -### Complex Tasks +## Common Incident Patterns -**May require iteration** — These benefit from breaking down into smaller pieces. +OpenHands can recognize and help diagnose these common patterns: -- Large refactoring across many files -- Architectural changes -- Implementing complex business logic -- Multi-service integrations -- Performance optimization without clear cause -- Security audits -- Framework or major dependency upgrades +- **Connection pool exhaustion**: Increasing connection errors followed by complete failure +- **Memory leaks**: Gradual memory increase leading to OOM +- **Cascading failures**: One service failure triggering others +- **Thundering herd**: Simultaneous requests overwhelming a service +- **Split brain**: Inconsistent state across distributed components -**Recommended approach:** -``` -Break large tasks into phases: +## Quick Fix Generation -Phase 1: "Analyze the current authentication system and document -all touch points that need to change for OAuth2 migration." 
+Once the root cause is identified, generate fixes: -Phase 2: "Implement the OAuth2 provider configuration and basic -token flow, keeping existing auth working in parallel." +``` +We've identified the root cause: a missing null check in OrderProcessor.java line 156. -Phase 3: "Migrate the user login flow to use OAuth2, maintaining -backwards compatibility." +Generate a fix that: +1. Adds proper null checking +2. Logs when null is encountered +3. Returns an appropriate error response +4. Includes a unit test for the edge case +5. Is minimally invasive for a hotfix ``` -## Best Use Cases +## Best Practices -### Ideal Scenarios +### Investigation Checklist -OpenHands is **most effective** when: +Use this checklist when investigating: -| Scenario | Why It Works | -|----------|--------------| -| Clear requirements | OpenHands can work independently | -| Well-defined scope | Less ambiguity, fewer iterations | -| Existing patterns to follow | Consistency with codebase | -| Good test coverage | Easy to verify changes | -| Isolated changes | Lower risk of side effects | - -**Perfect use cases:** - -- **Bug fixes with reproduction steps**: Clear problem, measurable solution -- **Test additions**: Existing code provides the specification -- **Documentation**: Code is the source of truth -- **Boilerplate generation**: Follows established patterns -- **Code review and analysis**: Read-only, analytical tasks - -### Good Fit Scenarios - -OpenHands works **well with some guidance** for: - -- **Feature implementation**: When requirements are documented -- **Refactoring**: When goals and constraints are clear -- **Debugging**: When you can provide logs and context -- **Code modernization**: When patterns are established -- **API development**: When specs exist - -**Tips for these scenarios:** +1. **Scope the impact** + - How many users affected? + - What functionality is broken? + - What's the business impact? -1. Provide clear acceptance criteria -2. 
Point to examples of similar work in the codebase -3. Specify constraints and non-goals -4. Be ready to iterate and clarify +2. **Establish timeline** + - When did it start? + - What changed around that time? + - Is it getting worse or stable? -### Poor Fit Scenarios +3. **Gather data** + - Application logs + - Infrastructure metrics + - Recent deployments + - Configuration changes -**Consider alternatives** when: +4. **Form hypotheses** + - List possible causes + - Rank by likelihood + - Test systematically -| Scenario | Challenge | Alternative | -|----------|-----------|-------------| -| Vague requirements | Unclear what "done" means | Define requirements first | -| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | -| Highly sensitive code | Risk tolerance is zero | Human review essential | -| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | -| Visual design | Subjective aesthetic judgments | Use design tools | +5. **Implement fix** + - Choose safest fix + - Test before deploying + - Monitor after deployment -**Red flags that a task may not be suitable:** +### Common Pitfalls -- "Make it look better" (subjective) -- "Figure out what's wrong" (too vague) -- "Rewrite everything" (too large) -- "Do what makes sense" (unclear requirements) -- Changes to production infrastructure without review + +Avoid these common incident response mistakes: -## Limitations +- **Jumping to conclusions**: Gather data before assuming the cause +- **Changing multiple things**: Make one change at a time to isolate effects +- **Not documenting**: Record all actions for the post-mortem +- **Ignoring rollback**: Always have a rollback plan before deploying fixes + -### Current Limitations + +For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. 
+ -Be aware of these constraints: +## Related Resources -- **Long-running processes**: Sessions have time limits -- **Interactive debugging**: Can't set breakpoints interactively -- **Visual verification**: Can't see rendered UI easily -- **External system access**: May need credentials configured -- **Large codebase analysis**: Memory and time constraints +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -### Technical Constraints +### Spark Migrations +Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md -| Constraint | Impact | Workaround | -|------------|--------|------------| -| Session duration | Very long tasks may timeout | Break into smaller tasks | -| Context window | Can't see entire large codebase at once | Focus on relevant files | -| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | -| Network access | Some external services may be blocked | Use local resources when possible | +Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. OpenHands can help you analyze, migrate, and validate Spark applications. -### Scope Boundaries +## Overview -OpenHands works within your codebase but has boundaries: +Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. 
That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar.

-**Can do:**
-- Read and write files in the repository
-- Run tests and commands
-- Access configured services and APIs
-- Browse documentation and reference material
+Version upgrades are also made difficult due to the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions.

-**Cannot do:**
-- Access your local environment outside the sandbox
-- Make decisions requiring business context it doesn't have
-- Replace human judgment for critical decisions
-- Guarantee production-safe changes without review
+Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results.

-## Pre-Task Checklist
+Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly and without regressions. This is where OpenHands comes in. OpenHands assists in migrating Spark applications at every step of the process:

-### Prerequisites
+1. **Understanding**: Analyze the existing codebase to identify what needs to change and why
+2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences
+3. 
**Validation**: Verify that migrated pipelines produce identical results to the originals -Before starting a task, ensure: +In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions. -- [ ] Clear description of what you want -- [ ] Expected outcome is defined -- [ ] Relevant files are identified -- [ ] Dependencies are available -- [ ] Tests can be run +## Understanding -### Environment Setup +Before changin any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually. -Prepare your repository: +Apache releases detailed lists of changes between each major and minor version of Spark. OpenHands can utilize this list of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress. -```markdown -## AGENTS.md Checklist +If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory: -- [ ] Build commands documented -- [ ] Test commands documented -- [ ] Code style guidelines noted -- [ ] Architecture overview included -- [ ] Common patterns described ``` +Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0. -See [Repository Setup](/openhands/usage/customization/repository) for details. - -### Repository Preparation - -Optimize for success: - -1. 
**Clean state**: Commit or stash uncommitted changes -2. **Working build**: Ensure the project builds -3. **Passing tests**: Start from a green state -4. **Updated dependencies**: Resolve any dependency issues -5. **Clear documentation**: Update AGENTS.md if needed - -## Post-Task Review - -### Quality Checks - -After OpenHands completes a task: - -- [ ] Review all changed files -- [ ] Understand each change made -- [ ] Check for unintended modifications -- [ ] Verify code style consistency -- [ ] Look for hardcoded values or credentials - -### Validation Steps - -1. **Run tests**: `npm test`, `pytest`, etc. -2. **Check linting**: Ensure style compliance -3. **Build the project**: Verify it still compiles -4. **Manual testing**: Test the feature yourself -5. **Edge cases**: Try unusual inputs - -### Learning from Results - -After each significant task: +Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html. -**What went well?** -- Note effective prompt patterns -- Document successful approaches -- Update AGENTS.md with learnings +Then, for each source file, identify -**What could improve?** -- Identify unclear instructions -- Note missing context -- Plan better for next time +1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`) +2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics) +3. Configuration properties that have changed defaults or been renamed +4. 
Dependencies that need version updates -**Update your repository:** -```markdown -## Things OpenHands Should Know (add to AGENTS.md) +Save the results in `migration_inventory.json` in the following format: -- When adding API endpoints, always add to routes/index.js -- Our date format is ISO 8601 everywhere -- All database queries go through the repository pattern +{ + ..., + "src/main/scala/etl/TransformJob.scala": { + "deprecated_apis": [ + {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} + ], + "behavioral_changes": [ + {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} + ], + "config_changes": [], + "risk": "medium" + }, + ... +} ``` -## Decision Framework +Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. -Use this framework to decide if a task is right for OpenHands: +## Migration -``` -Is the task well-defined? -├── No → Define it better first -└── Yes → Continue +With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. -Do you have clear success criteria? -├── No → Define acceptance criteria -└── Yes → Continue +To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. 
This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. -Is the scope manageable (< 100 LOC)? -├── No → Break into smaller tasks -└── Yes → Continue +The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. -Do examples exist in the codebase? -├── No → Provide examples or patterns -└── Yes → Continue +An example prompt is below: -Can you verify the result? -├── No → Add tests or verification steps -└── Yes → ✅ Good candidate for OpenHands ``` +Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. -OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! - -But it can be particularly useful for certain types of tasks. For instance: +Use `migration_inventory.json` to guide the changes. -- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define it in a way that can be verified programmatically, like making sure that all of the tests pass or test coverage gets above a certain value using a particular program. But even when you don't have something like that, you can just provide a checklist of things that need to be done. -- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. Some good examples include code review, improving test coverage, upgrading dependency libraries. 
In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies about how to perform these tasks, and improve the skills over time. -- **Helping Answer Questions:** OpenHands agents are generally pretty good at answering questions about code bases, so you can feel free to ask them when you don't understand how something works. They can explore the code base and understand it deeply before providing an answer. -- **Checking the Correctness of Library/Backend Code:** when agents work, they can run code, and they are particularly good at checking whether libraries or backend code works well. -- **Reading Logs and Understanding Errors:** Agents can read blogs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They're actually quite good at filtering through large amounts of data, especially if pushed in the correct direction. +For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. -There are also some tasks where agent struggle a little more. +Requirements: +1. Replace all deprecated APIs with their Spark 3.0 equivalents +2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) +3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions +4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical +5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists +6. Update import statements for any relocated classes +7. Preserve all existing business logic and output schemas +``` -- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons. 
But they are a little bit less good at visual understanding of frontends at the moment and can sometimes make mistakes if they don't understand the workflow very well. -- **Implementing Code they Cannot Test Live:** If agents are not able to actually run and test the app, such as connecting to a live service that they do not have access to, often they will fail at performing tasks all the way to the end, unless they get some encouragement. +Note the inclusion of the _known problems_ in requirement 2. We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. +## Validation -# Tutorial Library -Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials +Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. -Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development. Each tutorial includes example prompts, expected workflows, and tips for success. +The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. 
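The core of a data-level comparison can be sketched in plain Python. The snippet below is a minimal illustration rather than a production harness: it assumes each job's output has already been collected as a list of rows (a real setup would read the Parquet or CSV output of both runs), and it uses order-insensitive per-column checksums as described above. The sample rows are hypothetical; the `day` mismatch mimics the kind of calendar-related drift a Spark 2.4 to 3.0 migration can introduce on pre-1582 dates.

```python
import hashlib

def column_checksum(rows, col):
    """Order-insensitive checksum: hash each value, XOR the digests together."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row[col]).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

def compare_outputs(v2_rows, v3_rows, columns):
    """Compare two job outputs: exact row counts plus per-column checksums."""
    diffs = [c for c in columns
             if column_checksum(v2_rows, c) != column_checksum(v3_rows, c)]
    return {
        "row_count": {"v2": len(v2_rows), "v3": len(v3_rows)},
        "column_diffs": diffs,
        "data_match": len(v2_rows) == len(v3_rows) and not diffs,
    }

# Hypothetical outputs of the same job under Spark 2.4 (v2) and 3.0 (v3).
v2 = [{"id": 1, "amount": 10.5, "day": "1500-02-29"}]
v3 = [{"id": 1, "amount": 10.5, "day": "1500-03-01"}]  # calendar drift

print(compare_outputs(v2, v3, columns=["id", "amount", "day"]))
# The report flags "day" as differing, so data_match is False.
```

For real Spark jobs, the same checks are usually pushed into the cluster itself (aggregating hashes with Spark rather than collecting full outputs to one machine), but the shape of the report stays the same.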
-## Categories Overview +An example prompt for data-level comparison: -| Category | Best For | Complexity | -|----------|----------|------------| -| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | -| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | -| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | -| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | -| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | -| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | +``` +Validate the migrated Spark application in `/src` against the original. - -For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. - +1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` +2. Compare outputs: + - Row counts must match exactly + - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns + - Flag any NULL handling differences +3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments +4. Generate a performance comparison: job duration, shuffle bytes, and peak executor memory -## Task Complexity Guidance +Save the results in `validation_report.json` in the following format: -Before starting, assess your task's complexity: +{ + "jobs": [ + { + "name": "daily_etl", + "data_match": true, + "row_count": {"v2": 1000000, "v3": 1000000}, + "column_diffs": [], + "performance": { + "duration_seconds": {"v2": 340, "v3": 285}, + "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} + } + }, + ... 
+ ] +} +``` -**Simple tasks** (5-15 minutes): -- Single file changes -- Clear, well-defined requirements -- Existing patterns to follow +Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. -**Medium tasks** (15-45 minutes): -- Multiple file changes -- Some discovery required -- Integration with existing code +Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. -**Complex tasks** (45+ minutes): -- Architectural changes -- Multiple components -- Requires iteration +## Beyond Version Upgrades - -Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. - +While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: -## Best Use Cases +- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. +- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. 
-OpenHands excels at: +In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. -- **Repetitive tasks**: Boilerplate code, test generation -- **Pattern application**: Following established conventions -- **Analysis**: Code review, debugging, documentation -- **Exploration**: Understanding new codebases +## Related Resources -## Example Tutorials by Category +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -### Testing +### Vulnerability Remediation +Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md -#### Tutorial: Add Unit Tests for a Module +Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. -**Goal**: Achieve 80%+ test coverage for a service module +## The Challenge -**Prompt**: -``` -Add unit tests for the UserService class in src/services/user.js. +The traditional approach to vulnerability remediation is manual and time-consuming: -Current coverage: 35% -Target coverage: 80% +1. Scan repositories for vulnerabilities +2. Review each vulnerability and its impact +3. Research the fix (usually a version upgrade) +4. Update dependency files +5. Test the changes +6. Create pull requests +7. Get reviews and merge -Requirements: -1. Test all public methods -2. Cover edge cases (null inputs, empty arrays, etc.) -3. 
Mock external dependencies (database, API calls) -4. Follow our existing test patterns in tests/services/ -5. Use Jest as the testing framework +This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. Security debt accumulates faster than teams can address it. -Focus on these methods: -- createUser() -- updateUser() -- deleteUser() -- getUserById() -``` +**What if we could automate this entire process using AI agents?** -**What OpenHands does**: -1. Analyzes the UserService class -2. Identifies untested code paths -3. Creates test file with comprehensive tests -4. Mocks dependencies appropriately -5. Runs tests to verify they pass +## Automated Vulnerability Remediation with OpenHands -**Tips**: -- Provide existing test files as examples -- Specify the testing framework -- Mention any mocking conventions +The [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**. ---- +OpenHands assists with vulnerability remediation by: -#### Tutorial: Add Integration Tests for an API +- **Identifying vulnerabilities**: Analyzing code for common security issues +- **Understanding impact**: Explaining the risk and exploitation potential +- **Implementing fixes**: Generating secure code to address vulnerabilities +- **Validating remediation**: Verifying fixes are effective and complete -**Goal**: Test API endpoints end-to-end +## Two Approaches to Vulnerability Fixing -**Prompt**: -``` -Add integration tests for the /api/products endpoints. +### 1. 
Point to a GitHub Repository -Endpoints to test: -- GET /api/products (list all) -- GET /api/products/:id (get one) -- POST /api/products (create) -- PUT /api/products/:id (update) -- DELETE /api/products/:id (delete) +Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents automatically create pull requests with fixes—all with minimal human intervention. -Requirements: -1. Use our test database (configured in jest.config.js) -2. Set up and tear down test data properly -3. Test success cases and error cases -4. Verify response bodies and status codes -5. Follow patterns in tests/integration/ -``` +### 2. Upload Security Scanner Reports ---- +Enable users to upload reports from security scanners such as Snyk (as well as other third-party security scanners) where OpenHands agents automatically detect the report format, identify the issues, and apply fixes. -### Data Analysis +This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. -#### Tutorial: Create a Data Processing Script +## Architecture Overview -**Goal**: Process CSV data and generate a report +A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes. -**Prompt**: -``` -Create a Python script to analyze our sales data. 
+The key architectural components include: -Input: sales_data.csv with columns: date, product, quantity, price, region +- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client) +- **WebSocket interface**: Enables real-time status updates on agent actions and operations +- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider +- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud -Requirements: -1. Load and validate the CSV data -2. Calculate: - - Total revenue by product - - Monthly sales trends - - Top 5 products by quantity - - Revenue by region -3. Generate a summary report (Markdown format) -4. Create visualizations (bar chart for top products, line chart for trends) -5. Save results to reports/ directory +This architecture allows the frontend to remain lightweight while heavy lifting happens in the agent's execution environment. -Use pandas for data processing and matplotlib for charts. -``` +## Example: Vulnerability Fixer Application -**What OpenHands does**: -1. Creates a Python script with proper structure -2. Implements data loading with validation -3. Calculates requested metrics -4. Generates formatted report -5. Creates and saves visualizations +An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow: ---- +1. User points to a repository or uploads a security scan report +2. Agent analyzes the vulnerabilities +3. Agent creates fixes and pull requests automatically +4. 
User reviews and merges the changes -#### Tutorial: Database Query Analysis +## Security Scanning Integration -**Goal**: Analyze and optimize slow database queries +Use OpenHands to analyze security scanner output: -**Prompt**: ``` -Analyze our slow query log and identify optimization opportunities. - -File: logs/slow_queries.log +We ran a security scan and found these issues. Analyze each one: -For each slow query: -1. Explain why it's slow -2. Suggest index additions if helpful -3. Rewrite the query if it can be optimized -4. Estimate the improvement +1. SQL Injection in src/api/users.py:45 +2. XSS in src/templates/profile.html:23 +3. Hardcoded credential in src/config/database.py:12 +4. Path traversal in src/handlers/files.py:67 -Create a report in reports/query_optimization.md with: -- Summary of findings -- Prioritized recommendations -- SQL for suggested changes +For each vulnerability: +- Explain what the vulnerability is +- Show how it could be exploited +- Rate the severity (Critical/High/Medium/Low) +- Suggest a fix ``` ---- - -### Web Scraping +## Common Vulnerability Patterns -#### Tutorial: Build a Web Scraper +OpenHands can detect these common vulnerability patterns: -**Goal**: Extract product data from a website +| Vulnerability | Pattern | Example | +|--------------|---------|---------| +| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | +| XSS | Unescaped user input in HTML | `
<div>${user_comment}</div>
` | +| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | +| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | +| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | -**Prompt**: -``` -Create a web scraper to extract product information from our competitor's site. +## Automated Remediation -Target URL: https://example-store.com/products +### Applying Security Patches -Extract for each product: -- Name -- Price -- Description -- Image URL -- SKU (if available) +Fix identified vulnerabilities: -Requirements: -1. Use Python with BeautifulSoup or Scrapy -2. Handle pagination (site has 50 pages) -3. Respect rate limits (1 request/second) -4. Save results to products.json -5. Handle errors gracefully -6. Log progress to console + + + ``` + Fix the SQL injection vulnerability in src/api/users.py: + + Current code: + query = f"SELECT * FROM users WHERE id = {user_id}" + cursor.execute(query) + + Requirements: + 1. Use parameterized queries + 2. Add input validation + 3. Maintain the same functionality + 4. Add a test case for the fix + ``` + + **Fixed code:** + ```python + # Using parameterized query + query = "SELECT * FROM users WHERE id = %s" + cursor.execute(query, (user_id,)) + ``` + + + ``` + Fix the XSS vulnerability in src/templates/profile.html: + + Current code: +
  <div>${user.bio}</div>
+ + Requirements: + 1. Properly escape user content + 2. Consider Content Security Policy + 3. Handle rich text if needed + 4. Test with malicious input + ``` + + **Fixed code:** + ```html + +
  <div>{{ user.bio | escape }}</div>
+ ``` +
+ + ``` + Fix the command injection in src/utils/network.py: + + Current code: + def ping_host(hostname): + os.system(f"ping -c 1 {hostname}") + + Requirements: + 1. Use safe subprocess calls + 2. Validate input format + 3. Avoid shell=True + 4. Handle errors properly + ``` + + **Fixed code:** + ```python + import subprocess + import re + + def ping_host(hostname): + # Validate hostname format + if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): + raise ValueError("Invalid hostname") + + # Use subprocess without shell + result = subprocess.run( + ["ping", "-c", "1", hostname], + capture_output=True, + text=True + ) + return result.returncode == 0 + ``` + +
-Include a README with usage instructions. -``` +### Code-Level Vulnerability Fixes -**Tips**: -- Specify rate limiting requirements -- Mention error handling expectations -- Request logging for debugging +Fix application-level security issues: ---- +``` +Fix the broken access control in our API: -### Code Review +Issue: Users can access other users' data by changing the ID in the URL. - -For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). - +Current code: +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int): + return db.get_documents(user_id) -#### Tutorial: Security-Focused Code Review +Requirements: +1. Add authorization check +2. Verify requesting user matches or is admin +3. Return 403 for unauthorized access +4. Log access attempts +5. Add tests for authorization +``` -**Goal**: Identify security vulnerabilities in a PR +**Fixed code:** -**Prompt**: +```python +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int, current_user: User = Depends(get_current_user)): + # Check authorization + if current_user.id != user_id and not current_user.is_admin: + logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") + raise HTTPException(status_code=403, detail="Not authorized") + + return db.get_documents(user_id) ``` -Review this pull request for security issues: -Focus areas: -1. Input validation - check all user inputs are sanitized -2. Authentication - verify auth checks are in place -3. SQL injection - check for parameterized queries -4. XSS - verify output encoding -5. 
Sensitive data - ensure no secrets in code +## Security Testing -For each issue found, provide: -- File and line number -- Severity (Critical/High/Medium/Low) -- Description of the vulnerability -- Suggested fix with code example +Test your fixes thoroughly: -Output format: Markdown suitable for PR comments ``` +Create security tests for the SQL injection fix: ---- +1. Test with normal input +2. Test with SQL injection payloads: + - ' OR '1'='1 + - '; DROP TABLE users; -- + - UNION SELECT * FROM passwords +3. Test with special characters +4. Test with null/empty input +5. Verify error handling doesn't leak information +``` -#### Tutorial: Performance Review +## Automated Remediation Pipeline -**Goal**: Identify performance issues in code +Create an end-to-end automated pipeline: -**Prompt**: ``` -Review the OrderService class for performance issues. +Create an automated vulnerability remediation pipeline: -File: src/services/order.js +1. Parse Snyk/Dependabot/CodeQL alerts +2. Categorize by severity and type +3. For each vulnerability: + - Create a branch + - Apply the fix + - Run tests + - Create a PR with: + - Description of vulnerability + - Fix applied + - Test results +4. Request review from security team +5. Auto-merge low-risk fixes after tests pass +``` -Check for: -1. N+1 database queries -2. Missing indexes (based on query patterns) -3. Inefficient loops or algorithms -4. Missing caching opportunities -5. Unnecessary data fetching +## Building Your Own Vulnerability Fixer -For each issue: -- Explain the impact -- Show the problematic code -- Provide an optimized version -- Estimate the improvement -``` +The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. ---- +To build your own vulnerability remediation agent: -### Bug Fixing +1. 
Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent +2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) +3. Configure the agent to create pull requests automatically +4. Set up human review workflows for critical fixes + +As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. + +## Related Resources + +- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example +- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents +- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Windows Without WSL +Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl.md -For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. + This way of running OpenHands is not officially supported. It is maintained by the community and may not work. -#### Tutorial: Fix a Crash Bug +# Running OpenHands GUI on Windows Without WSL -**Goal**: Diagnose and fix an application crash +This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker. -**Prompt**: -``` -Fix the crash in the checkout process. +## Prerequisites -Error: -TypeError: Cannot read property 'price' of undefined - at calculateTotal (src/checkout/calculator.js:45) - at processOrder (src/checkout/processor.js:23) +1. **Windows 10/11** - A modern Windows operating system +2. 
**PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors) +3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet +4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) +5. **Git** - For cloning the repository and version control +6. **Node.js and npm** - For running the frontend -Steps to reproduce: -1. Add item to cart -2. Apply discount code "SAVE20" -3. Click checkout -4. Crash occurs +## Step 1: Install Required Software -The bug was introduced in commit abc123 (yesterday's deployment). +1. **Install Python 3.12 or 3.13** + - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) + - During installation, check "Add Python to PATH" + - Verify installation by opening PowerShell and running: + ```powershell + python --version + ``` -Requirements: -1. Identify the root cause -2. Fix the bug -3. Add a regression test -4. Verify the fix doesn't break other functionality -``` +2. **Install PowerShell 7** + - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) + - Choose the MSI installer appropriate for your system (x64 for most modern computers) + - Run the installer with default options + - Verify installation by opening a new terminal and running: + ```powershell + pwsh --version + ``` + - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors -**What OpenHands does**: -1. Analyzes the stack trace -2. Reviews recent changes -3. Identifies the null reference issue -4. Implements a defensive fix -5. Creates test to prevent regression +3. 
**Install .NET Core Runtime** + - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose the latest .NET Core Runtime (not SDK) + - Verify installation by opening PowerShell and running: + ```powershell + dotnet --info + ``` + - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation. ---- +4. **Install Git** + - Download Git from [git-scm.com](https://git-scm.com/download/win) + - Use default installation options + - Verify installation: + ```powershell + git --version + ``` -#### Tutorial: Fix a Memory Leak +5. **Install Node.js and npm** + - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) + - During installation, accept the default options which will install npm as well + - Verify installation: + ```powershell + node --version + npm --version + ``` -**Goal**: Identify and fix a memory leak +6. **Install Poetry** + - Open PowerShell as Administrator and run: + ```powershell + (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - + ``` + - Add Poetry to your PATH: + ```powershell + $env:Path += ";$env:APPDATA\Python\Scripts" + ``` + - Verify installation: + ```powershell + poetry --version + ``` -**Prompt**: -``` -Investigate and fix the memory leak in our Node.js application. +## Step 2: Clone and Set Up OpenHands -Symptoms: -- Memory usage grows 100MB/hour -- After 24 hours, app becomes unresponsive -- Restarting temporarily fixes the issue +1. **Clone the Repository** + ```powershell + git clone https://github.com/OpenHands/OpenHands.git + cd OpenHands + ``` -Suspected areas: -- Event listeners in src/events/ -- Cache implementation in src/cache/ -- WebSocket connections in src/ws/ +2. **Install Dependencies** + ```powershell + poetry install + ``` -Analyze these areas and: -1. Identify the leak source -2. 
Explain why it's leaking -3. Implement a fix -4. Add monitoring to detect future leaks -``` + This will install all required dependencies, including: + - pythonnet - Required for Windows PowerShell integration + - All other OpenHands dependencies ---- +## Step 3: Run OpenHands -### Feature Development +1. **Build the Frontend** + ```powershell + cd frontend + npm install + npm run build + cd .. + ``` -#### Tutorial: Add a REST API Endpoint + This will build the frontend files that the backend will serve. -**Goal**: Create a new API endpoint with full functionality +2. **Start the Backend** + ```powershell + # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell + pwsh + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` -**Prompt**: -``` -Add a user preferences API endpoint. + This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. -Endpoint: /api/users/:id/preferences + > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. -Operations: -- GET: Retrieve user preferences -- PUT: Update user preferences -- PATCH: Partially update preferences + > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. -Preferences schema: -{ - theme: "light" | "dark", - notifications: { email: boolean, push: boolean }, - language: string, - timezone: string -} +3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** + ```powershell + cd frontend + npm run dev + ``` -Requirements: -1. Follow patterns in src/api/routes/ -2. Add request validation with Joi -3. Use UserPreferencesService for business logic -4. 
Add appropriate error handling -5. Document the endpoint in OpenAPI format -6. Add unit and integration tests -``` +4. **Access the OpenHands GUI** -**What OpenHands does**: -1. Creates route handler following existing patterns -2. Implements validation middleware -3. Creates or updates the service layer -4. Adds error handling -5. Generates API documentation -6. Creates comprehensive tests + Open your browser and navigate to: + ``` + http://localhost:3000 + ``` ---- + > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` -#### Tutorial: Implement a Feature Flag System +## Installing and Running the CLI -**Goal**: Add feature flags to the application +To install and run the OpenHands CLI on Windows without WSL, follow these steps: -**Prompt**: +### 1. Install uv (Python Package Manager) + +Open PowerShell as Administrator and run: + +```powershell +powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" ``` -Implement a feature flag system for our application. -Requirements: -1. Create a FeatureFlags service -2. Support these flag types: - - Boolean (on/off) - - Percentage (gradual rollout) - - User-based (specific user IDs) -3. Load flags from environment variables initially -4. Add a React hook: useFeatureFlag(flagName) -5. Add middleware for API routes +### 2. Install .NET SDK (Required) -Initial flags to configure: -- new_checkout: boolean, default false -- dark_mode: percentage, default 10% -- beta_features: user-based +The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: -Include documentation and tests. +```powershell +winget install Microsoft.DotNet.SDK.8 ``` ---- +Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). 
-## Contributing Tutorials +After installation, restart your PowerShell session to ensure the environment variables are updated. -Have a great use case? Share it with the community! +### 3. Install and Run OpenHands -**What makes a good tutorial:** -- Solves a common problem -- Has clear, reproducible steps -- Includes example prompts -- Explains expected outcomes -- Provides tips for success +After installing the prerequisites, install OpenHands with: -**How to contribute:** -1. Create a detailed example following this format -2. Test it with OpenHands to verify it works -3. Submit via GitHub pull request to the docs repository -4. Include any prerequisites or setup required +```powershell +uv tool install openhands --python 3.12 +``` - -These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. - +Then run OpenHands: +```powershell +openhands +``` -# Key Features -Source: https://docs.openhands.dev/openhands/usage/key-features +To upgrade OpenHands in the future: - - - - Displays the conversation between the user and OpenHands. - - OpenHands explains its actions in this panel. +```powershell +uv tool upgrade openhands --python 3.12 +``` - ![overview](/openhands/static/img/chat-panel.png) - - - - Shows the file changes performed by OpenHands. +### Troubleshooting CLI Issues - ![overview](/openhands/static/img/changes-tab.png) - - - - Embedded VS Code for browsing and modifying files. - - Can also be used to upload and download files. +#### CoreCLR Error - ![overview](/openhands/static/img/vs-tab.png) - - - - A space for OpenHands and users to run terminal commands. +If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. 
To fix this: - ![overview](/openhands/static/img/terminal-tab.png) - - - - Displays the web server when OpenHands runs an application. - - Users can interact with the running application. +1. Install the .NET SDK as described in step 2 above +2. Verify that your system PATH includes the .NET SDK directories +3. Restart your PowerShell session completely after installing the .NET SDK +4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell - ![overview](/openhands/static/img/app-tab.png) - - - - Used by OpenHands to browse websites. - - The browser is non-interactive. +To verify your .NET installation, run: - ![overview](/openhands/static/img/browser-tab.png) - - +```powershell +dotnet --info +``` +This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. -# Azure -Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms +If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). -## Azure OpenAI Configuration +## Limitations on Windows -When running OpenHands, you'll need to set the following environment variable using `-e` in the -docker run command: +When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: -``` -LLM_API_VERSION="" # e.g. "2023-05-15" -``` +1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. -Example: -```bash -docker run -it --pull=always \ - -e LLM_API_VERSION="2023-05-15" - ... -``` +2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. -Then in the OpenHands UI Settings under the `LLM` tab: +3. 
**Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. - -You will need your ChatGPT deployment name which can be found on the deployments page in Azure. This is referenced as -<deployment-name> below. - +4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. -1. Enable `Advanced` options. -2. Set the following: - - `Custom Model` to azure/<deployment-name> - - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`) - - `API Key` to your Azure API key - -### Azure OpenAI Configuration - -When running OpenHands, set the following environment variable using `-e` in the -docker run command: - -``` -LLM_API_VERSION="" # e.g. "2024-02-15-preview" -``` - - -# Custom LLM Configurations -Source: https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs +## Troubleshooting -## How It Works +### "System.Management.Automation" Not Found Error -Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. For example: +If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. -```toml -# Default LLM configuration -[llm] -model = "gpt-4" -api_key = "your-api-key" -temperature = 0.0 +> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. 
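A quick way to see which PowerShell a tool will actually pick up is to resolve both executable names on your PATH. The helper below is an illustrative check (the function name is ours, and it is not part of OpenHands):

```python
import shutil

def resolve_powershell():
    """Return (name, path) of the first PowerShell found on PATH.

    PowerShell 7 installs as `pwsh`; the legacy Windows PowerShell is
    `powershell`. Preferring `pwsh` avoids the missing
    System.Management.Automation components described above.
    """
    for name in ("pwsh", "powershell"):
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None, None

if __name__ == "__main__":
    name, path = resolve_powershell()
    if name == "pwsh":
        print(f"OK: PowerShell 7 at {path}")
    elif name == "powershell":
        print("Only legacy Windows PowerShell found - install PowerShell 7 (pwsh)")
    else:
        print("No PowerShell found on PATH")
```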
-# Custom LLM configuration for a cheaper model -[llm.gpt3] -model = "gpt-3.5-turbo" -api_key = "your-api-key" -temperature = 0.2 +To resolve this issue: -# Another custom configuration with different parameters -[llm.high-creativity] -model = "gpt-4" -api_key = "your-api-key" -temperature = 0.8 -top_p = 0.9 -``` +1. **Install the latest version of PowerShell 7** from the official Microsoft repository: + - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) + - Download and install the latest MSI package for your system architecture (x64 for most systems) + - During installation, ensure you select the following options: + - "Add PowerShell to PATH environment variable" + - "Register Windows PowerShell 7 as the default shell" + - "Enable PowerShell remoting" + - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default -Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed. +2. **Restart your terminal or command prompt** to ensure the new PowerShell is available -## Using Custom Configurations +3. **Verify the installation** by running: + ```powershell + pwsh --version + ``` -### With Agents + You should see output indicating PowerShell 7.x.x -You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: +4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: + ```powershell + pwsh + cd path\to\openhands + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` -```toml -[agent.RepoExplorerAgent] -# Use the cheaper GPT-3 configuration for this agent -llm_config = 'gpt3' + > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). 
The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell". -[agent.CodeWriterAgent] -# Use the high creativity configuration for this agent -llm_config = 'high-creativity' -``` +5. **If the issue persists**, ensure that you have the .NET Runtime installed: + - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose ".NET Runtime" (not SDK) version 6.0 or later + - After installation, verify it's properly installed by running: + ```powershell + dotnet --info + ``` + - Restart your computer after installation + - Try running OpenHands again -### Configuration Options +6. **Ensure that the .NET Framework is properly installed** on your system: + - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off + - Make sure ".NET Framework 4.8 Advanced Services" is enabled + - Click OK and restart if prompted -Each named LLM configuration supports all the same options as the default LLM configuration. These include: +This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration. -- Model selection (`model`) -- API configuration (`api_key`, `base_url`, etc.) -- Model parameters (`temperature`, `top_p`, etc.) -- Retry settings (`num_retries`, `retry_multiplier`, etc.) -- Token limits (`max_input_tokens`, `max_output_tokens`) -- And all other LLM configuration options +## OpenHands Cloud -For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. 
+### Bitbucket Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md

-## Use Cases

+## Prerequisites

-Custom LLM configurations are particularly useful in several scenarios:

+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud).

-- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations.
-- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism.
-- **Different Providers**: Use different LLM providers or API endpoints for different tasks.
-- **Testing and Development**: Easily switch between different model configurations during development and testing.
+## Adding Bitbucket Repository Access

-## Example: Cost Optimization
+Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories.

-A practical example of using custom LLM configurations to optimize costs:
+## Working With Bitbucket Repos in OpenHands Cloud

-```toml
-# Default configuration using GPT-4 for high-quality responses
-[llm]
-model = "gpt-4"
-api_key = "your-api-key"
-temperature = 0.0
+After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!
-# Cheaper configuration for repository exploration -[llm.repo-explorer] -model = "gpt-3.5-turbo" -temperature = 0.2 +![Connect Repo](/openhands/static/img/connect-repo.png) -# Configuration for code generation -[llm.code-gen] -model = "gpt-4" -temperature = 0.0 -max_output_tokens = 2000 +## IP Whitelisting -[agent.RepoExplorerAgent] -llm_config = 'repo-explorer' +If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow +OpenHands to access your repositories: -[agent.CodeWriterAgent] -llm_config = 'code-gen' +### Core App IP +``` +34.68.58.200 ``` -In this example: -- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code -- Code generation uses GPT-4 with a higher token limit for generating larger code blocks -- The default configuration remains available for other tasks - -# Custom Configurations with Reserved Names - -OpenHands can use custom LLM configurations named with reserved names, for specific use cases. If you specify the model and other settings under the reserved names, then OpenHands will load and them for a specific purpose. As of now, one such configuration is implemented: draft editor. +### Runtime IPs +``` +34.10.175.217 +34.136.162.246 +34.45.0.142 +34.28.69.126 +35.224.240.213 +34.70.174.52 +34.42.4.87 +35.222.133.153 +34.29.175.97 +34.60.55.59 +``` -## Draft Editor Configuration +## Next Steps -The `draft_editor` configuration is a group of settings you can provide, to specify the model to use for preliminary drafting of code edits, for any tasks that involve editing and refining code. You need to provide it under the section `[llm.draft_editor]`. +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. 
-For example, you can define in `config.toml` a draft editor like this: +### Cloud API +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md -```toml -[llm.draft_editor] -model = "gpt-4" -temperature = 0.2 -top_p = 0.95 -presence_penalty = 0.0 -frequency_penalty = 0.0 -``` +For the available API endpoints, refer to the +[OpenHands API Reference](https://docs.openhands.dev/api-reference). -This configuration: -- Uses GPT-4 for high-quality edits and suggestions -- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility -- Uses a high top_p value (0.95) to consider a wide range of token options -- Disables presence and frequency penalties to maintain focus on the specific edits needed +## Obtaining an API Key -Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to: -- Review and suggest code improvements -- Refine existing content while maintaining its core meaning -- Make precise, focused changes to code or text +To use the OpenHands Cloud API, you'll need to generate an API key: - -Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`. When running via `docker run`, please use the standard configuration options. - +1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. +2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. +3. Click `Create API Key`. +4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. +5. Copy the generated API key and store it securely. It will only be shown once. +## API Usage Example (V1) -# Google Gemini/Vertex -Source: https://docs.openhands.dev/openhands/usage/llms/google-llms +### Starting a New Conversation -## Gemini - Google AI Studio Configs +To start a new conversation with OpenHands to perform a task, +make a POST request to the V1 app-conversations endpoint. 
-When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `Gemini` -- `LLM Model` to the model you will be using. -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` -(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). -- `API Key` to your Gemini API key + + + ```bash + curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests -## VertexAI - Google Cloud Platform Configs + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/v1/app-conversations" -To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment -variables using `-e` in the docker run command: + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } -``` -GOOGLE_APPLICATION_CREDENTIALS="" -VERTEXAI_PROJECT="" -VERTEXAI_LOCATION="" -``` + data = { + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + } -Then set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `VertexAI` -- `LLM Model` to the model you will be using. -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` -(e.g. vertex_ai/<model-name>). 
+ response = requests.post(url, headers=headers, json=data) + result = response.json() + # The response contains a start task with the conversation ID + conversation_id = result.get("app_conversation_id") or result.get("id") + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") + print(f"Status: {result['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/v1/app-conversations"; -# Groq -Source: https://docs.openhands.dev/openhands/usage/llms/groq + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; -## Configuration + const data = { + initial_message: { + content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." }] + }, + selected_repository: "yourusername/your-repo" + }; -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `Groq` -- `LLM Model` to the model you will be using. [Visit here to see the list of -models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, -enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). -- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); -## Using Groq as an OpenAI-Compatible Endpoint + const result = await response.json(); -The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you -would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: -1. Enable `Advanced` options -2. 
Set the following: - - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. `openai/llama3-70b-8192`) - - `Base URL` to `https://api.groq.com/openai/v1` - - `API Key` to your Groq API key + // The response contains a start task with the conversation ID + const conversationId = result.app_conversation_id || result.id; + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); + console.log(`Status: ${result.status}`); + return result; + } catch (error) { + console.error("Error starting conversation:", error); + } + } -# LiteLLM Proxy -Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy + startConversation(); + ``` + + -## Configuration +#### Response -To use LiteLLM proxy with OpenHands, you need to: +The API will return a JSON object with details about the conversation start task: -1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) -2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: - * Enable `Advanced` options - * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) - * `Base URL` to your LiteLLM proxy URL (e.g. 
`https://your-litellm-proxy.com`) - * `API Key` to your LiteLLM proxy API key +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "status": "WORKING", + "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", + "sandbox_id": "sandbox-abc123", + "created_at": "2025-01-15T10:30:00Z" +} +``` -## Supported Models +The `status` field indicates the current state of the conversation startup process: +- `WORKING` - Initial processing +- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready +- `PREPARING_REPOSITORY` - Cloning and setting up the repository +- `READY` - Conversation is ready to use +- `ERROR` - An error occurred during startup -The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy -is configured to handle. +You may receive an authentication error if: -Refer to your LiteLLM proxy configuration for the list of available models and their names. +- You provided an invalid API key. +- You provided the wrong repository name. +- You don't have access to the repository. +### Streaming Conversation Start (Optional) -# Overview -Source: https://docs.openhands.dev/openhands/usage/llms/llms +For real-time updates during conversation startup, you can use the streaming endpoint: - -This section is for users who want to connect OpenHands to different LLMs. - +```bash +curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Your task description here"}] + }, + "selected_repository": "yourusername/your-repo" + }' +``` - -OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this -page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation -for the canonical list of supported parameters. 
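The stream-start endpoint emits its status updates as elements of a single JSON array. Once the response body has been read to completion, picking out the final conversation id takes only a few lines; the helper below is an illustrative sketch, not part of any official SDK:

```python
import json

def conversation_id_from_stream(body: str):
    """Scan a completed stream (a JSON array of status updates) for the
    conversation id, which appears on the update whose status is READY."""
    for update in json.loads(body):
        status = update.get("status")
        if status == "READY":
            return update.get("app_conversation_id")
        if status == "ERROR":
            raise RuntimeError("conversation startup failed")
    return None  # stream ended without reaching READY

example = '''[
  {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING"},
  {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY"},
  {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY",
   "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001"}
]'''
print(conversation_id_from_stream(example))
# prints 660e8400-e29b-41d4-a716-446655440001
```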
- +#### Streaming Response -## Model Recommendations +The endpoint streams a JSON array incrementally. Each element represents a status update: -Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some -recommendations for model selection. Our latest benchmarking results can be found in -[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). +```json +[ + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} +] +``` -Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: +Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. -### Cloud / API-Based Models +## Rate Limits -- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) -- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) -- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) -- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) -- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) -- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) +If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. 
+If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). -If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process -to help others using the same provider! +--- -For a full list of the providers and models available, please consult the -[litellm documentation](https://docs.litellm.ai/docs/providers). +## Migrating from V0 to V1 API -OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending -limits and monitor usage. + The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. + Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. -### Local / Self-Hosted Models +### Key Differences -- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) -- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) +| Feature | V0 API | V1 API | +|---------|--------|--------| +| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | +| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | +| Repository field | `repository` | `selected_repository` | +| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | -### Known Issues +### Migration Steps - -Most current local and open source models are not as powerful. When using such models, you may see long -wait times between messages, poor responses, or errors about malformed JSON. 
OpenHands can only be as powerful as the -models driving it. However, if you do find ones that work, please add them to the verified list above. - +1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` -## LLM Configuration +2. **Update the request body**: + - Change `repository` to `selected_repository` + - Change `initial_user_msg` (string) to `initial_message` (object with content array): + ```json + // V0 format + { "initial_user_msg": "Your message here" } -The following can be set in the OpenHands UI through the Settings. Each option is serialized into the -`LLM.load_from_env()` schema before being passed to the Agent SDK: + // V1 format + { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } + ``` -- `LLM Provider` -- `LLM Model` -- `API Key` -- `Base URL` (through `Advanced` settings) +3. **Update response handling**: The V1 API returns a start task object. The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. -There are some settings that may be necessary for certain providers that cannot be set directly through the UI. Set them -as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup: +--- -- `LLM_API_VERSION` -- `LLM_EMBEDDING_MODEL` -- `LLM_EMBEDDING_DEPLOYMENT_NAME` -- `LLM_DROP_PARAMS` -- `LLM_DISABLE_VISION` -- `LLM_CACHING_PROMPT` +## Legacy API (V0) - Deprecated -## LLM Provider Guides + + The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. + New integrations should use the V1 API documented above. 
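If you still have V0 request bodies in client code, the field mapping in the migration steps above is mechanical to automate. The helper below is hypothetical (the function name is ours, not part of any SDK):

```python
def v0_to_v1_body(v0_body: dict) -> dict:
    """Translate a V0 /api/conversations request body into the
    V1 /api/v1/app-conversations shape."""
    v1_body = dict(v0_body)  # assumption: any other fields pass through unchanged
    if "initial_user_msg" in v1_body:
        text = v1_body.pop("initial_user_msg")
        v1_body["initial_message"] = {
            "content": [{"type": "text", "text": text}]
        }
    if "repository" in v1_body:
        v1_body["selected_repository"] = v1_body.pop("repository")
    return v1_body

print(v0_to_v1_body({
    "initial_user_msg": "Fix the README",
    "repository": "yourusername/your-repo",
}))
```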
+ -We have a few guides for running OpenHands with specific model providers: +### Starting a New Conversation (V0) -- [Azure](/openhands/usage/llms/azure-llms) -- [Google](/openhands/usage/llms/google-llms) -- [Groq](/openhands/usage/llms/groq) -- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms) -- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy) -- [Moonshot AI](/openhands/usage/llms/moonshot) -- [OpenAI](/openhands/usage/llms/openai-llms) -- [OpenHands](/openhands/usage/llms/openhands-llms) -- [OpenRouter](/openhands/usage/llms/openrouter) + + + ```bash + curl -X POST "https://app.all-hands.dev/api/conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests -These pages remain the authoritative provider references for both the Agent SDK -and the OpenHands interfaces. + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/conversations" -## Model Customization + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } -LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: + data = { + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + } -- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. -- **Native Tool Calling**: Toggle native function/tool calling capabilities. + response = requests.post(url, headers=headers, json=data) + conversation = response.json() -For detailed information about model customization, see -[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). 
+ print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") + print(f"Status: {conversation['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/conversations"; -### API retries and rate limits + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; -LLM providers typically have rate limits, sometimes very low, and may require retries. OpenHands will automatically -retry requests if it receives a Rate Limit Error (429 error code). + const data = { + initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + repository: "yourusername/your-repo" + }; -You can customize these options as you need for the provider you're using. Check their documentation, and set the -following environment variables to control the number of retries and the time between retries: + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); -- `LLM_NUM_RETRIES` (Default of 4 times) -- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) -- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) -- `LLM_RETRY_MULTIPLIER` (Default of 2) + const conversation = await response.json(); -If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); + console.log(`Status: ${conversation.status}`); -```toml -[llm] -num_retries = 4 -retry_min_wait = 5 -retry_max_wait = 30 -retry_multiplier = 2 + return conversation; + } catch (error) { + console.error("Error starting conversation:", error); + } + } + + startConversation(); + ``` + + + +#### Response (V0) + +```json +{ + "status": "ok", + "conversation_id": "abc1234" +} ``` +### Cloud UI +Source: 
https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md -# Local LLMs -Source: https://docs.openhands.dev/openhands/usage/llms/local-llms +## Landing Page -## News +The landing page is where you can: -- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! +- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), + [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or + [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. +- Launch an empty conversation using `New Conversation`. +- See `Suggested Tasks` for repositories that OpenHands has access to. +- See your `Recent Conversations`. -## Quickstart: Running OpenHands with a Local LLM using LM Studio +## Settings -This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. +Settings are divided across tabs, with each tab focusing on a specific area of configuration. -We recommend: -- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. -- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. +- `User` + - Change your email address. +- `Integrations` + - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). 
+- `Application`
+  - Set your preferred language, notifications and other preferences.
+  - Toggle task suggestions on GitHub.
+  - Toggle Solvability Analysis.
+  - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation).
+  - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings).
+- `LLM`
+  - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings).
+- `Billing`
+  - Add credits for using the OpenHands provider.
+- `Secrets`
+  - [Manage secrets](/openhands/usage/settings/secrets-settings).
+- `API Keys`
+  - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api).
+- `MCP`
+  - [Set up an MCP server](/openhands/usage/settings/mcp-settings).

-### Hardware Requirements

+## Key Features

-Running Qwen3-Coder-30B-A3B-Instruct requires:
-- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or
-- A Mac with Apple Silicon with at least 32GB of RAM

+For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features)
+section of the documentation.

-### 1. Install LM Studio

+## Next Steps

-Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/).

+- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation).
+- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation).

-### 2. Download the Model

+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands.

+### GitHub Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation.md

-1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window.
-2. 
Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. +## Prerequisites -![image](./screenshots/01_lm_studio_open_model_hub.png) +- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud). -3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. +## Adding GitHub Repository Access -![image](./screenshots/02_lm_studio_download_devstral.png) +You can grant OpenHands access to specific GitHub repositories: -4. Wait for the download to finish. +1. Click on `+ Add GitHub Repos` in the repository selection dropdown. +2. Select your organization and choose the specific repositories to grant OpenHands access to. + + - OpenHands requests short-lived tokens (8-hour expiration) with these permissions: + - Actions: Read and write + - Commit statuses: Read and write + - Contents: Read and write + - Issues: Read and write + - Metadata: Read-only + - Pull requests: Read and write + - Webhooks: Read and write + - Workflows: Read and write + - Repository access for a user is granted based on: + - Permission granted for the repository + - User's GitHub permissions (owner/collaborator) + -### 3. Load the Model +3. Click `Install & Authorize`. -1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. -2. Click the "Select a model to load" dropdown at the top of the application window. +## Modifying Repository Access -![image](./screenshots/03_lm_studio_open_load_model.png) +You can modify GitHub repository access at any time by: +- Selecting `+ Add GitHub Repos` in the repository selection dropdown or +- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories` -3. Enable the "Manually choose model load parameters" switch. -4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. 
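
+If you want to double-check which repositories the OpenHands GitHub App can currently reach,
+one option (a sketch, not an official OpenHands tool) is GitHub's REST endpoint for listing
+the app installations accessible to your user token:
+
+```bash
+curl -H "Authorization: Bearer $GITHUB_TOKEN" \
+     -H "Accept: application/vnd.github+json" \
+     https://api.github.com/user/installations
+```
+
+Each installation entry in the response links to its accessible repositories, so you can
+confirm the access you granted matches what you expect.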
+## Working With GitHub Repos in OpenHands Cloud

+Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the
+`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click
+on `Launch` to start the conversation!

![image](./screenshots/04_lm_studio_setup_devstral_part_1.png)

+![Connect Repo](/openhands/static/img/connect-repo.png)

-5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings.
-6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention.
-7. Click "Load Model" to start loading the model.

+## Working on GitHub Issues and Pull Requests Using OpenHands

![image](./screenshots/05_lm_studio_setup_devstral_part_2.png)

+To allow OpenHands to work directly from GitHub, you must
+[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is
+given, you can use OpenHands by labeling the issue or by tagging `@openhands`.

-### 4. Start the LLM server

+### Working with Issues

-1. Enable the switch next to "Status" at the top-left of the Window.
-2. Take note of the Model API Identifier shown on the sidebar on the right.

+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:
+1. Comment on the issue to let you know it is working on it.
+   - You can click on the link to track the progress on OpenHands Cloud.
+2. Open a pull request if it determines that the issue has been successfully resolved.
+3. Comment on the issue with a summary of the performed tasks and a link to the PR.

![image](./screenshots/06_lm_studio_start_server.png)

+### Working with Pull Requests

-### 5. Start OpenHands

-1. 
Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: +To get OpenHands to work on pull requests, mention `@openhands` in the comments to: +- Ask questions +- Request updates +- Get code explanations -```bash -docker run -it --rm --pull=always \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -e LOG_ALL_EVENTS=true \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/.openhands \ - -p 3000:3000 \ - --add-host host.docker.internal:host-gateway \ - --name openhands-app \ - docker.openhands.dev/openhands/openhands:1.4 -``` + +The `@openhands` mention functionality in pull requests only works if the pull request is both +*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate +permissions to access both repositories. + -2. Wait until the server is running (see log below): -``` -Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f -Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 -Starting OpenHands... -Running OpenHands as root -14:22:13 - openhands:INFO: server_config.py:50 - Using config class None -INFO: Started server process [8] -INFO: Waiting for application startup. -INFO: Application startup complete. -INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) -``` -3. Visit `http://localhost:3000` in your browser. +## Next Steps -### 6. Configure OpenHands to use the LLM server +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. -Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. 
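
+The `openhands` label described above can also be applied programmatically, which is handy
+when triggering OpenHands from scripts. A sketch using the GitHub REST API (`OWNER`, `REPO`,
+and the issue number are placeholders for your own values):
+
+```bash
+curl -X POST \
+     -H "Authorization: Bearer $GITHUB_TOKEN" \
+     -H "Accept: application/vnd.github+json" \
+     https://api.github.com/repos/OWNER/REPO/issues/1/labels \
+     -d '{"labels":["openhands"]}'
+```
+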
+### GitLab Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md

-When started for the first time, OpenHands will prompt you to set up the LLM provider.

+## Prerequisites

-1. Click "see advanced settings" to open the LLM Settings page.

+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud).

-![image](./screenshots/07_openhands_open_advanced_settings.png)

+## Adding GitLab Repository Access

-2. Enable the "Advanced" switch at the top of the page to show all the available settings.

+Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories.

-3. Set the following values:
-   - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/")
-   - **Base URL**: `http://host.docker.internal:1234/v1`
-   - **API Key**: `local-llm`

+## Working With GitLab Repos in OpenHands Cloud

-4. Click "Save Settings" to save the configuration.

+After signing in with a GitLab account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!

-![image](./screenshots/08_openhands_configure_local_llm_parameters.png)

+![Connect Repo](/openhands/static/img/connect-repo.png)

-That's it! You can now start using OpenHands with the local LLM server.

+## Using Tokens with Reduced Scopes

-If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack).

+OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent.
+To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`,
+which will override the default token assigned to the agent. 
While the high-permission API token is still requested
+and used for other components of the application (e.g. opening merge requests), the agent will not have access to it.

-## Advanced: Alternative LLM Backends

+## Working on GitLab Issues and Merge Requests Using OpenHands

-This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio.

+
+This feature works for personal projects and is available for group projects with a
+[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks).

-### Create an OpenAI-Compatible Endpoint with Ollama

+A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into
+OpenHands Cloud.

-- Install Ollama following [the official documentation](https://ollama.com/download).
-- Example launch command for Qwen3-Coder-30B-A3B-Instruct:

+

-```bash
-# ⚠️ WARNING: OpenHands requires a large context size to work properly.
-# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000.
-# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
-OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
-ollama pull qwen3-coder:30b
-```

+Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly.

-### Create an OpenAI-Compatible Endpoint with vLLM or SGLang

+### Working with Issues

-First, download the model checkpoint:

+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:

-```bash
-huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct
-```

+1. Comment on the issue to let you know it is working on it.
+   - You can click on the link to track the progress on OpenHands Cloud.
+2. 
Open a merge request if it determines that the issue has been successfully resolved.
+3. Comment on the issue with a summary of the performed tasks and a link to the merge request.

-#### Serving the model using SGLang

+### Working with Merge Requests

-- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html).
-- Example launch command (with at least 2 GPUs):

+To get OpenHands to work on merge requests, mention `@openhands` in the comments to:

-```bash
-SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
-    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
-    --port 8000 \
-    --tp 2 --dp 1 \
-    --host 0.0.0.0 \
-    --api-key mykey --context-length 131072
-```

+- Ask questions
+- Request updates
+- Get code explanations

-#### Serving the model using vLLM

+## Managing GitLab Webhooks

-- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
-- Example launch command (with at least 2 GPUs):

+The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page.

-```bash
-vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
-    --host 0.0.0.0 --port 8000 \
-    --api-key mykey \
-    --tensor-parallel-size 2 \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
-    --enable-prefix-caching
-```

+### Accessing Webhook Management

-If you are interested in further improved inference speed, you can also try Snowflake's version
-of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/),
-which can achieve up to 2x speedup in some cases.

+The webhook management table is available on the Integrations page when:

-1. 
Install the Arctic Inference library that automatically patches vLLM: +- You are signed in to OpenHands Cloud with a GitLab account +- Your GitLab token is connected -```bash -pip install git+https://github.com/snowflakedb/ArcticInference.git -``` +To access it: -2. Run the launch command with speculative decoding enabled: +1. Navigate to the `Settings > Integrations` page +2. Find the GitLab section +3. If your GitLab token is connected, you'll see the webhook management table below the connection status -```bash -vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ - --host 0.0.0.0 --port 8000 \ - --api-key mykey \ - --tensor-parallel-size 2 \ - --served-model-name Qwen3-Coder-30B-A3B-Instruct \ - --speculative-config '{"method": "suffix"}' -``` +### Viewing Webhook Status -### Run OpenHands (Alternative Backends) +The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands. -#### Using Docker +- **Resource**: The name and full path of the project or group +- **Type**: Whether it's a "project" or "group" +- **Status**: The current webhook installation status: + - **Installed**: The webhook is active and working + - **Not Installed**: No webhook is currently installed + - **Failed**: A previous installation attempt failed (error details are shown below the status) -Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). +### Reinstalling Webhooks -#### Using Development Mode +If a webhook is not installed or has failed, you can reinstall it: -Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. +1. Find the resource in the webhook management table +2. Click the `Reinstall` button in the Action column +3. The button will show `Reinstalling...` while the operation is in progress +4. Once complete, the status will update to reflect the result -Start OpenHands using `make run`. 
+ + To reinstall an existing webhook, you must first delete the current webhook + from the GitLab UI before using the Reinstall button in OpenHands Cloud. + -### Configure OpenHands (Alternative Backends) +**Important behaviors:** -Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. +- The Reinstall button is disabled if the webhook is already installed +- Only one reinstall operation can run at a time +- After a successful reinstall, the button remains disabled to prevent duplicate installations +- If a reinstall fails, the error message is displayed below the status badge +- The resources list automatically refreshes after a reinstall completes -1. Click **"see advanced settings"** to access the full configuration panel. -2. Enable the **Advanced** toggle at the top of the page. -3. Set the following parameters, if you followed the examples above: - - **Custom Model**: `openai/` - - For **Ollama**: `openai/qwen3-coder:30b` - - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` - - **Base URL**: `http://host.docker.internal:/v1` - Use port `11434` for Ollama, or `8000` for SGLang and vLLM. - - **API Key**: - - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) - - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) +### Constraints and Limitations + +- The webhook management table only displays resources that are accessible with your connected GitLab token +- Webhook installation requires Admin or Owner permissions on the GitLab project or group +## Next Steps -# Moonshot AI -Source: https://docs.openhands.dev/openhands/usage/llms/moonshot +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. 
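
+If you use the reduced-scope `GITLAB_TOKEN` secret described earlier on this page, you can
+inspect the token's scopes before saving it as a secret. A sketch using GitLab's token
+introspection endpoint (replace the host if you are on a self-managed instance):
+
+```bash
+curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
+     "https://gitlab.com/api/v4/personal_access_tokens/self"
+```
+
+The response lists the token's `scopes`, so you can confirm it carries only the permissions
+you intend to hand to the agent.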
-## Using Moonshot AI with OpenHands +### Getting Started +Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md -[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. +## Accessing OpenHands Cloud -### Setup +OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud, +visit [app.all-hands.dev](https://app.all-hands.dev). -1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) -2. Generate an API key from your account settings -3. Configure OpenHands to use Moonshot AI: +You'll be prompted to connect with your GitHub, GitLab or Bitbucket account: -| Setting | Value | -| --- | --- | -| LLM Provider | `moonshot` | -| LLM Model | `kimi-k2-0711-preview` | -| API Key | Your Moonshot API key | +1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`. +2. Review the permissions requested by OpenHands and authorize the application. + - OpenHands will require certain permissions from your account. To read more about these permissions, + you can click the `Learn more` link on the authorization page. +3. Review and accept the `terms of service` and select `Continue`. -### Recommended Models +## Next Steps -- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. +Once you've connected your account, you can: +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). +- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). 
-# OpenAI -Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms +### Jira Data Center Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md -## Configuration +# Jira Data Center Integration -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -* `LLM Provider` to `OpenAI` -* `LLM Model` to the model you will be using. -[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). -* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). +## Platform Configuration -## Using OpenAI-Compatible Endpoints +### Step 1: Create Service Account -Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). +1. **Access User Management** + - Log in to Jira Data Center as administrator + - Go to **Administration** > **User Management** -## Using an OpenAI Proxy +2. **Create User** + - Click **Create User** + - Username: `openhands-agent` + - Full Name: `OpenHands Agent` + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Password: Set a secure password + - Click **Create** -If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: -1. Enable `Advanced` options -2. Set the following: - - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) - - `Base URL` to the URL of your OpenAI proxy - - `API Key` to your OpenAI API key +3. 
**Assign Permissions** + - Add user to appropriate groups + - Ensure access to relevant projects + - Grant necessary project permissions +### Step 2: Generate API Token -# OpenHands -Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms +1. **Personal Access Tokens** + - Log in as the service account + - Go to **Profile** > **Personal Access Tokens** + - Click **Create token** + - Name: `OpenHands Cloud Integration` + - Expiry: Set appropriate expiration (recommend 1 year) + - Click **Create** + - **Important**: Copy and store the token securely -## Obtain Your OpenHands LLM API Key +### Step 3: Configure Webhook -1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). -2. Go to the Settings page and navigate to the `API Keys` tab. -3. Copy your `LLM API Key`. +1. **Create Webhook** + - Go to **Administration** > **System** > **WebHooks** + - Click **Create a WebHook** + - **Name**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` + - Set a suitable webhook secret + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) -![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) +--- -## Configuration +## Workspace Integration -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `OpenHands` -- `LLM Model` to the model you will be using (e.g. claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) -- `API Key` to your OpenHands LLM API key copied from above +### Step 1: Log in to OpenHands Cloud -## Using OpenHands LLM Provider in the CLI +1. 
**Navigate and Authenticate**
+   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
+   - Sign in with your Git provider (GitHub, GitLab, or Bitbucket)
+   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.

-2. To select OpenHands as the LLM provider:
-   - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use.
-   - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then
-     choose `openhands` and finally the model.

+### Step 2: Configure Jira Data Center Integration

-![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png)

+1. **Access Integration Settings**
+   - Navigate to **Settings** > **Integrations**
+   - Locate **Jira Data Center** section
+2. **Configure Workspace**
+   - Click **Configure** button
+   - Enter your workspace name and click **Connect**
+   - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration:
+     - **Webhook Secret**: The webhook secret from Step 3 above
+     - **Service Account Email**: The service account email from Step 1 above
+     - **Service Account API Key**: The personal access token from Step 2 above
+   - Ensure **Active** toggle is enabled

-When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy
-

+Workspace name is the host name of your Jira Data Center instance.

-## Using OpenHands LLM Provider with the SDK

+E.g.: http://jira.all-hands.dev/projects/OH/issues/OH-77

-You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines.

+Here the workspace name is **jira.all-hands.dev**.
+
-### Configuration +3. **Complete OAuth Flow** + - You'll be redirected to Jira Data Center to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI -The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. Simply set two environment variables: +### Managing Your Integration -```bash -export LLM_API_KEY="your-openhands-api-key" -export LLM_MODEL="openhands/claude-sonnet-4-20250514" -``` +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view -### Example +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. -```python -from openhands.sdk import LLM +### Screenshots -# The openhands/ prefix auto-configures the base URL -llm = LLM.load_from_env() + + +![workspace-link.png](/openhands/static/img/jira-dc-user-link.png) + -# Or configure directly -llm = LLM( - model="openhands/claude-sonnet-4-20250514", - api_key="your-openhands-api-key", -) -``` + +![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png) + -The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. 
+ +![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png) + -### Available Models + +![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png) + + -When using the SDK, prefix any model from the pricing table below with `openhands/`: -- `openhands/claude-sonnet-4-20250514` -- `openhands/claude-sonnet-4-5-20250929` -- `openhands/claude-opus-4-20250514` -- `openhands/gpt-5-2025-08-07` -- etc. +### Jira Cloud Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md - -If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. - +# Jira Cloud Integration -## Pricing +## Platform Configuration -Pricing follows official API provider rates. Below are the current pricing details for OpenHands models: +### Step 1: Create Service Account +1. **Navigate to User Management** + - Go to [Atlassian Admin](https://admin.atlassian.com/) + - Select your organization + - Go to **Directory** > **Users** -| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | -|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| -| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | -| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | -| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | -| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | -| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | -| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | -| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | -| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | -| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | -| 
devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | -| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | -| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | -| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | -| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | -| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | +2. **Create OpenHands Service Account** + - Click **Service accounts** + - Click **Create a service account** + - Name: `OpenHands Agent` + - Click **Next** + - Select **User** role for Jira app + - Click **Create** -**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s model price database and provider pricing pages. Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. +### Step 2: Generate API Token +1. **Access Service Account Configuration** + - Locate the created service account from above step and click on it + - Click **Create API token** + - Set the expiry to 365 days (maximum allowed value) + - Click **Next** + - In **Select token scopes** screen, filter by following values + - App: Jira + - Scope type: Classic + - Scope actions: Write, Read + - Select `read:me`, `read:jira-work`, and `write:jira-work` scopes + - Click **Next** + - Review and create API token + - **Important**: Copy and securely store the token immediately -# OpenRouter -Source: https://docs.openhands.dev/openhands/usage/llms/openrouter +### Step 3: Configure Webhook -## Configuration +1. **Navigate to Webhook Settings** + - Go to **Jira Settings** > **System** > **WebHooks** + - Click **Create a WebHook** -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -* `LLM Provider` to `OpenRouter` -* `LLM Model` to the model you will be using. 
-[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models).
-If the model is not in the list, enable `Advanced` options, and enter it in
-`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`).
-* `API Key` to your OpenRouter API key.

+2. **Configure Webhook**
+   - **Name**: `OpenHands Cloud Integration`
+   - **Status**: Enabled
+   - **URL**: `https://app.all-hands.dev/integration/jira/events`
+   - **Issue related events**: Select the following:
+     - Issue updated
+     - Comment created
+   - **JQL Filter**: Leave empty (or customize as needed)
+   - Click **Create**
+   - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration)

+---

-# OpenHands GitHub Action
-Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action

+## Workspace Integration

-## Using the Action in the OpenHands Repository

+### Step 1: Log in to OpenHands Cloud

-To use the OpenHands GitHub Action in a repository, you can:

+1. **Navigate and Authenticate**
+   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
+   - Sign in with your Git provider (GitHub, GitLab, or Bitbucket)
+   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.

-1. Create an issue in the repository.
-2. Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`.

+### Step 2: Configure Jira Integration

-The action will automatically trigger and attempt to resolve the issue.

+1. **Access Integration Settings**
+   - Navigate to **Settings** > **Integrations**
+   - Locate **Jira Cloud** section

-## Installing the Action in a New Repository

+2. 
**Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API token from Step 2 above + - Ensure **Active** toggle is enabled -To install the OpenHands GitHub Action in your own repository, follow -the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). + +Workspace name is the host name when accessing a resource in Jira Cloud. -## Usage Tips +Eg: https://all-hands.atlassian.net/browse/OH-55 -### Iterative resolution +Here the workspace name is **all-hands**. + -1. Create an issue in the repository. -2. Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. -3. Review the attempt to resolve the issue by checking the pull request. -4. Follow up with feedback through general comments, review comments, or inline thread comments. -5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. +3. **Complete OAuth Flow** + - You'll be redirected to Jira Cloud to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI -### Label versus Macro +### Managing Your Integration -- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. -- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. 
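If the Jira workspace integration fails to activate, it can help to sanity-check the service-account credentials outside OpenHands first. A minimal sketch — the email, token, and host below are placeholders, and it assumes Jira Cloud's standard HTTP Basic authentication built from `email:api_token`:

```bash
# Placeholder values — substitute the service-account email and API token you stored earlier
EMAIL='openhands-agent@yourcompany.com'
API_TOKEN='example-api-token'

# Jira Cloud REST endpoints accept Basic auth encoded from "email:api_token"
AUTH=$(printf '%s:%s' "$EMAIL" "$API_TOKEN" | base64)
echo "Authorization: Basic $AUTH"

# Uncomment to test against your real workspace host:
# curl -sS -H "Authorization: Basic $AUTH" \
#   'https://yourcompany.atlassian.net/rest/api/3/myself'
```

A `200` response with the service account's profile suggests the token is usable; a `401` points at the credentials rather than the OpenHands side.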
+**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view -## Advanced Settings +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. -### Add custom repository settings +### Screenshots -You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). + + +![workspace-link.png](/openhands/static/img/jira-user-link.png) + -### Custom configurations + +![workspace-link.png](/openhands/static/img/jira-admin-configure.png) + -GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. 
-The customization options you can set are: + +![workspace-link.png](/openhands/static/img/jira-user-unlink.png) + -| **Attribute name** | **Type** | **Purpose** | **Example** | -| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | -| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` | -| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` | -| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` | -| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` | -| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` | -| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` | + +![workspace-link.png](/openhands/static/img/jira-admin-edit.png) + + +### Linear Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md -# Configure -Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode +# Linear Integration -## Prerequisites +## Platform Configuration -- [OpenHands is running](/openhands/usage/run-openhands/local-setup) +### Step 1: Create Service Account -## Launching the GUI Server +1. **Access Team Settings** + - Log in to Linear as a team admin + - Go to **Settings** > **Members** -### Using the CLI Command +2. 
**Invite Service Account** + - Click **Invite members** + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Role: **Member** (with appropriate team access) + - Send invitation -You can launch the OpenHands GUI server directly from the command line using the `serve` command: +3. **Complete Setup** + - Accept invitation from the service account email + - Complete profile setup + - Ensure access to relevant teams/workspaces - -**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR have `uv` -installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker -directly (see the [Docker section](#using-docker-directly) below). - +### Step 2: Generate API Key -```bash -openhands serve -``` +1. **Access API Settings** + - Log in as the service account + - Go to **Settings** > **Security & access** -This command will: -- Check that Docker is installed and running -- Pull the required Docker images -- Launch the OpenHands GUI server at http://localhost:3000 -- Use the same configuration directory (`~/.openhands`) as the CLI mode +2. **Create Personal API Key** + - Click **Create new key** + - Name: `OpenHands Cloud Integration` + - Scopes: Select the following: + - `Read` - Read access to issues and comments + - `Create comments` - Ability to create or update comments + - Select the teams you want to provide access to, or allow access for all teams you have permissions for + - Click **Create** + - **Important**: Copy and store the API key securely -#### Mounting Your Current Directory +### Step 3: Configure Webhook -To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: +1. **Access Webhook Settings** + - Go to **Settings** > **API** > **Webhooks** + - Click **New webhook** -```bash -openhands serve --mount-cwd -``` +2. 
**Configure Webhook** + - **Label**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/linear/events` + - **Resource types**: Select: + - `Comment` - For comment events + - `Issue` - For issue updates (label changes) + - Select the teams you want to provide access to, or allow access for all public teams + - Click **Create webhook** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) -This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container. +--- -#### Using GPU Support +## Workspace Integration -If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: +### Step 1: Log in to OpenHands Cloud -```bash -openhands serve --gpu -``` +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. -This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: +### Step 2: Configure Linear Integration -```bash -openhands serve --gpu --mount-cwd -``` +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Linear** section -**Prerequisites for GPU support:** -- NVIDIA GPU drivers must be installed on your host system -- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured +2. 
**Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API key from Step 2 above + - Ensure **Active** toggle is enabled -#### Requirements + +Workspace name is the identifier after the host name when accessing a resource in Linear. -Before using the `openhands serve` command, ensure that: -- Docker is installed and running on your system -- You have internet access to pull the required Docker images -- Port 3000 is available on your system +Eg: https://linear.app/allhands/issue/OH-37 -The CLI will automatically check these requirements and provide helpful error messages if anything is missing. +Here the workspace name is **allhands**. + -### Using Docker Directly +3. **Complete OAuth Flow** + - You'll be redirected to Linear to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI -Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. 
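The webhook secret you stored is what Linear uses to sign each delivery: the request carries a hex-encoded HMAC-SHA256 digest of the raw body in Linear's signature header. A rough sketch of computing the expected digest, with made-up values:

```bash
# Made-up values — use your stored webhook secret and the raw (unparsed) request body
SECRET='lin_wh_example_secret'
BODY='{"action":"create","type":"Comment"}'

# Hex-encoded HMAC-SHA256 of the raw body, keyed with the webhook secret
EXPECTED=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')
echo "$EXPECTED"
```

A receiver compares this digest with the signature header before trusting a payload, so a mismatch between the secret stored here and the one shown in Linear can cause deliveries to be silently rejected.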
+### Managing Your Integration -## Overview +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view -### Initial Setup +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. -1. Upon first launch, you'll see a settings popup. -2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list, - select `see advanced settings`. Then toggle `Advanced` options and enter it with the correct prefix in the - `Custom Model` text box. -3. Enter the corresponding `API Key` for your chosen provider. -4. Click `Save Changes` to apply the settings. +### Screenshots -### Settings + + +![workspace-link.png](/openhands/static/img/linear-user-link.png) + -You can use the Settings page at any time to: + +![workspace-link.png](/openhands/static/img/linear-admin-configure.png) + -- [Setup the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings). -- [Setup the search engine](/openhands/usage/advanced/search-engine-setup). -- [Configure MCP servers](/openhands/usage/settings/mcp-settings). -- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup), - [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup) - and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup). 
-- Set application settings like your preferred language, notifications and other preferences. -- [Manage custom secrets](/openhands/usage/settings/secrets-settings). + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + -### Key Features + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + + -For an overview of the key features available inside a conversation, please refer to the -[Key Features](/openhands/usage/key-features) section of the documentation. +### Project Management Tool Integrations (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md -## Other Ways to Run Openhands -- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) -- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal) +# Project Management Tool Integrations +## Overview -# Setup -Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup +OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by: +- Adding `@openhands` in ticket comments +- Adding the `openhands` label to tickets -## Recommended Methods for Running Openhands on Your Local System +## Prerequisites -### System Requirements +Integration requires two levels of setup: +1. **Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) +2. 
**Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace -- MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements) -- Linux -- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements) +### Platform-Specific Setup Guides: +- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) +- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) +- [Linear Integration (Coming soon...)](./linear-integration.md) -A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands. +## Usage -### Prerequisites +Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: - +### Method 1: Comment Mention +Add a comment to any issue with `@openhands` followed by your task description: +``` +@openhands Please implement the user authentication feature described in this ticket +``` - +### Method 2: Label-based Delegation +Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. - **Docker Desktop** +### Git Repository Detection - 1. [Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install). - 2. Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled. - +The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: - +#### Specifying the Target Repository - - Tested with Ubuntu 22.04. 
- +**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. - **Docker Desktop** +**Supported Repository Formats:** +- Full HTTPS URL: `https://github.com/owner/repository.git` +- GitHub URL without .git: `https://github.com/owner/repository` +- Owner/repository format: `owner/repository` - 1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/). +#### Platform-Specific Behavior - +**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. Manual specification is not required in this configuration. - +**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection. - **WSL** +## Troubleshooting - 1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install). - 2. Run `wsl --version` in powershell and confirm `Default Version: 2`. +### Platform Configuration Issues +- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated) +- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. If your current API token is expired, make sure to update it in the respective integration settings +- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions - **Ubuntu (Linux Distribution)** +### Workspace Integration Issues +- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. 
Contact the administrator of the platform you want to integrate with (e.g., Jira or Linear)
+- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first
+- **OAuth flow fails**: Make sure you're authorizing with the correct account and that it has the proper workspace access

### General Issues
- **Agent not responding**: Check webhook logs in your platform settings and verify service account status
- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access
- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on
- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed

 **Docker Desktop**

 1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install).
 2. Open Docker Desktop, go to `Settings` and confirm the following:
 - General: `Use the WSL 2 based engine` is enabled.
 - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled.

### Getting Help
For additional support, contact OpenHands Cloud support with:
- Your integration platform (Linear, Jira Cloud, or Jira Data Center)
- Workspace name
- Error logs from webhook/integration attempts
- Screenshots of configuration settings (without sensitive credentials)

### Slack Integration
Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md

 - 
 - The docker command below to start the app must be run inside the WSL terminal. Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal.
- + - + +OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete. +While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to +validate critical information independently. + - +## Prerequisites -### Start the App +- Access to OpenHands Cloud. -#### Option 1: Using the CLI Launcher with uv (Recommended) +## Installation Steps -We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). + + -**Install uv** (if you haven't already): + **This step is for Slack admins/owners** -See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. + 1. Make sure you have permissions to install Apps to your workspace. + 2. Click the button below to install OpenHands Slack App Add to Slack + 3. In the top right corner, select the workspace to install the OpenHands Slack app. + 4. Review permissions and click allow. -**Install OpenHands**: -```bash -uv tool install openhands --python 3.12 -``` + -**Launch OpenHands**: -```bash -# Launch the GUI server -openhands serve + -# Or with GPU support (requires nvidia-docker) -openhands serve --gpu + **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.** -# Or with current directory mounted -openhands serve --mount-cwd -``` + Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this: + 1. Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud. + 2. Click `Install OpenHands Slack App`. + 3. 
In the top right corner, select the workspace to install the OpenHands Slack app. + 4. Review permissions and click allow. -This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. + Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App. -**Upgrade OpenHands**: -```bash -uv tool upgrade openhands --python 3.12 -``` + - + -If you prefer to use pip and have Python 3.12+ installed: -```bash -# Install OpenHands -pip install openhands +## Working With the Slack App -# Launch the GUI server -openhands serve -``` +To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel. -Note that you'll still need `uv` installed for the default MCP servers to work properly. +Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands. - - -#### Option 2: Using Docker Directly +To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. +You must be the user who started the conversation. - +## Example conversation -```bash -docker run -it --rm --pull=always \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -e LOG_ALL_EVENTS=true \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/.openhands \ - -p 3000:3000 \ - --add-host host.docker.internal:host-gateway \ - --name openhands-app \ - docker.openhands.dev/openhands/openhands:1.4 -``` +### Start a new conversation, and select repo - +Conversation is started by mentioning `@openhands`. -> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. 
+![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png) -You'll find OpenHands running at http://localhost:3000! +### See agent response and send follow up messages -### Setup +Initial request is followed up by mentioning `@openhands` in a thread reply. -After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`. -This can be done during the initial settings popup or by selecting the `Settings` -button (gear icon) in the UI. +![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png) -If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options -and manually enter it with the correct prefix in the `Custom Model` text box. -The `Advanced` options also allow you to specify a `Base URL` if required. +## Pro tip -#### Getting an API Key +You can mention a repo name when starting a new conversation in the following formats -OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers: +1. "My-Repo" repo (e.g `@openhands in the openhands repo ...`) +2. "OpenHands/OpenHands" (e.g `@openhands in OpenHands/OpenHands ...`) - +The repo match is case insensitive. If a repo name match is made, it will kick off the conversation. +If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list. - +![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png) -1. [Log in to OpenHands Cloud](https://app.all-hands.dev). -2. Go to the Settings page and navigate to the `API Keys` tab. -3. Copy your `LLM API Key`. +## OpenHands CLI -OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms). 
+### OpenHands Cloud +Source: https://docs.openhands.dev/openhands/usage/cli/cloud.md - +## Overview - +The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: -1. [Create an Anthropic account](https://console.anthropic.com/). -2. [Generate an API key](https://console.anthropic.com/settings/keys). -3. [Set up billing](https://console.anthropic.com/settings/billing). +- Authenticate with your OpenHands Cloud account +- Create new cloud conversations +- Use cloud resources without the web interface - +## Authentication - +### Login -1. [Create an OpenAI account](https://platform.openai.com/). -2. [Generate an API key](https://platform.openai.com/api-keys). -3. [Set up billing](https://platform.openai.com/account/billing/overview). +Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: - +```bash +openhands login +``` - +This opens a browser window for authentication. After successful login, your credentials are stored locally. -1. Create a Google account if you don't already have one. -2. [Generate an API key](https://aistudio.google.com/apikey). -3. [Set up billing](https://aistudio.google.com/usage?tab=billing). +#### Custom Server URL - +For self-hosted or enterprise deployments: - +```bash +openhands login --server-url https://your-openhands-server.com +``` -If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. +You can also set the server URL via environment variable: - +```bash +export OPENHANDS_CLOUD_URL=https://your-openhands-server.com +openhands login +``` - +### Logout -Consider setting usage limits to control costs. 
+Log out from OpenHands Cloud: -#### Using a Local LLM +```bash +# Log out from all servers +openhands logout - -Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. - +# Log out from a specific server +openhands logout --server-url https://app.all-hands.dev +``` -To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. +## Creating Cloud Conversations -#### Setting Up Search Engine +Create a new conversation in OpenHands Cloud: -OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. +```bash +# With a task +openhands cloud -t "Review the codebase and suggest improvements" -To enable search functionality in OpenHands: +# From a file +openhands cloud -f task.txt +``` -1. Get a Tavily API key from [tavily.com](https://tavily.com/). -2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)` +### Options -For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide. +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | -### Versions +### Examples -The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well: -- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with the version number. -For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release. 
-- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with `main`. -This version is unstable and is recommended for testing or development purposes only. +```bash +# Create a cloud conversation with a task +openhands cloud -t "Fix the authentication bug in login.py" -## Next Steps +# Create from a task file +openhands cloud -f requirements.txt -- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories -- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) -- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) -- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) +# Use a custom server +openhands cloud --server-url https://custom.server.com -t "Add unit tests" +# Combine with environment variable +export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev +openhands cloud -t "Refactor the database module" +``` -# Docker Sandbox -Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker +## Workflow -The **Docker sandbox** runs the agent server inside a Docker container. This is -the default and recommended option for most users. +A typical workflow with OpenHands Cloud: - - In some self-hosted deployments, the sandbox provider is controlled via the - legacy RUNTIME environment variable. Docker is the default. - +1. **Login once**: + ```bash + openhands login + ``` +2. **Create conversations as needed**: + ```bash + openhands cloud -t "Your task here" + ``` -## Why Docker? +3. **Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server -- Isolation: reduces risk when the agent runs commands. -- Reproducibility: consistent environment across machines. 
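If you run a self-hosted deployment, the same workflow can target it by exporting the server URL once per shell; `https://app.all-hands.dev` remains the default when it is unset. The URL below is a made-up example:

```bash
# Hypothetical self-hosted URL — replace with your deployment
export OPENHANDS_CLOUD_URL='https://openhands.internal.example.com'

# Subsequent cloud commands pick it up automatically, e.g.:
# openhands login
# openhands cloud -t "Your task here"
```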
+## Environment Variables -## Mounting your code into the sandbox +| Variable | Description | +|----------|-------------| +| `OPENHANDS_CLOUD_URL` | Default server URL for cloud operations | -If you want OpenHands to work directly on a local repository, mount it into the -sandbox. +## Cloud vs Local -### Recommended: CLI launcher +| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | +|---------|---------------------------|---------------------| +| Compute | Cloud-hosted | Your machine | +| Persistence | Cloud storage | Local files | +| Collaboration | Share via link | Local only | +| Setup | Just login | Configure LLM & runtime | +| Cost | Subscription/usage-based | Your LLM API costs | -If you start OpenHands via: + +Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. Use the local CLI for privacy, offline work, or custom configurations. + -```bash -openhands serve --mount-cwd -``` +## See Also -your current directory will be mounted into the sandbox workspace. +- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation +- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide +- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access -### Using SANDBOX_VOLUMES +### Command Reference +Source: https://docs.openhands.dev/openhands/usage/cli/command-reference.md -You can also configure mounts via the SANDBOX_VOLUMES environment -variable (format: host_path:container_path[:mode]): +## Basic Usage ```bash -export SANDBOX_VOLUMES=$PWD:/workspace:rw +openhands [OPTIONS] [COMMAND] ``` - - Anything mounted read-write into /workspace can be modified by the - agent. - - -## Custom sandbox images - -To customize the container image (extra tools, system deps, etc.), see -[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). 
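Because `--json` (with `--headless`) emits one JSON object per line, runs are easy to post-process with line-oriented tools. A toy sketch — the event fields here are invented for illustration and are not the real output schema:

```bash
# Simulated output; a real capture would come from something like:
#   openhands --headless --json -t "..." > events.jsonl
cat > events.jsonl <<'EOF'
{"kind":"message","text":"starting"}
{"kind":"action","text":"ls -la"}
{"kind":"message","text":"done"}
EOF

# Keep only the "action" events
grep '"kind":"action"' events.jsonl
```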
- +## Global Options -# Overview -Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview +| Option | Description | +|--------|-------------| +| `-v, --version` | Show version number and exit | +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--resume [ID]` | Resume a conversation. If no ID provided, lists recent conversations | +| `--last` | Resume the most recent conversation (use with `--resume`) | +| `--exp` | Use textual-based UI (now default, kept for compatibility) | +| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | +| `--json` | Enable JSONL output (requires `--headless`) | +| `--always-approve` | Auto-approve all actions without confirmation | +| `--llm-approve` | Use LLM-based security analyzer for action approval | +| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | +| `--exit-without-confirmation` | Exit without showing confirmation dialog | -A **sandbox** is the environment where OpenHands runs commands, edits files, and -starts servers while working on your task. +## Subcommands -In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. +### serve -## Sandbox providers +Launch the OpenHands GUI server using Docker. -OpenHands supports multiple sandbox “providers”, with different tradeoffs: +```bash +openhands serve [OPTIONS] +``` -- **Docker sandbox (recommended)** - - Runs the agent server inside a Docker container. - - Good isolation from your host machine. +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | -- **Process sandbox (unsafe, but fast)** - - Runs the agent server as a regular process on your machine. - - No container isolation. 
+**Examples:** +```bash +openhands serve +openhands serve --mount-cwd +openhands serve --gpu +openhands serve --mount-cwd --gpu +``` -- **Remote sandbox** - - Runs the agent server in a remote environment. - - Used by managed deployments and some hosted setups. +### web -## Selecting a provider (current behavior) +Launch the CLI as a web application accessible via browser. -In some deployments, the provider selection is still controlled via the legacy -RUNTIME environment variable: +```bash +openhands web [OPTIONS] +``` -- RUNTIME=docker (default) -- RUNTIME=process (aka legacy RUNTIME=local) -- RUNTIME=remote +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host to bind the web server to | +| `--port` | `12000` | Port to bind the web server to | +| `--debug` | `false` | Enable debug mode | - - The user-facing terminology in V1 is sandbox, but the configuration knob - may still be called RUNTIME while the migration is in progress. - +**Examples:** +```bash +openhands web +openhands web --port 8080 +openhands web --host 127.0.0.1 --port 3000 +openhands web --debug +``` -## Terminology note (V0 vs V1) +### cloud -Older documentation refers to these environments as **runtimes**. -Those legacy docs are now in the Legacy (V0) section of the Web tab. +Create a new conversation in OpenHands Cloud. +```bash +openhands cloud [OPTIONS] +``` -# Process Sandbox -Source: https://docs.openhands.dev/openhands/usage/sandboxes/process +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | -The **Process sandbox** runs the agent server directly on your machine as a -regular process. 
+**Examples:**
+```bash
+openhands cloud -t "Fix the bug"
+openhands cloud -f task.txt
+openhands cloud --server-url https://custom.server.com -t "Task"
+```

- - This mode provides **no sandbox isolation**.
+### acp

- The agent can read/write files your user account can access and execute
- commands on your host system.
+Start the Agent Client Protocol server for IDE integrations.

- Only use this in controlled environments.
- 

+```bash
+openhands acp [OPTIONS]
+```

-## When to use it
+| Option | Description |
+|--------|-------------|
+| `--resume [ID]` | Resume a conversation by ID |
+| `--last` | Resume the most recent conversation |
+| `--always-approve` | Auto-approve all actions |
+| `--llm-approve` | Use LLM-based security analyzer |
+| `--streaming` | Enable token-by-token streaming |

-- Local development when Docker is unavailable
-- Some CI environments
-- Debugging issues that only reproduce outside containers
+**Examples:**

+```bash
+openhands acp
+openhands acp --llm-approve
+openhands acp --resume abc123def456
+openhands acp --resume --last
+```

-## Choosing process mode
+### mcp

-In some deployments, this is selected via the legacy RUNTIME
-environment variable:
+Manage Model Context Protocol server configurations.

```bash
-export RUNTIME=process
-# (legacy alias)
-# export RUNTIME=local
+openhands mcp [OPTIONS]
```

-If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker).
+#### mcp add

+Add a new MCP server.

-# Remote Sandbox
-Source: https://docs.openhands.dev/openhands/usage/sandboxes/process

+```bash
+openhands mcp add <name> --transport <transport> [OPTIONS] <url|command> [-- args...]
+```

-A **remote sandbox** runs the agent server in a remote execution environment
-instead of on your local machine.
+| Option | Description |
+|--------|-------------|
+| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) |
+| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) |
+| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) |
+| `--auth` | Authentication method (e.g., `oauth`) |
+| `--enabled` | Enable immediately (default) |
+| `--disabled` | Add in disabled state |

-This is typically used by managed deployments (e.g., OpenHands Cloud) and
-advanced self-hosted setups.
+**Examples:**
+```bash
+openhands mcp add my-api --transport http https://api.example.com/mcp
+openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com
+openhands mcp add local --transport stdio python -- -m my_server
+openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server
+```

-## Selecting remote mode
+#### mcp list

-In some self-hosted deployments, remote sandboxes are selected via the legacy
-RUNTIME environment variable:
+List all configured MCP servers.

```bash
-export RUNTIME=remote
+openhands mcp list
```

-Remote sandboxes require additional configuration (API URL + API key). The exact
-variable names depend on your deployment, but you may see legacy names like:
+#### mcp get

-- SANDBOX_REMOTE_RUNTIME_API_URL
-- SANDBOX_API_KEY
+Get details for a specific MCP server.

-## Notes
+```bash
+openhands mcp get <name>
+```

-- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports)
-  depending on the provider.
-- Configuration and credentials vary by deployment.
+#### mcp remove

-If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui).
+Remove an MCP server configuration.

-# API Keys Settings
-Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings
+```bash
+openhands mcp remove <name>
+```

- 
- These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud).
- 

+#### mcp enable

## Overview

+Enable an MCP server.

-Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to
-OpenHands Cloud
+```bash
+openhands mcp enable <name>
+```

-## OpenHands LLM Key
+#### mcp disable

- 
-You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud.
- 
+Disable an MCP server.

-You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start),
-[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even other AI coding agents. This will
-use credits from your OpenHands Cloud account. If you need to refresh it at anytime, click the `Refresh API Key` button.
+```bash
+openhands mcp disable <name>
+```

-## OpenHands API Key
+### login

-These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the
-[OpenHands Cloud API](/openhands/usage/cloud/cloud-api).
+Authenticate with OpenHands Cloud.

-### Create API Key
+```bash
+openhands login [OPTIONS]
+```

-1. Navigate to the `Settings > API Keys` page.
-2. Click `Create API Key`.
-3. Give your API key a name and click `Create`.
+| Option | Description |
+|--------|-------------|
+| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) |

-### Delete API Key
+**Examples:**
+```bash
+openhands login
+openhands login --server-url https://enterprise.openhands.dev
+```

-1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove.
-2. Click `Delete` to confirm removal.
+### logout

+Log out from OpenHands Cloud.
-# Application Settings -Source: https://docs.openhands.dev/openhands/usage/settings/application-settings +```bash +openhands logout [OPTIONS] +``` -## Overview +| Option | Description | +|--------|-------------| +| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | -The Application settings allows you to customize various application-level behaviors in OpenHands, including -language preferences, notification settings, custom Git author configuration and more. +**Examples:** +```bash +openhands logout +openhands logout --server-url https://app.all-hands.dev +``` -## Setting Maximum Budget Per Conversation +## Interactive Commands -To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD) -in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but -you can choose to continue the conversation with a prompt. +Commands available inside the CLI (prefix with `/`): -## Git Author Settings +| Command | Description | +|---------|-------------| +| `/help` | Display available commands | +| `/new` | Start a new conversation | +| `/history` | Toggle conversation history | +| `/confirm` | Configure confirmation settings | +| `/condense` | Condense conversation history | +| `/skills` | View loaded skills, hooks, and MCPs | +| `/feedback` | Send anonymous feedback about CLI | +| `/exit` | Exit the application | -OpenHands provides the ability to customize the Git author information used when making commits and creating -pull requests on your behalf. 
+## Command Palette -By default, OpenHands uses the following Git author information for all commits and pull requests: +Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: -- **Username**: `openhands` -- **Email**: `openhands@all-hands.dev` +| Option | Description | +|--------|-------------| +| **History** | Toggle conversation history panel | +| **Keys** | Show keyboard shortcuts | +| **MCP** | View MCP server configurations | +| **Maximize** | Maximize/restore window | +| **Plan** | View agent plan | +| **Quit** | Quit the application | +| **Screenshot** | Take a screenshot | +| **Settings** | Configure LLM model, API keys, and other settings | +| **Theme** | Toggle color theme | -To override the defaults: +## Changing Your Model -1. Navigate to the `Settings > Application` page. -2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`. -3. Click `Save Changes` +### Via Settings UI - - When you configure a custom Git author, OpenHands will use your specified username and email as the primary author - for commits and pull requests. OpenHands will remain as a co-author. - +1. Press `Ctrl+P` to open the command palette +2. Select **Settings** +3. Choose your LLM provider and model +4. Save changes (no restart required) +### Via Configuration File -# Integrations Settings -Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings +Edit `~/.openhands/agent_settings.json` and change the `model` field: -## Overview +```json +{ + "llm": { + "model": "claude-sonnet-4-5-20250929", + "api_key": "...", + "base_url": "..." + } +} +``` -OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some -integrations, like Slack, are only available in OpenHands Cloud. 
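To confirm which model is saved without opening the file, the `model` field can be grepped out of the settings file. A quick sketch (it assumes the field sits on a single line, as in the pretty-printed example above):

```shell
# Print the first "model" entry from an agent_settings.json-style file.
show_model() {
    grep -o '"model"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" | head -n 1
}

# Skipped silently if the settings file does not exist yet.
if [ -f ~/.openhands/agent_settings.json ]; then
    show_model ~/.openhands/agent_settings.json
fi
```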
Configuration may also vary depending on whether -you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or -[running OpenHands on your own](/openhands/usage/run-openhands/local-setup). +### Via Environment Variables -## OpenHands Cloud Integrations Settings +Temporarily override your model without changing saved configuration: - - These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). - +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-api-key" +openhands --override-with-envs +``` -### GitHub Settings +Changes made with `--override-with-envs` are not persisted. -- `Configure GitHub Repositories` - Allows you to -[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. +## Environment Variables -### Slack Settings +| Variable | Description | +|----------|-------------| +| `LLM_API_KEY` | API key for your LLM provider | +| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | +| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | +| `OPENHANDS_CLOUD_URL` | Default cloud server URL | +| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | -- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in - your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. +## Exit Codes -## Running on Your Own Integrations Settings +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | Error or task failed | +| `2` | Invalid arguments | - - These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). 
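These exit codes make scripted runs easy to gate. A sketch of a hypothetical wrapper that maps them to messages:

```shell
# Run a command and report based on the exit codes documented above.
run_gated() {
    "$@"
    status=$?
    case "$status" in
        0) echo "task succeeded" ;;
        2) echo "invalid arguments" >&2 ;;
        *) echo "task failed (exit $status)" >&2 ;;
    esac
    return "$status"
}

# e.g. run_gated openhands --headless -t "Run the test suite"
```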
- +## Configuration Files -### Version Control Integrations +| File | Purpose | +|------|---------| +| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | +| `~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | +| `~/.openhands/mcp.json` | MCP server configurations | +| `~/.openhands/conversations/` | Conversation history | -#### GitHub Setup +## See Also -OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided: +- [Installation](/openhands/usage/cli/installation) - Install the CLI +- [Quick Start](/openhands/usage/cli/quick-start) - Get started +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers - - +### Critic (Experimental) +Source: https://docs.openhands.dev/openhands/usage/cli/critic.md - 1. **Generate a Personal Access Token (PAT)**: - - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`. - - **Tokens (classic)** - - Required scopes: - - `repo` (Full control of private repositories) - - **Fine-grained tokens** - - All Repositories (You can select specific repositories, but this will impact what returns in repo search) - - Minimal Permissions (Select `Meta Data = Read-only` read for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation) - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `GitHub Token` field. - - Click `Save Changes` to apply the changes. + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + - If you're working with organizational repositories, additional setup may be required: +## Overview - 1. **Check organization requirements**: - - Organization admins may enforce specific token policies. - - Some organizations require tokens to be created with SSO enabled. 
- - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization). - 2. **Verify organization access**: - - Go to your token settings on GitHub. - - Look for the organization under `Organization access`. - - If required, click `Enable SSO` next to your organization. - - Complete the SSO authorization process. - +If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. - - Try regenerating the token. +For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). - - **Organization Access Denied**: - - Check if SSO is required but not enabled. - - Verify organization membership. - - Contact organization admin if token policies are blocking access. - - -#### GitLab Setup +## What is the Critic? -OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided: +The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides: - - - 1. **Generate a Personal Access Token (PAT)**: - - On GitLab, go to `User Settings > Access Tokens`. - - Create a new token with the following scopes: - - `api` (API access) - - `read_user` (Read user information) - - `read_repository` (Read repository) - - `write_repository` (Write repository) - - Set an expiration date or leave it blank for a non-expiring token. - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `GitLab Token` field. - - Click `Save Changes` to apply the changes. 
+- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion - 3. **(Optional): Restrict agent permissions** - - Create another PAT using Step 1 and exclude `api` scope . - - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower scope token. - - OpenHands will use the higher scope token, and the agent will use the lower scope token. - + - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. +![Critic output in CLI](./screenshots/critic-cli-output.png) - - **Access Denied**: - - Verify project access permissions. - - Check if the token has the necessary scopes. - - For group/organization repositories, ensure you have proper access. - - +## Pricing -#### BitBucket Setup - - -1. **Generate an App password**: - - On Bitbucket, go to `Account Settings > App Password`. - - Create a new password with the following scopes: - - `account`: `read` - - `repository: write` - - `pull requests: write` - - `issues: write` - - App passwords are non-expiring token. OpenHands will migrate to using API tokens in the future. - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `BitBucket Token` field. - - Click `Save Changes` to apply the changes. - +The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. - +## Disabling the Critic - +If you prefer not to use the critic feature, you can disable it in your settings: + +1. Open the command palette with `Ctrl+P` +2. Select **Settings** +3. Navigate to the **CLI Settings** tab +4. 
Toggle off **Enable Critic (Experimental)** +![Critic settings in CLI](./screenshots/critic-cli-settings.png) -# Language Model (LLM) Settings -Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings +### GUI Server +Source: https://docs.openhands.dev/openhands/usage/cli/gui-server.md ## Overview -The LLM settings allows you to bring your own LLM and API key to use with OpenHands. This can be any model that is -supported by litellm, but it requires a powerful model to work properly. -[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some -additional LLM settings on this page. +The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. -## Basic LLM Settings +```bash +openhands serve +``` -The most popular providers and models are available in the basic settings. Some of the providers have been verified to -work with OpenHands such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI and -Mistral AI. + +This requires Docker to be installed and running on your system. + -1. Choose your preferred provider using the `LLM Provider` dropdown. -2. Choose your favorite model using the `LLM Model` dropdown. -3. Set the `API Key` for your chosen provider and model and click `Save Changes`. +## Prerequisites -This will set the LLM for all new conversations. If you want to use this new LLM for older conversations, you must first -restart older conversations. +- [Docker](https://docs.docker.com/get-docker/) installed and running +- Sufficient disk space for Docker images (~2GB) -## Advanced LLM Settings +## Basic Usage -Toggling the `Advanced` settings, allows you to set custom models as well as some additional LLM settings. 
You can use -this when your preferred provider or model does not exist in the basic settings dropdowns. +```bash +# Launch the GUI server +openhands serve -1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the - custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have - [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides). -2. `Base URL`: If your provider has a specific base URL, specify it here. -3. `API Key`: Set the API key for your custom model. -4. Click `Save Changes` +# The server will be available at http://localhost:3000 +``` -### Memory Condensation +The command will: +1. Check Docker requirements +2. Pull the required Docker images +3. Start the OpenHands GUI server +4. Display the URL to access the interface -The memory condenser manages the language model's context by ensuring only the most important and relevant information -is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running -conversations. +## Options -- `Enable memory condensation` - Turn on this setting to activate this feature. -- `Memory condenser max history size` - The condenser will summarize the history after this many events. +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | +## Mounting Your Workspace -# Model Context Protocol (MCP) -Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings +To give OpenHands access to your local files: -## Overview +```bash +# Mount current directory +openhands serve --mount-cwd +``` -Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. 
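The mount described above means host paths under your project root appear under `/workspace` inside the container. A hypothetical helper that performs that translation:

```shell
# Map an absolute host path under a project root to its /workspace counterpart.
map_to_workspace() {
    # $1 = host project root, $2 = absolute host path
    case "$2" in
        "$1"/*) printf '/workspace/%s\n' "${2#"$1"/}" ;;
        *)      printf '%s\n' "$2" ;;
    esac
}

map_to_workspace /home/me/project /home/me/project/src/app.py   # → /workspace/src/app.py
```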
These -servers can provide additional functionality to the agent, such as specialized data processing, external API access, -or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). +This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. -## Supported MCPs + +Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. + -OpenHands supports the following MCP transport protocols: +## GPU Support -* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) -* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) -* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) +For tasks that benefit from GPU acceleration: -## How MCP Works +```bash +openhands serve --gpu +``` -When OpenHands starts, it: +This requires: +- NVIDIA GPU +- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed +- Docker configured for GPU support -1. Reads the MCP configuration. -2. Connects to any configured SSE and SHTTP servers. -3. Starts any configured stdio servers. -4. Registers the tools provided by these servers with the agent. +## Examples -The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: +```bash +# Basic GUI server +openhands serve -1. OpenHands routes the call to the appropriate MCP server. -2. The server processes the request and returns a response. -3. OpenHands converts the response to an observation and presents it to the agent. 
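Because the first run may spend a while pulling images, scripts that drive the server often check for Docker up front and then poll until the port answers. A generic sketch of both helpers (the `curl` probe is shown only as a comment, since the health check is an assumption):

```shell
# Check that a required command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Retry a command until it succeeds or attempts run out.
wait_for() {
    attempts=$1; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then return 0; fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

require_cmd docker
# e.g. wait_for 30 curl -fsS http://localhost:3000
```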
+# Mount current project and enable GPU +cd /path/to/your/project +openhands serve --mount-cwd --gpu +``` -## Configuration +## How It Works -MCP configuration can be defined in: -* The OpenHands UI in the `Settings > MCP` page. -* The `config.toml` file under the `[mcp]` section if not using the UI. +The `openhands serve` command: -### Configuration Options +1. **Pulls Docker images**: Downloads the OpenHands runtime and application images +2. **Starts containers**: Runs the OpenHands server in a Docker container +3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` +4. **Shares settings**: Uses your `~/.openhands` directory for configuration - - - SSE servers are configured using either a string URL or an object with the following properties: +## Stopping the Server - - `url` (required) - - Type: `str` - - Description: The URL of the SSE server. +Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. - - `api_key` (optional) - - Type: `str` - - Description: API key for authentication. - - - SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: +## Comparison: GUI Server vs Web Interface - - `url` (required) - - Type: `str` - - Description: The URL of the SHTTP server. +| Feature | `openhands serve` | `openhands web` | +|---------|-------------------|-----------------| +| Interface | Full web GUI | Terminal UI in browser | +| Dependencies | Docker required | None | +| Resources | Full container (~2GB) | Lightweight | +| Features | All GUI features | CLI features only | +| Best for | Rich GUI experience | Quick terminal access | - - `api_key` (optional) - - Type: `str` - - Description: API key for authentication. +## Troubleshooting - - `timeout` (optional) - - Type: `int` - - Default: `60` - - Range: `1-3600` seconds (1 hour maximum) - - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. 
- - **Use Cases:** - - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. - - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. - - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. - - This timeout only applies to individual tool calls, not server connection establishment. - - - - - While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for - better reliability and performance. - +### Docker Not Running - Stdio servers are configured using an object with the following properties: +``` +❌ Docker daemon is not running. +Please start Docker and try again. +``` - - `name` (required) - - Type: `str` - - Description: A unique name for the server. +**Solution**: Start Docker Desktop or the Docker daemon. - - `command` (required) - - Type: `str` - - Description: The command to run the server. +### Permission Denied - - `args` (optional) - - Type: `list of str` - - Default: `[]` - - Description: Command-line arguments to pass to the server. +``` +Got permission denied while trying to connect to the Docker daemon socket +``` - - `env` (optional) - - Type: `dict of str to str` - - Default: `{}` - - Description: Environment variables to set for the server process. - - +**Solution**: Add your user to the docker group: +```bash +sudo usermod -aG docker $USER +# Then log out and back in +``` -#### When to Use Direct Stdio +### Port Already in Use -Direct stdio connections may still be appropriate in these scenarios: -- **Development and testing**: Quick prototyping of MCP servers. -- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access. -- **Local-only environments**: When you don't want to manage additional proxy processes. +If port 3000 is already in use, stop the conflicting service or use a different setup. 
Currently, the port is not configurable via CLI. -### Configuration Examples +## See Also - - - For stdio-based MCP servers, we recommend using MCP proxy tools like - [`supergateway`](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections. - [SuperGateway](https://github.com/supercorp-ai/supergateway) is a popular MCP proxy that converts stdio MCP servers to - HTTP/SSE endpoints. +- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide +- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access +- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details - Start the proxy servers separately: - ```bash - # Terminal 1: Filesystem server proxy - supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080 +### Headless Mode +Source: https://docs.openhands.dev/openhands/usage/cli/headless.md - # Terminal 2: Fetch server proxy - supergateway --stdio "uvx mcp-server-fetch" --port 8081 - ``` +## Overview - Then configure OpenHands to use the HTTP endpoint: +Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: +- CI/CD pipelines +- Automated scripting +- Integration with other tools +- Batch processing - ```toml - [mcp] - # SSE Servers - Recommended approach using proxy tools - sse_servers = [ - # Basic SSE server with just a URL - "http://example.com:8080/mcp", +```bash +openhands --headless -t "Your task here" +``` - # SuperGateway proxy for fetch server - "http://localhost:8081/sse", +## Requirements - # External MCP service with authentication - {url="https://api.example.com/mcp/sse", api_key="your-api-key"} - ] +- Must specify a task with `--task` or `--file` - # SHTTP Servers - Modern streamable HTTP transport (recommended) - shttp_servers = [ - # Basic SHTTP server with default 60s timeout - "https://api.example.com/mcp/shttp", + +**Headless mode always runs in `always-approve` mode.** The agent 
will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. + - # Server with custom timeout for heavy operations - { - url = "https://files.example.com/mcp/shttp", - api_key = "your-api-key", - timeout = 1800 # 30 minutes for large file processing - } - ] - ``` - - - - This setup is not Recommended for production. - - ```toml - [mcp] - # Direct stdio servers - use only for development/testing - stdio_servers = [ - # Basic stdio server - {name="fetch", command="uvx", args=["mcp-server-fetch"]}, +## Basic Usage - # Stdio server with environment variables - { - name="filesystem", - command="npx", - args=["@modelcontextprotocol/server-filesystem", "/"], - env={ - "DEBUG": "true" - } - } - ] - ``` +```bash +# Run a task in headless mode +openhands --headless -t "Write a Python script that prints hello world" - For production use, we recommend using proxy tools like SuperGateway. - - +# Load task from a file +openhands --headless -f task.txt +``` -Other options include: +## JSON Output Mode -- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. -- **Docker-based proxies**: Containerized solutions for better isolation. -- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. 
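These event lines can be post-processed with ordinary Unix tools. A rough sketch that counts `action` events in a saved log (field names follow the sample above; the real event schema may carry more fields):

```shell
# Count "action" events in a JSONL log produced by --headless --json.
count_actions() {
    grep -c '"type": *"action"' "$1"
}

# Tiny demo log mirroring the sample events above:
log=$(mktemp)
cat > "$log" <<'EOF'
{"type": "action", "action": "write", "path": "app.py"}
{"type": "observation", "content": "File created successfully"}
{"type": "action", "action": "run", "command": "python app.py"}
EOF
count_actions "$log"   # → 2
```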
+The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur: +```bash +openhands --headless --json -t "Create a simple Flask app" +``` -# Secrets Management -Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings +Each line is a JSON object representing an agent event: -## Overview +```json +{"type": "action", "action": "write", "path": "app.py", ...} +{"type": "observation", "content": "File created successfully", ...} +{"type": "action", "action": "run", "command": "python app.py", ...} +``` -OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be -accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment -variables in the agent's runtime environment. +### Use Cases for JSON Output -## Accessing the Secrets Manager +- **CI/CD pipelines**: Parse events to determine success/failure +- **Automated processing**: Feed output to other tools +- **Logging**: Capture structured logs for analysis +- **Integration**: Connect OpenHands with other systems -Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. +### Example: Capture Output to File -## Adding a New Secret -1. Click `Add a new secret`. -2. Fill in the following fields: - - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. - - **Value**: The sensitive information you want to store. - - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. -3. Click `Add secret` to save. +```bash +openhands --headless --json -t "Add unit tests" > output.jsonl +``` -## Editing a Secret +## See Also -1. Click the `Edit` button next to the secret you want to modify. -2. You can update the name and description of the secret. 
- - For security reasons, you cannot view or edit the value of an existing secret. If you need to change the - value, delete the secret and create a new one. - +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options -## Deleting a Secret +### JetBrains IDEs +Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md -1. Click the `Delete` button next to the secret you want to remove. -2. Select `Confirm` to delete the secret. - -## Using Secrets in the Agent - - All custom secrets are automatically exported as environment variables in the agent's runtime environment. - - You can access them in your code using standard environment variable access methods. For example, if you create a - secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or - `os.environ['OPENAI_API_KEY']` in Python. +[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. +## Supported IDEs -# Prompting Best Practices -Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices +This guide applies to all JetBrains IDEs: -## Characteristics of Good Prompts +- IntelliJ IDEA +- PyCharm +- WebStorm +- GoLand +- Rider +- CLion +- PhpStorm +- RubyMine +- DataGrip +- And other JetBrains IDEs -Good prompts are: +## Prerequisites -- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. -- **Location-specific**: Specify the locations in the codebase that should be modified, if known. -- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. +Before configuring JetBrains IDEs: -## Examples +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **JetBrains IDE version 25.3 or later** +4. 
**JetBrains AI Assistant enabled** in your IDE

+
+JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE.
+
-### Good Prompt Examples
+## Configuration
-- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average.
-- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined.
-- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission.
+### Step 1: Create the ACP Configuration File
-### Bad Prompt Examples
+Create or edit the file `$HOME/.jetbrains/acp.json`:
-- Make the code better. (Too vague, not concrete)
-- Rewrite the entire backend to use a different framework. (Not appropriately scoped)
-- There's a bug somewhere in the user authentication. Can you find and fix it? (Lacks specificity and location information)
+
+    ```bash
+    mkdir -p ~/.jetbrains
+    nano ~/.jetbrains/acp.json
+    ```
+
+    Create the file at `C:\Users\<username>\.jetbrains\acp.json`
+
-## Tips for Effective Prompting
+### Step 2: Add the Configuration
-- Be as specific as possible about the desired outcome or the problem to be solved.
-- Provide context, including relevant file paths and line numbers if available.
-- Break large tasks into smaller, manageable prompts.
-- Include relevant error messages or logs.
-- Specify the programming language or framework, if not obvious.
+Add the following JSON:
-The more precise and informative your prompt, the better OpenHands can assist you.
-See [First Projects](/overview/first-projects) for more examples of helpful prompts.
+```json
+{
+  "agent_servers": {
+    "OpenHands": {
+      "command": "openhands",
+      "args": ["acp"],
+      "env": {}
+    }
+  }
+}
+```
+### Step 3: Use OpenHands in Your IDE
-# Troubleshooting
-Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting
+Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE.
-
-OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal.
-
+## Advanced Configuration
-### Launch docker client failed
+### LLM-Approve Mode
-**Description**
+For automatic LLM-based approval:
-When running OpenHands, the following error is seen:
-```
-Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon.
+```json
+{
+  "agent_servers": {
+    "OpenHands": {
+      "command": "openhands",
+      "args": ["acp", "--llm-approve"],
+      "env": {}
+    }
+  }
+}
```
-**Resolution**
-Try these in order:
-* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully.
-* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled.
-* Depending on your configuration you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop.
-* Reinstall Docker Desktop.
-### Permission Error
+### Auto-Approve Mode
-**Description**
+For automatic approval of all actions (use with caution):
-On initial prompt, an error is seen with `Permission Denied` or `PermissionError`.
+```json
+{
+  "agent_servers": {
+    "OpenHands": {
+      "command": "openhands",
+      "args": ["acp", "--always-approve"],
+      "env": {}
+    }
+  }
+}
+```
-**Resolution**
+### Resume a Conversation
-* Check if the `~/.openhands` is owned by `root`. If so, you can:
-  * Change the directory's ownership: `sudo chown <user>:<group> ~/.openhands`.
-  * or update permissions on the directory: `sudo chmod 777 ~/.openhands`
-  * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
-* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
-  OpenHands.
+Resume a specific conversation:
-### On Linux, Getting ConnectTimeout Error
+```json
+{
+  "agent_servers": {
+    "OpenHands (Resume)": {
+      "command": "openhands",
+      "args": ["acp", "--resume", "abc123def456"],
+      "env": {}
+    }
+  }
+}
+```
-**Description**
+Resume the latest conversation:
-When running on Linux, you might run into the error `ERROR:root:: timed out`.
+```json
+{
+  "agent_servers": {
+    "OpenHands (Latest)": {
+      "command": "openhands",
+      "args": ["acp", "--resume", "--last"],
+      "env": {}
+    }
+  }
+}
+```
-**Resolution**
+### Multiple Configurations
-If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that
-these packages can sometimes be outdated or include changes that cause compatibility issues. Try reinstalling Docker
-[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version.
+Add multiple configurations for different use cases:
-If that does not solve the issue, try incrementally adding the following parameters to the docker run command:
-* `--network host`
-* `-e SANDBOX_USE_HOST_NETWORK=true`
-* `-e DOCKER_HOST_ADDR=127.0.0.1`
+```json
+{
+  "agent_servers": {
+    "OpenHands": {
+      "command": "openhands",
+      "args": ["acp"],
+      "env": {}
+    },
+    "OpenHands (Auto-Approve)": {
+      "command": "openhands",
+      "args": ["acp", "--always-approve"],
+      "env": {}
+    },
+    "OpenHands (Resume Latest)": {
+      "command": "openhands",
+      "args": ["acp", "--resume", "--last"],
+      "env": {}
+    }
+  }
+}
+```
-### Internal Server Error.
Ports are not available +### Environment Variables -**Description** +Pass environment variables to the agent: -When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP -...: bind: An attempt was made to access a socket in a -way forbidden by its access permissions.")` is encountered. +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": { + "LLM_API_KEY": "your-api-key" + } + } + } +} +``` -**Resolution** +## Troubleshooting -* Run the following command in PowerShell, as Administrator to reset the NAT service and release the ports: -``` -Restart-Service -Name "winnat" -``` +### "Agent not found" or "Command failed" -### Unable to access VS Code tab via local IP +1. Verify OpenHands CLI is installed: + ```bash + openhands --version + ``` -**Description** +2. If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) -When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden" -error, while other parts of the UI work fine. +### "AI Assistant not available" -**Resolution** +1. Ensure you have JetBrains IDE version 25.3 or later +2. Enable AI Assistant: `Settings > Plugins > AI Assistant` +3. Restart the IDE after enabling -This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines. -To fix this: +### Agent doesn't respond -1. Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: +1. 
Check your LLM settings: ```bash - docker run -it --rm \ - -e SANDBOX_VSCODE_PORT=41234 \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/.openhands \ - -p 3000:3000 \ - -p 41234:41234 \ - --add-host host.docker.internal:host-gateway \ - --name openhands-app \ - docker.openhands.dev/openhands/openhands:latest + openhands + # Use /settings to configure ``` - > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. - -2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. -3. If running with the development workflow, you can set this in your `config.toml` file: - ```toml - [sandbox] - vscode_port = 41234 +2. Test ACP mode in terminal: + ```bash + openhands acp + # Should start without errors ``` -### GitHub Organization Rename Issues +### Configuration not applied -**Description** +1. Verify the config file location: `~/.jetbrains/acp.json` +2. Validate JSON syntax (no trailing commas, proper quotes) +3. Restart your JetBrains IDE -After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. 
+### Finding Your Conversation ID -**Resolution** +To resume conversations, first find the ID: -* Update your git remote URL: - ```bash - # Check current remote - git remote get-url origin - - # Update SSH remote - git remote set-url origin git@github.com:OpenHands/OpenHands.git - - # Or update HTTPS remote - git remote set-url origin https://github.com/OpenHands/OpenHands.git - ``` -* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` -* Find and update any hardcoded references: - ```bash - git grep -i "all-hands-ai" - git grep -i "ghcr.io/all-hands-ai" - ``` +```bash +openhands --resume +``` +This displays recent conversations with their IDs: -# COBOL Modernization -Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py +-------------------------------------------------------------------------------- +``` -Legacy COBOL systems power critical business operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. +## See Also - -This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring). - +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs -## The COBOL Modernization Challenge +### IDE Integration Overview +Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview.md -[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. 
Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions. + +IDE integration via ACP is experimental and may have limitations. Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). + -The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, with COBOL's specialized nature requiring a unique skill set that makes it difficult for human teams alone. + +**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. + -## Overview +## What is the Agent Client Protocol (ACP)? -COBOL modernization is a complex undertaking. Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in. +The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. 
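Concretely, each ACP message is a JSON-RPC 2.0 envelope written over the agent's stdio. A minimal sketch of that framing (the method name and params below are illustrative, not the full ACP handshake):

```python
import json

def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line for stdio transport."""
    return json.dumps({
        "jsonrpc": "2.0",   # fixed protocol version string
        "id": request_id,   # pairs the eventual response with this request
        "method": method,
        "params": params,
    })

# A client would write one request per line to the agent's stdin:
line = jsonrpc_request(1, "initialize", {"protocolVersion": 1})
```

The `id` field is what lets a client multiplex several in-flight requests over a single stdio pipe, which is why every editor integration below works the same way under the hood.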
-OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process: +## Supported IDEs -1. **Understanding**: Analyze and document existing COBOL code -2. **Translation**: Convert COBOL to modern languages like Java, Python, or C# -3. **Validation**: Ensure the modernized code behaves identically to the original +| IDE | Support Level | Setup Guide | +|-----|---------------|-------------| +| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | +| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | +| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | +| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. | -In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. +## Prerequisites -## Understanding +Before using OpenHands with any IDE, you must: -A significant challenge in modernization is understanding the business function of the code. Developers have practice determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty then comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle. +1. **Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) -Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. 
Your COBOL source might already have some structure or comments that make this link clear, but if not OpenHands can help. If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: +2. **Configure your LLM settings** using the `/settings` command: + ```bash + openhands + # Then use /settings to configure + ``` -``` -For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. +The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. -Save the results in `business_functions.json` in the following format: +## How It Works -{ - ..., - "COBIL00C.cbl": { - "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", - "references": [ - "docs/billing.md#bill-payment", - "docs/transactions.md#transaction-action" - ], - }, - ... -} +```mermaid +graph LR + IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] + CLI -->|API Calls| LLM[LLM Provider] + CLI -->|Commands| Runtime[Sandbox Runtime] ``` -OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information. +1. Your IDE launches `openhands acp` as a subprocess +2. Communication happens via JSON-RPC 2.0 over stdio +3. OpenHands uses your configured LLM and runtime settings +4. 
Results are displayed in your IDE's interface
-## Translation
+## The ACP Command
-With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed.
+The `openhands acp` command starts OpenHands as an ACP server:
-One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases but including such _known problems_ in the translation prompt can help prevent such errors from being introduced at all.
+```bash
+# Basic ACP server
+openhands acp
-An example prompt is below:
+# With LLM-based approval
+openhands acp --llm-approve
-```
-Convert the COBOL files in `/src` to Java in `/src/java`.
+# Resume a conversation
+openhands acp --resume
-Requirements:
-1. Create a Java class for each COBOL program
-2. Preserve the business logic and data structures (see `business_functions.json`)
-3. Use appropriate Java naming conventions (camelCase for methods, PascalCase for classes)
+# Resume the latest conversation
+openhands acp --resume --last
+```
-4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types)
-5. Implement proper error handling with try-catch blocks
-6. Add JavaDoc comments explaining the purpose of each class and method
-7. In JavaDoc comments, include traceability to the original COBOL source using
-   the format: @source <file>:<lines> (e.g., @source CBACT01C.cbl:73-77)
-8. Create a clean, maintainable object-oriented design
-9.
Each Java file should be compilable and follow Java best practices +# Resume the latest conversation +openhands acp --resume --last ``` -Note the rule that introduces traceability comments to the resulting Java. These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. - -## Validation +### ACP Options -Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. +| Option | Description | +|--------|-------------| +| `--resume [ID]` | Resume a conversation by ID | +| `--last` | Resume the most recent conversation | +| `--always-approve` | Auto-approve all actions | +| `--llm-approve` | Use LLM-based security analyzer | +| `--streaming` | Enable token-by-token streaming | -Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. 
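A golden-file harness for this can be small. In the sketch below, `run_modern` stands in for however you invoke the translated program, and `golden_dir` holds outputs previously captured from the COBOL side; both names are illustrative, not part of any OpenHands API:

```python
from pathlib import Path

def compare_outputs(actual: str, golden: str) -> bool:
    """Byte-for-byte comparison; fixed-point COBOL output must match exactly."""
    return actual == golden

def run_golden_suite(run_modern, cases: dict[str, str], golden_dir: Path) -> list[str]:
    """Run each named input through the modernized program and return the
    names of cases whose output differs from the recorded COBOL output."""
    failures = []
    for name, payload in sorted(cases.items()):
        golden = (golden_dir / f"{name}.golden").read_text()
        if not compare_outputs(run_modern(payload), golden):
            failures.append(name)
    return failures
```

The exact string comparison is deliberate: it flags differences like `42.0` versus `42.00`, which is precisely where decimal-scale bugs between COBOL fixed-point arithmetic and `BigDecimal` tend to surface.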
+## Confirmation Modes -## Scaling Up +OpenHands ACP supports three confirmation modes to control how agent actions are approved: -The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. One way to address this is by tying translation and validation together in an iterative refinement loop. +### Always Ask (Default) -The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk): +The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. -```python -while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: - # Migrating agent converts COBOL to Java - migration_conversation.send_message(migration_prompt) - migration_conversation.run() - - # Critiquing agent evaluates the conversion - critique_conversation.send_message(critique_prompt) - critique_conversation.run() - - # Parse the score and decide whether to continue - current_score = parse_critique_score(critique_file) +```bash +openhands acp # defaults to always-ask mode ``` -By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. 
The following prompt can be easily modified to support a wide range of requirements: - -``` -Evaluate the quality of the COBOL to Java migration in `/src`. +### Always Approve -For each Java file, assess using the following criteria: -1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)? -2. Code Quality: Is the code clean, readable, and following Java 17 conventions? -3. Completeness: Are all COBOL features properly converted? -4. Best Practices: Does it use proper OOP, error handling, and documentation? +The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. -For each instance of a criteria not met, deduct a point. +```bash +openhands acp --always-approve +``` -Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score. +### LLM-Based Approval -Save the results in `critique.json` in the following format: +The agent uses an LLM-based security analyzer to evaluate each action. Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved. -{ - "total_score": -12, - "files": [ - { - "cobol": "COBIL00C.cbl", - "java": "bill_payment.java", - "scores": { - "correctness": 0, - "code_quality": 0, - "completeness": -1, - "best_practices": -2 - }, - "feedback": [ - "Rename single-letter variables to meaningful names.", - "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing.", - ], - }, - ... - ] -} +```bash +openhands acp --llm-approve ``` -In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback. +### Changing Modes During a Session -This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. 
For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. +You can change the confirmation mode during an active session using slash commands: -## Try It Yourself +| Command | Description | +|---------|-------------| +| `/confirm always-ask` | Switch to always-ask mode | +| `/confirm always-approve` | Switch to always-approve mode | +| `/confirm llm-approve` | Switch to LLM-based approval mode | +| `/help` | Show all available slash commands | -The full iterative refinement example is available in the OpenHands SDK: + +The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session. + -```bash -export LLM_API_KEY="your-api-key" -cd software-agent-sdk -uv run python examples/01_standalone_sdk/31_iterative_refinement.py -``` +## Choosing an IDE -For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. + + + High-performance editor with native ACP support. Best for speed and simplicity. + + + Universal terminal interface. Works with any terminal, consistent experience. + + + Popular editor with community extension. Great for VS Code users. + + + IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users. + + +## Resuming Conversations in IDEs -## Related Resources +You can resume previous conversations in ACP mode. 
Since ACP mode doesn't display an interactive list, first find your conversation ID:
-## Related Resources
+```bash
+openhands --resume
+```
-- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents
-- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing
-- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts
+This shows your recent conversations:
-# Automated Code Review
-Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review
+```
+Recent Conversations:
+--------------------------------------------------------------------------------
+  1. abc123def456 (2h ago)
+     Fix the login bug in auth.py
-Automated code review helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs.
+  2. xyz789ghi012 (yesterday)
+     Add unit tests for the user service
+--------------------------------------------------------------------------------
+```
-## Overview
+Then configure your IDE to use `--resume <conversation-id>` or `--resume --last`. See each IDE's documentation for specific configuration.
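If you script this lookup, the IDs can be pulled out of the listing with a small helper. This sketch assumes the numbered `1. <id> (<age>)` layout shown in the listing; the sample text is copied from that output:

```python
import re

LISTING = """\
  1. abc123def456 (2h ago)
     Fix the login bug in auth.py

  2. xyz789ghi012 (yesterday)
     Add unit tests for the user service
"""

def extract_conversation_ids(listing: str) -> list[str]:
    """Return conversation IDs from numbered lines like '  1. <id> (<age>)'."""
    return re.findall(r"^\s*\d+\.\s+(\S+)", listing, flags=re.MULTILINE)

ids = extract_conversation_ids(LISTING)  # ['abc123def456', 'xyz789ghi012']
```

The first ID in the list is the most recent conversation, which is the one `--resume --last` would pick up.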
-The OpenHands PR Review workflow is a GitHub Actions workflow that: +## See Also -- **Triggers automatically** when PRs are opened or when you request a review -- **Analyzes code changes** in the context of your entire repository -- **Posts inline comments** directly on specific lines of code in the PR -- **Provides fast feedback** - typically within 2-3 minutes +- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide -## How It Works +### Toad Terminal +Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad.md -The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: +[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). -1. **Trigger**: The workflow runs when: - - A new non-draft PR is opened - - A draft PR is marked as ready for review - - The `review-this` label is added to a PR - - `openhands-agent` is requested as a reviewer +The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. -2. 
**Analysis**: The agent receives the complete PR diff and uses two skills: - - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices - - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API +![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) -3. **Output**: Review comments are posted directly on the PR with: - - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) - - Specific line references - - Actionable suggestions with code examples +## Why Toad? -### Review Styles +Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: -Choose between two review styles: +- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything +- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs +- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP -| Style | Description | Best For | -|-------|-------------|----------| -| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | -| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | +OpenHands is included as a 
recommended agent in Toad's agent store. -## Quick Start +## Prerequisites - - - Create `.github/workflows/pr-review-by-openhands.yml` in your repository: +Before using Toad with OpenHands: - ```yaml - name: PR Review by OpenHands +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` - on: - pull_request_target: - types: [opened, ready_for_review, labeled, review_requested] +## Installation - permissions: - contents: read - pull-requests: write - issues: write +Install Toad using [uv](https://docs.astral.sh/uv/): - jobs: - pr-review: - if: | - (github.event.action == 'opened' && github.event.pull_request.draft == false) || - github.event.action == 'ready_for_review' || - github.event.label.name == 'review-this' || - github.event.requested_reviewer.login == 'openhands-agent' - runs-on: ubuntu-latest - steps: - - name: Run PR Review - uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main - with: - llm-model: anthropic/claude-sonnet-4-5-20250929 - review-style: standard - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} - ``` - +```bash +uvx batrachian-toad +``` - - Go to your repository's **Settings → Secrets and variables → Actions** and add: - - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) - +For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). - - Create a `review-this` label in your repository: - 1. Go to **Issues → Labels** - 2. Click **New label** - 3. Name: `review-this` - 4. 
Description: `Trigger OpenHands PR review` - +## Setup - - Open a PR and either: - - Add the `review-this` label, OR - - Request `openhands-agent` as a reviewer - - +### Using the Agent Store -## Composite Action +The easiest way to set up OpenHands with Toad: -The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: +1. Launch Toad: `uvx batrachian-toad` +2. Open Toad's agent store +3. Find **OpenHands** in the list of recommended agents +4. Click **Install** to set up OpenHands +5. Select OpenHands and start a conversation -- Checking out the SDK at the specified version -- Setting up Python and dependencies -- Running the PR review agent -- Uploading logs as artifacts +The install process runs: +```bash +uv tool install openhands --python 3.12 && openhands login +``` -### Action Inputs +### Manual Configuration -| Input | Description | Required | Default | -|-------|-------------|----------|---------| -| `llm-model` | LLM model to use | Yes | - | -| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | -| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | -| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | -| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | -| `llm-api-key` | LLM API key | Yes | - | -| `github-token` | GitHub token for API access | Yes | - | +You can also launch Toad directly with OpenHands: - -Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. 
- +```bash +toad acp "openhands acp" +``` -## Customization +## Usage -### Repository-Specific Review Guidelines +### Basic Usage -Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: +```bash +# Launch Toad with OpenHands +toad acp "openhands acp" +``` -```markdown ---- -name: code-review -description: Custom code review guidelines for this repository -triggers: -- /codereview ---- +### With Command Line Arguments -# Repository Code Review Guidelines +Pass OpenHands CLI flags through Toad: -You are reviewing code for [Your Project Name]. Follow these guidelines: +```bash +# Use LLM-based approval mode +toad acp "openhands acp --llm-approve" -## Review Decisions +# Auto-approve all actions +toad acp "openhands acp --always-approve" +``` -### When to APPROVE -- Configuration changes following existing patterns -- Documentation-only changes -- Test-only changes without production code changes -- Simple additions following established conventions +### Resume a Conversation -### When to COMMENT -- Issues that need attention (bugs, security concerns) -- Suggestions for improvement -- Questions about design decisions +Resume a specific conversation by ID: -## Core Principles +```bash +toad acp "openhands acp --resume abc123def456" +``` -1. **[Your Principle 1]**: Description -2. **[Your Principle 2]**: Description +Resume the most recent conversation: -## What to Check +```bash +toad acp "openhands acp --resume --last" +``` -- **[Category 1]**: What to look for -- **[Category 2]**: What to look for + +Find your conversation IDs by running `openhands --resume` in a regular terminal. + -## Repository Conventions +## Advanced Configuration -- Use [your linter] for style checking -- Follow [your style guide] -- Tests should be in [your test directory] -``` +### Combined Options - -The skill file must use `/codereview` as the trigger to override the default review behavior. 
See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. - +```bash +# Resume with LLM approval +toad acp "openhands acp --resume --last --llm-approve" +``` -### Workflow Configuration +### Environment Variables -Customize the workflow by modifying the action inputs: +Pass environment variables to OpenHands: -```yaml -- name: Run PR Review - uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main - with: - # Change the LLM model - llm-model: anthropic/claude-sonnet-4-5-20250929 - # Use a custom LLM endpoint - llm-base-url: https://your-llm-proxy.example.com - # Switch to "roasted" style for brutally honest reviews - review-style: roasted - # Pin to a specific SDK version for stability - sdk-version: main - # Secrets - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} +```bash +LLM_API_KEY=your-key toad acp "openhands acp" ``` -### Trigger Customization - -Modify when reviews are triggered by editing the workflow conditions: +## Troubleshooting -```yaml -# Only trigger on label (disable auto-review on PR open) -if: github.event.label.name == 'review-this' +### "openhands" command not found -# Only trigger when specific reviewer is requested -if: github.event.requested_reviewer.login == 'openhands-agent' +Ensure OpenHands is installed: +```bash +uv tool install openhands --python 3.12 +``` -# Trigger on all PRs (including drafts) -if: | - github.event.action == 'opened' || - github.event.action == 'synchronize' +Verify it's in your PATH: +```bash +which openhands ``` -## Security Considerations +### Agent doesn't respond -The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. +1. Check your LLM settings: `openhands` then `/settings` +2. Verify your API key is valid +3. 
Check network connectivity to your LLM provider - -**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. +### Conversation not persisting -To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. - +Conversations are stored in `~/.openhands/conversations`. Ensure this directory exists and is writable. -## Example Reviews +## See Also -See real automated reviews in action on the OpenHands Software Agent SDK repository: +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs -| PR | Description | Review Highlights | -|----|-------------|-------------------| -| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | -| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | -| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED review highlighting key strengths | -| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | +### VS Code +Source: 
https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md -## Troubleshooting +[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. - - - - Ensure the `LLM_API_KEY` secret is set correctly - - Check that the label name matches exactly (`review-this`) - - Verify the workflow file is in `.github/workflows/` - - Check the Actions tab for workflow run errors - - - - - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission - - Check the workflow logs for API errors - - Verify the PR is not from a fork with restricted permissions - - - - - Large PRs may take longer to analyze - - Consider splitting large PRs into smaller ones - - Check if the LLM API is experiencing delays - - + +VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. + -## Related Resources +## Prerequisites -- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script -- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews -- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows -- [GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud -- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills +Before configuring VS Code: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. 
**VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) +## Installation -# Dependency Upgrades -Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades +### Step 1: Install the Extension -Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. +1. Open VS Code +2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) +3. Search for **"VSCode ACP"** +4. Click **Install** -## Overview +Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). -OpenHands helps with dependency management by: +### Step 2: Connect to OpenHands -- **Analyzing dependencies**: Identifying outdated packages and their versions -- **Planning upgrades**: Creating upgrade strategies and migration guides -- **Implementing changes**: Updating code to handle breaking changes -- **Validating results**: Running tests and verifying functionality +1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) +2. Click **Connect** to start a session +3. Select **OpenHands** from the agent dropdown +4. Start chatting with OpenHands! -## Dependency Analysis Examples +## How It Works -### Identifying Outdated Dependencies +The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. -Start by understanding your current dependency state: +The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol. + +## Verification + +Ensure OpenHands is discoverable: +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands ``` -Analyze the dependencies in this project and create a report: -1. 
List all direct dependencies with current and latest versions -2. Identify dependencies more than 2 major versions behind -3. Flag any dependencies with known security vulnerabilities -4. Highlight dependencies that are deprecated or unmaintained -5. Prioritize which updates are most important +If the command is not found, install OpenHands CLI: +```bash +uv tool install openhands --python 3.12 ``` -**Example output:** +## Advanced Usage -| Package | Current | Latest | Risk | Priority | -|---------|---------|--------|------|----------| -| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | -| react | 16.8.0 | 18.2.0 | Outdated | Medium | -| express | 4.17.1 | 4.18.2 | Minor update | Low | -| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | +### Custom Arguments -### Security-Related Dependency Upgrades +The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`. -Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: +### Resume Conversations -- Automating vulnerability detection and remediation -- Integrating with security scanners (Snyk, Dependabot, CodeQL) -- Building automated pipelines for security fixes -- Using OpenHands agents to create pull requests automatically +To resume a conversation, you may need to: -### Compatibility Checking +1. Find your conversation ID: `openhands --resume` +2. Configure the extension to use custom arguments (if supported) +3. Or use the terminal directly: `openhands acp --resume ` -Check for compatibility issues before upgrading: + +The VSCode ACP extension's feature set depends on the extension maintainer. 
Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities. + -``` -Check compatibility for upgrading React from 16 to 18: +## Troubleshooting -1. Review our codebase for deprecated React patterns -2. List all components using lifecycle methods -3. Identify usage of string refs or findDOMNode -4. Check third-party library compatibility with React 18 -5. Estimate the effort required for migration -``` +### OpenHands Not Appearing in Dropdown -**Compatibility matrix:** +1. Verify OpenHands is installed and in PATH: + ```bash + which openhands + openhands --version + ``` -| Dependency | React 16 | React 17 | React 18 | Action Needed | -|------------|----------|----------|----------|---------------| -| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | -| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | -| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | +2. Restart VS Code after installing OpenHands -## Automated Upgrade Examples +3. Check if the extension recognizes agents: + - Look for any error messages in the extension panel + - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`) -### Version Updates +### Connection Failed -Perform straightforward version updates: +1. Ensure your LLM settings are configured: + ```bash + openhands + # Use /settings to configure + ``` - - - ``` - Update all patch and minor versions in package.json: - - 1. Review each update for changelog notes - 2. Update package.json with new versions - 3. Update package-lock.json - 4. Run the test suite - 5. List any deprecation warnings - ``` - - - ``` - Update dependencies in requirements.txt: - - 1. Check each package for updates - 2. Update requirements.txt with compatible versions - 3. Update requirements-dev.txt similarly - 4. Run tests and verify functionality - 5. Note any deprecation warnings - ``` - - - ``` - Update dependencies in pom.xml: - - 1. 
Check for newer versions of each dependency - 2. Update version numbers in pom.xml - 3. Run mvn dependency:tree to check conflicts - 4. Run the test suite - 5. Document any API changes encountered - ``` - - +2. Check that `openhands acp` works in terminal: + ```bash + openhands acp + # Should start without errors (Ctrl+C to exit) + ``` -### Breaking Change Handling +### Extension Not Working -When major versions introduce breaking changes: +1. Update to the latest version of the extension +2. Check for VS Code updates +3. Report issues on the [extension's GitHub](https://github.com/omercnet) -``` -Upgrade axios from v0.x to v1.x and handle breaking changes: +## Limitations -1. List all breaking changes in axios 1.0 changelog -2. Find all axios usages in our codebase -3. For each breaking change: - - Show current code - - Show updated code - - Explain the change -4. Create a git commit for each logical change -5. Verify all tests pass -``` +Since this is a community extension: -**Example transformation:** +- Feature availability may vary +- Support depends on the extension maintainer +- Not all OpenHands CLI flags may be accessible through the UI -```javascript -// Before (axios 0.x) -import axios from 'axios'; -axios.defaults.baseURL = 'https://api.example.com'; -const response = await axios.get('/users', { - cancelToken: source.token -}); +For the most control over OpenHands, consider using: +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage +- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support -// After (axios 1.x) -import axios from 'axios'; -axios.defaults.baseURL = 'https://api.example.com'; -const controller = new AbortController(); -const response = await axios.get('/users', { - signal: controller.signal -}); -``` +## See Also -### Code Adaptation +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [VSCode ACP 
Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal -Adapt code to new API patterns: +### Zed IDE +Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed.md -``` -Migrate our codebase from moment.js to date-fns: +[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. -1. List all moment.js usages in our code -2. Map moment methods to date-fns equivalents -3. Update imports throughout the codebase -4. Handle any edge cases where APIs differ -5. Remove moment.js from dependencies -6. Verify all date handling still works correctly -``` + -**Migration map:** +## Prerequisites -| moment.js | date-fns | Notes | -|-----------|----------|-------| -| `moment()` | `new Date()` | Different return type | -| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | -| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | -| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | +Before configuring Zed, ensure you have: -## Testing and Validation Examples +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **Zed editor** - Download from [zed.dev](https://zed.dev/) -### Automated Test Execution +## Configuration -Run comprehensive tests after upgrades: +### Step 1: Open Agent Settings -``` -After the dependency upgrades, validate the application: +1. Open Zed +2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette +3. Search for `agent: open settings` -1. Run the full test suite (unit, integration, e2e) -2. Check test coverage hasn't decreased -3. Run type checking (if applicable) -4. Run linting with new lint rule versions -5. 
Build the application for production -6. Report any failures with analysis -``` +![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) -### Integration Testing +### Step 2: Add OpenHands as an Agent -Verify integrations still work: +1. On the right side, click `+ Add Agent` +2. Select `Add Custom Agent` -``` -Test our integrations after upgrading the AWS SDK: +![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) -1. Test S3 operations (upload, download, list) -2. Test DynamoDB operations (CRUD) -3. Test Lambda invocations -4. Test SQS send/receive -5. Compare behavior to before the upgrade -6. Note any subtle differences +### Step 3: Configure the Agent + +Add the following configuration to the `agent_servers` field: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": [ + "openhands", + "acp" + ], + "env": {} + } + } +} ``` -### Regression Detection +### Step 4: Save and Use -Detect regressions from upgrades: +1. Save the settings file +2. You can now use OpenHands within Zed! -``` -Check for regressions after upgrading the ORM: +![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) -1. Run database operation benchmarks -2. Compare query performance before and after -3. Verify all migrations still work -4. Check for any N+1 queries introduced -5. Validate data integrity in test database -6. Document any behavioral changes -``` +## Advanced Configuration -## Additional Examples +### LLM-Approve Mode -### Security-Driven Upgrade +For automatic LLM-based approval of actions: +```json +{ + "agent_servers": { + "OpenHands (LLM Approve)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--llm-approve" + ], + "env": {} + } + } +} ``` -We have a critical security vulnerability in jsonwebtoken. -Current: jsonwebtoken@8.5.1 -Required: jsonwebtoken@9.0.0 +### Resume a Specific Conversation -Perform the upgrade: -1. Check for breaking changes in v9 -2. 
Find all usages of jsonwebtoken in our code -3. Update any deprecated methods -4. Update the package version -5. Verify all JWT operations work -6. Run security tests +To resume a previous conversation: + +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "abc123def456" + ], + "env": {} + } + } +} ``` -### Framework Major Upgrade +Replace `abc123def456` with your actual conversation ID. Find conversation IDs by running `openhands --resume` in your terminal. + +### Resume Latest Conversation +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "--last" + ], + "env": {} + } + } +} ``` -Upgrade our Next.js application from 12 to 14: -Key areas to address: -1. App Router migration (pages -> app) -2. New metadata API -3. Server Components by default -4. New Image component -5. Route handlers replacing API routes +### Multiple Configurations -For each area: -- Show current implementation -- Show new implementation -- Test the changes +You can add multiple OpenHands configurations for different use cases: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": ["openhands", "acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "uvx", + "args": ["openhands", "acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume Latest)": { + "command": "uvx", + "args": ["openhands", "acp", "--resume", "--last"], + "env": {} + } + } +} ``` -### Multi-Package Coordinated Upgrade +## Troubleshooting -``` -Upgrade our React ecosystem packages together: +### Accessing Debug Logs -Current: -- react: 17.0.2 -- react-dom: 17.0.2 -- react-router-dom: 5.3.0 -- @testing-library/react: 12.1.2 +If you encounter issues: -Target: -- react: 18.2.0 -- react-dom: 18.2.0 -- react-router-dom: 6.x -- @testing-library/react: 14.x +1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) +2. 
Type and select `acp debug log` +3. Review the logs for errors or warnings +4. Restart the conversation to reload connections after configuration changes -Create an upgrade plan that handles all these together, -addressing breaking changes in the correct order. +### Common Issues + +**"openhands" command not found** + +Ensure OpenHands is installed and in your PATH: +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands ``` -## Related Resources +If using `uvx`, ensure uv is installed: +```bash +uv --version +``` -- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities -- [Security Guide](/sdk/guides/security) - Security best practices for AI agents -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +**Agent doesn't start** +1. Check that your LLM settings are configured: run `openhands` and verify `/settings` +2. Verify the configuration JSON syntax is valid +3. Check the ACP debug logs for detailed errors -# Incident Triage -Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage +**Conversation doesn't persist** -When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). +Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. -This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). +After making configuration changes, restart the conversation in Zed to apply them. -## Overview +## See Also -Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. 
When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs -What if AI agents could handle the initial investigation automatically? This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. +### Installation +Source: https://docs.openhands.dev/openhands/usage/cli/installation.md -OpenHands accelerates incident response by: + +**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. + -- **Automated error analysis**: AI agents investigate errors and provide detailed reports -- **Root cause identification**: Connect symptoms to underlying issues in your codebase -- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues -- **Integration with monitoring tools**: Work directly with platforms like Datadog +## Installation Methods -## Automated Datadog Error Analysis + + + Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. -The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. 
A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. + **Install OpenHands:** + ```bash + uv tool install openhands --python 3.12 + ``` -### How It Works + **Run OpenHands:** + ```bash + openhands + ``` -[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production. + **Upgrade OpenHands:** + ```bash + uv tool upgrade openhands --python 3.12 + ``` + + + Install the OpenHands CLI binary with the install script: -[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports. + ```bash + curl -fsSL https://install.openhands.dev/install.sh | sh + ``` -### Triggering Automated Debugging + Then run: + ```bash + openhands + ``` -The GitHub Actions workflow can be triggered in two ways: + + Your system may require you to allow permissions to run the executable. -1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors. + + When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple + cannot check it for malicious software." -2. **Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from DataDog's error tracking UI using the "Actions" button. + 1. Open `System Settings`. + 2. Go to `Privacy & Security`. + 3. Scroll down to `Security` and click `Allow Anyway`. + 4. 
Rerun the OpenHands CLI. -### Automated Investigation Process + ![mac-security](/openhands/static/img/cli-security-mac.png) -When the workflow runs, it automatically performs the following steps: + + + + + 1. Set the following environment variable in your terminal: + - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) -1. Get detailed info from the DataDog API -2. Create or find an existing GitHub issue to track the error -3. Clone all relevant repositories to get full code context -4. Run an OpenHands agent to analyze the error and investigate the code -5. Post the findings as a comment on the GitHub issue + 2. Ensure you have configured your settings before starting: + - Set up `~/.openhands/settings.json` with your LLM configuration -The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes. + 3. Run the following command: - -The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`. 
- + ```bash + docker run -it \ + --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e SANDBOX_USER_ID=$(id -u) \ + -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/root/.openhands \ + --add-host host.docker.internal:host-gateway \ + --name openhands-cli-$(date +%Y%m%d%H%M%S) \ + python:3.12-slim \ + bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" + ``` -## Setting Up the Workflow + The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's + permissions. This prevents the agent from creating root-owned files in the mounted workspace. + + -To set up automated Datadog debugging in your own repository: +## First Run -1. Copy the workflow file to `.github/workflows/` in your repository -2. Configure the required secrets (Datadog API keys, LLM API key) -3. Customize the default queries and repository lists for your needs -4. Run the workflow manually or set up scheduled runs +The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved +for future sessions in `~/.openhands/settings.json`. -The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog. +The conversation history will be saved in `~/.openhands/conversations`. -Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template. + +If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the +configuration format has changed. 
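Since the settings live in `~/.openhands/settings.json` and conversations under `~/.openhands/conversations`, that state can be inspected with ordinary filesystem tools. A minimal sketch, assuming nothing about the per-conversation layout (which is an implementation detail of the CLI) — the helper name and the newest-first ordering are illustrative:

```python
from pathlib import Path

def list_conversations(base: Path) -> list[str]:
    """Return saved conversation entries, newest first.

    `base` would typically be Path.home() / ".openhands" / "conversations".
    Each entry's on-disk layout is CLI-internal, so we only sort the
    directory entries by modification time.
    """
    if not base.is_dir():
        return []
    entries = sorted(base.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in entries]
```

This is handy for finding stale sessions to prune by hand; deleting an entry removes that conversation's history.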
+ -## Manual Incident Investigation +## Next Steps -You can also use OpenHands directly to investigate incidents without the automated workflow. +- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers -### Log Analysis +### MCP Servers +Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md -OpenHands can analyze logs to identify patterns and anomalies: +## Overview -``` -Analyze these application logs for the incident that occurred at 14:32 UTC: +[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. -1. Identify the first error or warning that appeared -2. Trace the sequence of events leading to the failure -3. Find any correlated errors across services -4. Identify the user or request that triggered the issue -5. Summarize the timeline of events -``` +The CLI provides two ways to manage MCP servers: +1. **CLI commands** (`openhands mcp`) - Manage servers from the command line +2. **Interactive command** (`/mcp`) - View server status within a conversation -**Log analysis capabilities:** + +If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON. 
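Because the configuration is now plain JSON, `~/.openhands/mcp.json` can also be generated or edited programmatically, for example from a dotfiles setup script. A minimal sketch against the `mcpServers` layout used by that file — the `add_mcp_server` helper and its merge behavior are illustrative, not part of the CLI:

```python
import json
from pathlib import Path

def add_mcp_server(path: Path, name: str, command: str,
                   args: list[str], env: dict[str, str]) -> None:
    """Merge one stdio server entry into an mcp.json-style file.

    Uses the top-level "mcpServers" mapping of name ->
    {command, args, env}; servers already in the file are kept.
    """
    config = {"mcpServers": {}}
    if path.exists():
        config = json.loads(path.read_text())
        config.setdefault("mcpServers", {})
    config["mcpServers"][name] = {"command": command, "args": args, "env": env}
    path.write_text(json.dumps(config, indent=2))
```

As with hand edits to the file, changes written this way only take effect when a conversation restarts.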
+ -| Log Type | Analysis Capabilities | -|----------|----------------------| -| Application logs | Error patterns, exception traces, timing anomalies | -| Access logs | Traffic patterns, slow requests, error responses | -| System logs | Resource exhaustion, process crashes, system errors | -| Database logs | Slow queries, deadlocks, connection issues | +## MCP Commands -### Stack Trace Analysis +### List Servers -Deep dive into stack traces: +View all configured MCP servers: +```bash +openhands mcp list ``` -Analyze this stack trace from our production error: - -[paste full stack trace] -1. Identify the exception type and message -2. Trace back to our code (not framework code) -3. Identify the likely cause -4. Check if this code path has changed recently -5. Suggest a fix -``` +### Get Server Details -**Multi-language support:** +View details for a specific server: - - - ``` - Analyze this Java exception: - - java.lang.OutOfMemoryError: Java heap space - at java.util.Arrays.copyOf(Arrays.java:3210) - at java.util.ArrayList.grow(ArrayList.java:265) - at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) - - Identify: - 1. What operation is consuming memory? - 2. Is there a memory leak or just too much data? - 3. What's the fix? - ``` - - - ``` - Analyze this Python traceback: - - Traceback (most recent call last): - File "app/api/orders.py", line 45, in create_order - order = OrderService.create(data) - File "app/services/order.py", line 89, in create - inventory.reserve(item_id, quantity) - AttributeError: 'NoneType' object has no attribute 'reserve' - - What's None and why? - ``` - - - ``` - Analyze this Node.js error: - - TypeError: Cannot read property 'map' of undefined - at processItems (/app/src/handlers/items.js:23:15) - at async handleRequest (/app/src/api/router.js:45:12) - - What's undefined and how should we handle it? 
- ``` - - +```bash +openhands mcp get +``` -### Root Cause Analysis +### Remove a Server -Identify the underlying cause of an incident: +Remove a server configuration: +```bash +openhands mcp remove ``` -Perform root cause analysis for this incident: -Symptoms: -- API response times increased 5x at 14:00 -- Error rate jumped from 0.1% to 15% -- Database CPU spiked to 100% +### Enable/Disable Servers -Available data: -- Application metrics (Grafana dashboard attached) -- Recent deployments: v2.3.1 deployed at 13:45 -- Database slow query log (attached) +Control which servers are active: -Identify the root cause using the 5 Whys technique. +```bash +# Enable a server +openhands mcp enable + +# Disable a server +openhands mcp disable ``` -## Common Incident Patterns +## Adding Servers -OpenHands can recognize and help diagnose these common patterns: +### HTTP/SSE Servers -- **Connection pool exhaustion**: Increasing connection errors followed by complete failure -- **Memory leaks**: Gradual memory increase leading to OOM -- **Cascading failures**: One service failure triggering others -- **Thundering herd**: Simultaneous requests overwhelming a service -- **Split brain**: Inconsistent state across distributed components +Add remote servers with HTTP or SSE transport: -## Quick Fix Generation +```bash +openhands mcp add --transport http +``` -Once the root cause is identified, generate fixes: +#### With Bearer Token Authentication +```bash +openhands mcp add my-api --transport http \ + --header "Authorization: Bearer your-token" \ + https://api.example.com/mcp ``` -We've identified the root cause: a missing null check in OrderProcessor.java line 156. -Generate a fix that: -1. Adds proper null checking -2. Logs when null is encountered -3. Returns an appropriate error response -4. Includes a unit test for the edge case -5. 
Is minimally invasive for a hotfix +#### With API Key Authentication + +```bash +openhands mcp add weather-api --transport http \ + --header "X-API-Key: your-api-key" \ + https://weather.api.com ``` -## Best Practices +#### With Multiple Headers -### Investigation Checklist +```bash +openhands mcp add secure-api --transport http \ + --header "Authorization: Bearer token123" \ + --header "X-Client-ID: client456" \ + https://api.example.com +``` -Use this checklist when investigating: +#### With OAuth Authentication -1. **Scope the impact** - - How many users affected? - - What functionality is broken? - - What's the business impact? +```bash +openhands mcp add notion-server --transport http \ + --auth oauth \ + https://mcp.notion.com/mcp +``` -2. **Establish timeline** - - When did it start? - - What changed around that time? - - Is it getting worse or stable? +### Stdio Servers -3. **Gather data** - - Application logs - - Infrastructure metrics - - Recent deployments - - Configuration changes +Add local servers that communicate via stdio: -4. **Form hypotheses** - - List possible causes - - Rank by likelihood - - Test systematically +```bash +openhands mcp add --transport stdio -- [args...] +``` -5. 
**Implement fix** - - Choose safest fix - - Test before deploying - - Monitor after deployment +#### Basic Example -### Common Pitfalls +```bash +openhands mcp add local-server --transport stdio \ + python -- -m my_mcp_server +``` - -Avoid these common incident response mistakes: +#### With Environment Variables -- **Jumping to conclusions**: Gather data before assuming the cause -- **Changing multiple things**: Make one change at a time to isolate effects -- **Not documenting**: Record all actions for the post-mortem -- **Ignoring rollback**: Always have a rollback plan before deploying fixes - +```bash +openhands mcp add local-server --transport stdio \ + --env "API_KEY=secret123" \ + --env "DATABASE_URL=postgresql://localhost/mydb" \ + python -- -m my_mcp_server --config config.json +``` - -For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. - +#### Add in Disabled State -## Related Resources +```bash +openhands mcp add my-server --transport stdio --disabled \ + node -- my-server.js +``` -- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents -- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +### Command Reference +```bash +openhands mcp add --transport [options] [-- args...] 
+``` -# Spark Migrations -Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations +| Option | Description | +|--------|-------------| +| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | +| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | +| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | +| `--auth` | Authentication method (e.g., `oauth`) | +| `--enabled` | Enable immediately (default) | +| `--disabled` | Add in disabled state | -Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. OpenHands can help you analyze, migrate, and validate Spark applications. +## Example: Web Search with Tavily -## Overview +Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp): -Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar. +```bash +openhands mcp add tavily --transport stdio \ + npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=" +``` -Version upgrades are also made difficult due to the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions. 
+## Manual Configuration -Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. The migration needs to be driven by an experienced data engineering team, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in. +You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`. -Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly or without regression. This is where OpenHands comes in. OpenHands assists in migrating Spark applications along every step of the process: +### Configuration Format -1. **Understanding**: Analyze the existing codebase to identify what needs to change and why -2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences -3. **Validation**: Verify that migrated pipelines produce identical results to the originals +The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format): -In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions. +```json +{ + "mcpServers": { + "server-name": { + "command": "command-to-run", + "args": ["arg1", "arg2"], + "env": { + "ENV_VAR": "value" + } + } + } +} +``` -## Understanding +### Example Configuration -Before changin any code, it helps to build a clear picture of what is affected and where the risk is concentrated. 
Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually. +```json +{ + "mcpServers": { + "tavily-remote": { + "command": "npx", + "args": [ + "-y", + "mcp-remote", + "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key" + ] + }, + "local-tools": { + "command": "python", + "args": ["-m", "my_mcp_tools"], + "env": { + "DEBUG": "true" + } + } + } +} +``` -Apache releases detailed lists of changes between each major and minor version of Spark. OpenHands can utilize this list of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress. +## Interactive `/mcp` Command -If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory: +Within an OpenHands conversation, use `/mcp` to view server status: -``` -Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0. +- **View active servers**: Shows which MCP servers are currently active in the conversation +- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts -Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html. + +The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations. + -Then, for each source file, identify +## Workflow -1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`) -2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics) -3. Configuration properties that have changed defaults or been renamed -4. Dependencies that need version updates +1. **Add servers** using `openhands mcp add` +2. 
**Start a conversation** with `openhands` +3. **Check status** with `/mcp` inside the conversation +4. **Use the tools** provided by your MCP servers -Save the results in `migration_inventory.json` in the following format: +The agent will automatically have access to tools provided by enabled MCP servers. -{ - ..., - "src/main/scala/etl/TransformJob.scala": { - "deprecated_apis": [ - {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} - ], - "behavioral_changes": [ - {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} - ], - "config_changes": [], - "risk": "medium" - }, - ... -} -``` +## Troubleshooting -Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. +### Server Not Appearing -## Migration +1. Verify the server is enabled: + ```bash + openhands mcp list + ``` -With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. +2. Check the configuration: + ```bash + openhands mcp get + ``` -To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. +3. 
Restart the conversation to load new configurations -The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. +### Server Fails to Start -An example prompt is below: +1. Test the command manually: + ```bash + # For stdio servers + python -m my_mcp_server + + # For HTTP servers, check the URL is reachable + curl https://api.example.com/mcp + ``` -``` -Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. +2. Check environment variables and credentials -Use `migration_inventory.json` to guide the changes. +3. Review error messages in the CLI output -For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. +### Configuration File Location -Requirements: -1. Replace all deprecated APIs with their Spark 3.0 equivalents -2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) -3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions -4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical -5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists -6. Update import statements for any relocated classes -7. Preserve all existing business logic and output schemas -``` +The MCP configuration is stored at: +- **Config file**: `~/.openhands/mcp.json` -Note the inclusion of the _known problems_ in requirement 2. 
We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. +## See Also -## Validation +- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation +- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference -Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. +### Quick Start +Source: https://docs.openhands.dev/openhands/usage/cli/quick-start.md -The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. + +**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details. + -An example prompt for data-level comparison: +## Overview -``` -Validate the migrated Spark application in `/src` against the original. +The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent: -1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` -2. Compare outputs: - - Row counts must match exactly - - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns - - Flag any NULL handling differences -3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments -4. 
Generate a performance comparison: job duration, shuffle bytes, and peak executor memory +| Mode | Command | Best For | +|------|---------|----------| +| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development | +| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation | +| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI | +| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI | +| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains | -Save the results in `validation_report.json` in the following format: + -{ - "jobs": [ - { - "name": "daily_etl", - "data_match": true, - "row_count": {"v2": 1000000, "v3": 1000000}, - "column_diffs": [], - "performance": { - "duration_seconds": {"v2": 340, "v3": 285}, - "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} - } - }, - ... - ] -} -``` +## Your First Conversation -Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. +**Set up your account** (first time only): -Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. + + + ```bash + openhands login + ``` + This authenticates with OpenHands Cloud and fetches your settings. + + + The CLI will prompt you to configure your LLM provider and API key on first run. + + -## Beyond Version Upgrades +1. 
**Start the CLI:** + ```bash + openhands + ``` -While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: +2. **Enter a task:** + ``` + Create a Python script that prints "Hello, World!" + ``` -- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. -- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. +3. **Watch OpenHands work:** + The agent will create the file and show you the results. -In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. 
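That inventory → transform → validate principle can be sketched in a few lines. A hedged illustration only: the inventory entries mirror the `migration_inventory.json` shape shown earlier, while the function names and checksum scheme are hypothetical, and real validation would run inside Spark rather than over in-memory rows:

```python
import hashlib

def apply_renames(source: str, deprecated_apis: list[dict]) -> str:
    """Apply the mechanical API renames recorded in the inventory."""
    for item in deprecated_apis:
        source = source.replace(item["current"], item["replacement"])
    return source

def outputs_match(rows_old: list[tuple], rows_new: list[tuple]) -> bool:
    """Data-level comparison: row counts plus order-insensitive row checksums."""
    if len(rows_old) != len(rows_new):
        return False
    digest = lambda rows: sorted(
        hashlib.sha256(repr(r).encode()).hexdigest() for r in rows
    )
    return digest(rows_old) == digest(rows_new)
```

The checksum comparison is deliberately order-insensitive, since Spark gives no ordering guarantee without an explicit sort; an exact row-by-row diff would need both pipelines to sort on a stable key first.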
+## Controls -## Related Resources +Once inside the CLI, use these controls: -- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents -- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +| Control | Description | +|---------|-------------| +| `Ctrl+P` | Open command palette (access Settings, MCP status) | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | +## Starting with a Task -# Vulnerability Remediation -Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation +You can start the CLI with an initial task: -Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. +```bash +# Start with a task +openhands -t "Fix the bug in auth.py" -## The Challenge +# Start with a task from a file +openhands -f task.txt +``` -The traditional approach to vulnerability remediation is manual and time-consuming: +## Resuming Conversations -1. Scan repositories for vulnerabilities -2. Review each vulnerability and its impact -3. Research the fix (usually a version upgrade) -4. Update dependency files -5. Test the changes -6. Create pull requests -7. Get reviews and merge +Resume a previous conversation: -This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. Security debt accumulates faster than teams can address it. 
+```bash +# List recent conversations and select one +openhands --resume -**What if we could automate this entire process using AI agents?** +# Resume the most recent conversation +openhands --resume --last -## Automated Vulnerability Remediation with OpenHands +# Resume a specific conversation by ID +openhands --resume abc123def456 +``` -The [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**. +For more details, see [Resume Conversations](/openhands/usage/cli/resume). -OpenHands assists with vulnerability remediation by: +## Next Steps -- **Identifying vulnerabilities**: Analyzing code for common security issues -- **Understanding impact**: Explaining the risk and exploitation potential -- **Implementing fixes**: Generating secure code to address vulnerabilities -- **Validating remediation**: Verifying fixes are effective and complete + + + Learn about the interactive terminal interface + + + Use OpenHands in Zed, VS Code, or JetBrains + + + Automate tasks with scripting + + + Add tools via Model Context Protocol + + -## Two Approaches to Vulnerability Fixing +### Resume Conversations +Source: https://docs.openhands.dev/openhands/usage/cli/resume.md -### 1. Point to a GitHub Repository +## Overview -Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents automatically create pull requests with fixes—all with minimal human intervention. +OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off. -### 2. 
Upload Security Scanner Reports +## Listing Previous Conversations -Enable users to upload reports from security scanners such as Snyk (as well as other third-party security scanners) where OpenHands agents automatically detect the report format, identify the issues, and apply fixes. +To see a list of your recent conversations, run: -This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. +```bash +openhands --resume +``` -## Architecture Overview +This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message: -A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes. +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py -The key architectural components include: + 2. xyz789ghi012 (yesterday) + Add unit tests for the user service -- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client) -- **WebSocket interface**: Enables real-time status updates on agent actions and operations -- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider -- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud + 3. 
mno345pqr678 (3 days ago) + Refactor the database connection module +-------------------------------------------------------------------------------- +To resume a conversation, use: openhands --resume +``` -This architecture allows the frontend to remain lightweight while heavy lifting happens in the agent's execution environment. +## Resuming a Specific Conversation -## Example: Vulnerability Fixer Application +To resume a specific conversation, use the `--resume` flag with the conversation ID: -An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow: +```bash +openhands --resume +``` -1. User points to a repository or uploads a security scan report -2. Agent analyzes the vulnerabilities -3. Agent creates fixes and pull requests automatically -4. User reviews and merges the changes +For example: -## Security Scanning Integration +```bash +openhands --resume abc123def456 +``` -Use OpenHands to analyze security scanner output: +## Resuming the Latest Conversation + +To quickly resume your most recent conversation without looking up the ID, use the `--last` flag: +```bash +openhands --resume --last ``` -We ran a security scan and found these issues. Analyze each one: -1. SQL Injection in src/api/users.py:45 -2. XSS in src/templates/profile.html:23 -3. Hardcoded credential in src/config/database.py:12 -4. Path traversal in src/handlers/files.py:67 +This automatically finds and resumes the most recent conversation. -For each vulnerability: -- Explain what the vulnerability is -- Show how it could be exploited -- Rate the severity (Critical/High/Medium/Low) -- Suggest a fix +## How It Works + +When you resume a conversation: + +1. OpenHands loads the full conversation history from disk +2. 
The agent has access to all previous context, including: + - Your previous messages and requests + - The agent's responses and actions + - Any files that were created or modified +3. You can continue the conversation as if you never left + + +The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost. + + +## Resuming in Different Modes + +### Terminal Mode + +```bash +openhands --resume abc123def456 +openhands --resume --last ``` -## Common Vulnerability Patterns +### ACP Mode (IDEs) -OpenHands can detect these common vulnerability patterns: +```bash +openhands acp --resume abc123def456 +openhands acp --resume --last +``` -| Vulnerability | Pattern | Example | -|--------------|---------|---------| -| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | -| XSS | Unescaped user input in HTML | `
${user_comment}
` | -| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | -| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | -| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | +For IDE-specific configurations, see: +- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) +- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) +- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) -## Automated Remediation +### With Confirmation Modes -### Applying Security Patches +Combine `--resume` with confirmation mode flags: -Fix identified vulnerabilities: +```bash +# Resume with LLM-based approval +openhands --resume abc123def456 --llm-approve - - - ``` - Fix the SQL injection vulnerability in src/api/users.py: - - Current code: - query = f"SELECT * FROM users WHERE id = {user_id}" - cursor.execute(query) - - Requirements: - 1. Use parameterized queries - 2. Add input validation - 3. Maintain the same functionality - 4. Add a test case for the fix - ``` - - **Fixed code:** - ```python - # Using parameterized query - query = "SELECT * FROM users WHERE id = %s" - cursor.execute(query, (user_id,)) - ``` - - - ``` - Fix the XSS vulnerability in src/templates/profile.html: - - Current code: -
${user.bio}
- - Requirements: - 1. Properly escape user content - 2. Consider Content Security Policy - 3. Handle rich text if needed - 4. Test with malicious input - ``` - - **Fixed code:** - ```html - -
{{ user.bio | escape }}
- ``` -
- - ``` - Fix the command injection in src/utils/network.py: - - Current code: - def ping_host(hostname): - os.system(f"ping -c 1 {hostname}") - - Requirements: - 1. Use safe subprocess calls - 2. Validate input format - 3. Avoid shell=True - 4. Handle errors properly - ``` - - **Fixed code:** - ```python - import subprocess - import re - - def ping_host(hostname): - # Validate hostname format - if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): - raise ValueError("Invalid hostname") - - # Use subprocess without shell - result = subprocess.run( - ["ping", "-c", "1", hostname], - capture_output=True, - text=True - ) - return result.returncode == 0 - ``` - -
+# Resume with auto-approve +openhands --resume --last --always-approve +``` -### Code-Level Vulnerability Fixes +## Tips -Fix application-level security issues: + +**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. + -``` -Fix the broken access control in our API: + +**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. + -Issue: Users can access other users' data by changing the ID in the URL. +## Storage Location -Current code: -@app.get("/api/users/{user_id}/documents") -def get_documents(user_id: int): - return db.get_documents(user_id) +Conversations are stored in: -Requirements: -1. Add authorization check -2. Verify requesting user matches or is admin -3. Return 403 for unauthorized access -4. Log access attempts -5. Add tests for authorization ``` - -**Fixed code:** - -```python -@app.get("/api/users/{user_id}/documents") -def get_documents(user_id: int, current_user: User = Depends(get_current_user)): - # Check authorization - if current_user.id != user_id and not current_user.is_admin: - logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") - raise HTTPException(status_code=403, detail="Not authorized") - - return db.get_documents(user_id) +~/.openhands/conversations/ +├── abc123def456/ +│ └── conversation.json +├── xyz789ghi012/ +│ └── conversation.json +└── ... ``` -## Security Testing - -Test your fixes thoroughly: +## See Also -``` -Create security tests for the SQL injection fix: +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference -1. Test with normal input -2. 
Test with SQL injection payloads: - - ' OR '1'='1 - - '; DROP TABLE users; -- - - UNION SELECT * FROM passwords -3. Test with special characters -4. Test with null/empty input -5. Verify error handling doesn't leak information -``` +### Terminal (CLI) +Source: https://docs.openhands.dev/openhands/usage/cli/terminal.md -## Automated Remediation Pipeline +## Overview -Create an end-to-end automated pipeline: +The Command Line Interface (CLI) is the default mode when you run `openhands`. It provides a rich, interactive experience directly in your terminal. +```bash +openhands ``` -Create an automated vulnerability remediation pipeline: -1. Parse Snyk/Dependabot/CodeQL alerts -2. Categorize by severity and type -3. For each vulnerability: - - Create a branch - - Apply the fix - - Run tests - - Create a PR with: - - Description of vulnerability - - Fix applied - - Test results -4. Request review from security team -5. Auto-merge low-risk fixes after tests pass -``` +## Features -## Building Your Own Vulnerability Fixer +- **Real-time interaction**: Type natural language tasks and receive instant feedback +- **Live status monitoring**: Watch the agent's progress as it works +- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more -The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. +## Command Palette -To build your own vulnerability remediation agent: +Press `Ctrl+P` to open the command palette, then select from the dropdown options: -1. Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent -2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) -3. Configure the agent to create pull requests automatically -4. 
Set up human review workflows for critical fixes +| Option | Description | +|--------|-------------| +| **Settings** | Open the settings configuration menu | +| **MCP** | View MCP server status | -As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. +## Controls -## Related Resources +| Control | Action | +|---------|--------| +| `Ctrl+P` | Open command palette | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | -- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example -- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents -- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +## Starting with a Task +Start a conversation with an initial task: -# Windows Without WSL -Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl +```bash +# Provide a task directly +openhands -t "Create a REST API for user management" - - This way of running OpenHands is not officially supported. It is maintained by the community and may not work. - +# Load task from a file +openhands -f requirements.txt +``` -# Running OpenHands GUI on Windows Without WSL +## Confirmation Modes -This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker. +Control how the agent requests approval for actions: -## Prerequisites +```bash +# Default: Always ask for confirmation +openhands -1. **Windows 10/11** - A modern Windows operating system -2. 
**PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors) -3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet -4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) -5. **Git** - For cloning the repository and version control -6. **Node.js and npm** - For running the frontend +# Auto-approve all actions (use with caution) +openhands --always-approve -## Step 1: Install Required Software +# Use LLM-based security analyzer +openhands --llm-approve +``` -1. **Install Python 3.12 or 3.13** - - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) - - During installation, check "Add Python to PATH" - - Verify installation by opening PowerShell and running: - ```powershell - python --version - ``` +## Resuming Conversations -2. **Install PowerShell 7** - - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) - - Choose the MSI installer appropriate for your system (x64 for most modern computers) - - Run the installer with default options - - Verify installation by opening a new terminal and running: - ```powershell - pwsh --version - ``` - - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors +Resume previous conversations: -3. **Install .NET Core Runtime** - - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) - - Choose the latest .NET Core Runtime (not SDK) - - Verify installation by opening PowerShell and running: - ```powershell - dotnet --info - ``` - - This step is required for the PowerShell integration via pythonnet. 
Without it, OpenHands will fall back to a more limited PowerShell implementation. +```bash +# List recent conversations +openhands --resume -4. **Install Git** - - Download Git from [git-scm.com](https://git-scm.com/download/win) - - Use default installation options - - Verify installation: - ```powershell - git --version - ``` +# Resume the most recent +openhands --resume --last -5. **Install Node.js and npm** - - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) - - During installation, accept the default options which will install npm as well - - Verify installation: - ```powershell - node --version - npm --version - ``` +# Resume a specific conversation +openhands --resume abc123def456 +``` -6. **Install Poetry** - - Open PowerShell as Administrator and run: - ```powershell - (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - - ``` - - Add Poetry to your PATH: - ```powershell - $env:Path += ";$env:APPDATA\Python\Scripts" - ``` - - Verify installation: - ```powershell - poetry --version - ``` +For more details, see [Resume Conversations](/openhands/usage/cli/resume). -## Step 2: Clone and Set Up OpenHands +## Tips -1. **Clone the Repository** - ```powershell - git clone https://github.com/OpenHands/OpenHands.git - cd OpenHands - ``` + +Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. + -2. **Install Dependencies** - ```powershell - poetry install - ``` + +Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. 
+ - This will install all required dependencies, including: - - pythonnet - Required for Windows PowerShell integration - - All other OpenHands dependencies +## See Also -## Step 3: Run OpenHands +- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation -1. **Build the Frontend** - ```powershell - cd frontend - npm install - npm run build - cd .. - ``` +### Web Interface +Source: https://docs.openhands.dev/openhands/usage/cli/web-interface.md - This will build the frontend files that the backend will serve. +## Overview -2. **Start the Backend** - ```powershell - # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell - pwsh - $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" - ``` +The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to: +- Access the CLI remotely +- Share your terminal session +- Use the CLI on devices without a full terminal - This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. +```bash +openhands web +``` - > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. + +This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser. + - > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. +## Basic Usage -3. 
**Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** - ```powershell - cd frontend - npm run dev - ``` +```bash +# Start on default port (12000) +openhands web -4. **Access the OpenHands GUI** +# Access at http://localhost:12000 +``` - Open your browser and navigate to: - ``` - http://localhost:3000 - ``` +## Options - > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host address to bind to | +| `--port` | `12000` | Port number to use | +| `--debug` | `false` | Enable debug mode | -## Installing and Running the CLI +## Examples -To install and run the OpenHands CLI on Windows without WSL, follow these steps: +```bash +# Custom port +openhands web --port 8080 -### 1. Install uv (Python Package Manager) +# Bind to localhost only (more secure) +openhands web --host 127.0.0.1 -Open PowerShell as Administrator and run: +# Enable debug mode +openhands web --debug -```powershell -powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" +# Full example with custom host and port +openhands web --host 0.0.0.0 --port 3000 ``` -### 2. Install .NET SDK (Required) - -The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: +## Remote Access -```powershell -winget install Microsoft.DotNet.SDK.8 -``` +To access the web interface from another machine: -Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). +1. Start with `--host 0.0.0.0` to bind to all interfaces: + ```bash + openhands web --host 0.0.0.0 --port 12000 + ``` -After installation, restart your PowerShell session to ensure the environment variables are updated. +2. 
Access from another machine using the host's IP: + ``` + http://:12000 + ``` -### 3. Install and Run OpenHands + +When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities. + -After installing the prerequisites, install OpenHands with: +## Use Cases -```powershell -uv tool install openhands --python 3.12 -``` +### Development on Remote Servers -Then run OpenHands: +Access OpenHands on a remote development server through your local browser: -```powershell -openhands -``` +```bash +# On remote server +openhands web --host 0.0.0.0 --port 12000 -To upgrade OpenHands in the future: +# On local machine, use SSH tunnel +ssh -L 12000:localhost:12000 user@remote-server -```powershell -uv tool upgrade openhands --python 3.12 +# Access at http://localhost:12000 ``` -### Troubleshooting CLI Issues - -#### CoreCLR Error - -If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this: - -1. Install the .NET SDK as described in step 2 above -2. Verify that your system PATH includes the .NET SDK directories -3. Restart your PowerShell session completely after installing the .NET SDK -4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell +### Sharing Sessions -To verify your .NET installation, run: +Run the web interface on a shared server for team access: -```powershell -dotnet --info +```bash +openhands web --host 0.0.0.0 --port 8080 ``` -This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. 
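Beyond `dotnet --info`, you can probe the same code path the OpenHands CLI exercises at startup. A minimal sketch, assuming the `pythonnet` package is installed in your Python environment:

```python
# Probe the startup code path: pythonnet must locate a .NET Core
# runtime before `import clr` can succeed.
try:
    from pythonnet import load

    load("coreclr")  # raises if no compatible .NET runtime is found
    import clr  # noqa: F401  (import works only after a runtime is loaded)

    status = "CoreCLR loaded successfully"
except Exception as exc:  # ImportError, RuntimeError, etc.
    status = f"CoreCLR unavailable: {exc}"

print(status)
```

If this prints `CoreCLR unavailable`, revisit the .NET SDK installation steps above before running `openhands`.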
+## Comparison: Web Interface vs GUI Server -If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). +| Feature | `openhands web` | `openhands serve` | +|---------|-----------------|-------------------| +| Interface | Terminal UI in browser | Full web GUI | +| Dependencies | None | Docker required | +| Resources | Lightweight | Full container | +| Best for | Quick access | Rich GUI experience | -## Limitations on Windows +## See Also -When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage +- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options -1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. +## OpenHands Software Agent SDK -2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. +### Software Agent SDK +Source: https://docs.openhands.dev/sdk.md -3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. +The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. -4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. 
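One way to stay separator-agnostic when adapting Unix-flavored examples is Python's `pathlib`, sketched here against the conversation store layout described earlier in these docs:

```python
from pathlib import Path

# Joining with "/" works on every OS; pathlib renders the right
# separator ("\\" on Windows, "/" elsewhere) when the path is used.
store = Path.home() / ".openhands" / "conversations"
conversation = store / "abc123def456" / "conversation.json"

# Avoid hard-coding backslashes; when an API needs forward slashes
# regardless of OS, render the path with as_posix().
print(conversation.as_posix())
```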
+You can use the OpenHands Software Agent SDK for: -## Troubleshooting +- One-off tasks, like building a README for your repo +- Routine maintenance tasks, like updating dependencies +- Major tasks that involve multiple agents, like refactors and rewrites -### "System.Management.Automation" Not Found Error +You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). -If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. +Get started with some examples or keep reading to learn more. -> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. +## Features -To resolve this issue: + + + A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. + + + Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. + + + A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. + + -1. 
**Install the latest version of PowerShell 7** from the official Microsoft repository: - - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) - - Download and install the latest MSI package for your system architecture (x64 for most systems) - - During installation, ensure you select the following options: - - "Add PowerShell to PATH environment variable" - - "Register Windows PowerShell 7 as the default shell" - - "Enable PowerShell remoting" - - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default +## Why OpenHands Software Agent SDK? -2. **Restart your terminal or command prompt** to ensure the new PowerShell is available +### Emphasis on coding -3. **Verify the installation** by running: - ```powershell - pwsh --version - ``` +While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. - You should see output indicating PowerShell 7.x.x +While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. -4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: - ```powershell - pwsh - cd path\to\openhands - $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" - ``` +### State-of-the-Art Performance - > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell". +OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. 
The SDK includes a number of state-of-the-art agentic features developed by our research team, including: -5. **If the issue persists**, ensure that you have the .NET Runtime installed: - - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) - - Choose ".NET Runtime" (not SDK) version 6.0 or later - - After installation, verify it's properly installed by running: - ```powershell - dotnet --info - ``` - - Restart your computer after installation - - Try running OpenHands again +- Task planning and decomposition +- Automatic context compression +- Security analysis +- Strong agent-computer interfaces -6. **Ensure that the .NET Framework is properly installed** on your system: - - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off - - Make sure ".NET Framework 4.8 Advanced Services" is enabled - - Click OK and restart if prompted +OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. -This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration. +### Free and Open Source +OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. -# Community -Source: https://docs.openhands.dev/overview/community +Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. 
Given how quickly models are evolving, it’s best to stay model-agnostic! -# The OpenHands Community +## Get Started -OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered world. + + + Install the SDK, run your first agent, and explore the guides. + + -## Mission +## Learn the SDK -It's very clear that AI is changing software development. We want the developer community to drive that change organically, through open source. - -So we're not just building friendly interfaces for AI-driven development. We're publishing _building blocks_ that empower developers to create new experiences, tailored to your own habits, needs, and imagination. - -## Ethos + + + Understand the SDK's architecture: agents, tools, workspaces, and more. + + + Explore the complete SDK API and source code. + + -We have two core values: **high openness** and **high agency**. While we don't expect everyone in the community to embody these values, we want to establish them as norms. +## Build with Examples -### High Openness + + + Build local agents with custom tools and capabilities. + + + Run agents on remote servers with Docker sandboxing. + + + Automate repository tasks with agent-powered workflows. + + -We welcome anyone and everyone into our community by default. You don't have to be a software developer to help us build. You don't have to be pro-AI to help us learn. +## Community -Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the fruits of our work, but the whole process of growing it. + + + Connect with the OpenHands community on Slack. + + + Contribute to the SDK or report issues on GitHub. + + -We welcome thoughtful criticism, whether it's a comment on a PR or feedback on the community as a whole. 
+### openhands.sdk.agent +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md -### High Agency +### class Agent -Everyone should feel empowered to contribute to OpenHands. Whether it's by making a PR, hosting an event, sharing feedback, or just asking a question, don't hold back! +Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) -OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly and love building new things. +Main agent implementation for OpenHands. -Coding, development practices, and communities are changing rapidly. We won't hesitate to change direction and make big bets. +The Agent class provides the core functionality for running AI agents that can +interact with tools, process messages, and execute actions. It inherits from +AgentBase and implements the agent execution logic. Critic-related functionality +is provided by CriticMixin. -## Relationship to All Hands +#### Example -OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.all-hands.dev/). +```pycon +>>> from openhands.sdk import LLM, Agent, Tool +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] +>>> agent = Agent(llm=llm, tools=tools) +``` -All Hands was founded by three of the first major contributors to OpenHands: -- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards -- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands -- Robert Brennan, a software engineer who architected the user-facing features of OpenHands +#### Properties -All Hands is an important part of the OpenHands ecosystem. We've raised over $20M—mainly to hire developers and researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. 
([Join us!](https://allhandsai.applytojob.com/apply/)) +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -But we see OpenHands as much larger, and ultimately more important, than All Hands. When our financial responsibility to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we promise to navigate that conflict thoughtfully and transparently. +#### Methods -At some point, we may transfer custody of OpenHands to an open source foundation. But for now, the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the "benevolent" part, please: fork us. +#### init_state() +Initialize conversation state. -# Contributing -Source: https://docs.openhands.dev/overview/contributing +Invariants enforced by this method: +- If a SystemPromptEvent is already present, it must be within the first 3 -# Contributing to OpenHands + events (index 0 or 1 in practice; index 2 is included in the scan window + to detect a user message appearing before the system prompt). +- A user MessageEvent should not appear before the SystemPromptEvent. -Welcome to the OpenHands community! We're building the future of AI-powered software development, and we'd love for you to be part of this journey. +These invariants keep event ordering predictable for downstream components +(condenser, UI, etc.) and also prevent accidentally materializing the full +event history during initialization. 
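The ordering rules above can be sketched as a standalone check. This is an illustration only: the real SDK works with SystemPromptEvent and MessageEvent objects, not plain role strings.

```python
def prefix_invariants_hold(roles: list[str]) -> bool:
    """Check the init_state() ordering rules over event roles:
    a "system" event, if present anywhere, must sit inside the
    first 3 events, and no "user" event may precede it."""
    window = roles[:3]
    if "system" in window:
        first_system = window.index("system")
        # No user message may come before the system prompt.
        return "user" not in roles[:first_system]
    # A system prompt appearing only after the scan window is invalid.
    return "system" not in roles


assert prefix_invariants_hold(["system", "user", "assistant"])
assert not prefix_invariants_hold(["user", "system"])
assert not prefix_invariants_hold(["user", "user", "user", "system"])
```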
-## Our Vision: Free as in Freedom +#### model_post_init() -The OpenHands community is built around the belief that **AI and AI agents are going to fundamentally change the way we build software**, and if this is true, we should do everything we can to make sure that the benefits provided by such powerful technology are **accessible to everyone**. +This function is meant to behave like a BaseModel method to initialise private attributes. -We believe in the power of open source to democratize access to cutting-edge AI technology. Just as the internet transformed how we share information, we envision a world where AI-powered development tools are available to every developer, regardless of their background or resources. +It takes context as an argument since that’s what pydantic-core passes when calling it. -If this resonates with you, we'd love to have you join us in our quest! +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -## What Can You Build? +#### step() -There are countless ways to contribute to OpenHands. Whether you're a seasoned developer, a researcher, a designer, or someone just getting started, there's a place for you in our community. +Taking a step in the conversation. -### Frontend & UI/UX -Make OpenHands more beautiful and user-friendly: -- **React & TypeScript Development** - Improve the web interface -- **UI/UX Design** - Enhance user experience and accessibility -- **Mobile Responsiveness** - Make OpenHands work great on all devices -- **Component Libraries** - Build reusable UI components +Typically this involves: +1. Making a LLM call +2. Executing the tool +3. Updating the conversation state with -*Small fixes are always welcome! 
For bigger changes, join our **#eng-ui-ux** channel in [Slack](https://openhands.dev/joinslack) first.* + LLM calls (role=”assistant”) and tool results (role=”tool”) -### Agent Development -Help make our AI agents smarter and more capable: -- **Prompt Engineering** - Improve how agents understand and respond -- **New Agent Types** - Create specialized agents for different tasks -- **Agent Evaluation** - Develop better ways to measure agent performance -- **Multi-Agent Systems** - Enable agents to work together +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step -*We use [SWE-bench](https://www.swebench.com/) to evaluate our agents. Join our [Slack](https://openhands.dev/joinslack) to learn more.* +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. -### Backend & Infrastructure -Build the foundation that powers OpenHands: -- **Python Development** - Core functionality and APIs -- **Runtime Systems** - Docker containers and sandboxes -- **Cloud Integrations** - Support for different cloud providers -- **Performance Optimization** - Make everything faster and more efficient +NOTE: state will be mutated in-place. 
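The step/run contract described above (step until the state reports FINISHED, mutating state in place) can be sketched with stand-in objects. `FakeAgent` and `FakeState` below are illustrative, not the real SDK `Agent` or `ConversationState`:

```python
# Illustrative sketch of the step loop described above; these classes are
# stand-ins, not real SDK types.
class FakeState:
    def __init__(self):
        self.execution_status = "running"
        self.steps_taken = 0

class FakeAgent:
    def step(self, state):
        # A real agent would make an LLM call and execute tools here.
        state.steps_taken += 1  # state is mutated in place
        if state.steps_taken >= 3:
            state.execution_status = "finished"

def run(agent, state, max_iterations=10):
    # The conversation keeps kicking off steps until finished or capped.
    while state.execution_status != "finished" and state.steps_taken < max_iterations:
        agent.step(state)

state = FakeState()
run(FakeAgent(), state)
print(state.steps_taken, state.execution_status)  # 3 finished
```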
-### Testing & Quality Assurance -Help us maintain high quality: -- **Unit Testing** - Write tests for new features -- **Integration Testing** - Ensure components work together -- **Bug Hunting** - Find and report issues -- **Performance Testing** - Identify bottlenecks and optimization opportunities +### class AgentBase -### Documentation & Education -Help others learn and contribute: -- **Technical Documentation** - API docs, guides, and tutorials -- **Video Tutorials** - Create learning content -- **Translation** - Make OpenHands accessible in more languages -- **Community Support** - Help other users and contributors +Bases: `DiscriminatedUnionMixin`, `ABC` -### Research & Innovation -Push the boundaries of what's possible: -- **Academic Research** - Publish papers using OpenHands -- **Benchmarking** - Develop new evaluation methods -- **Experimental Features** - Try cutting-edge AI techniques -- **Data Analysis** - Study how developers use AI tools +Abstract base class for OpenHands agents. -## 🚀 Getting Started +Agents are stateless and should be fully defined by their configuration. +This base class provides the common interface and functionality that all +agent implementations must follow. -Ready to contribute? Here's your path to making an impact: -### 1. Quick Wins -Start with these easy contributions: -- **Use OpenHands** and [report issues](https://github.com/OpenHands/OpenHands/issues) you encounter -- **Give feedback** using the thumbs-up/thumbs-down buttons after each session -- **Star our repository** on [GitHub](https://github.com/OpenHands/OpenHands) -- **Share OpenHands** with other developers +#### Properties -### 2. 
Set Up Your Development Environment -Follow our setup guide: -- **Requirements**: Linux/Mac/WSL, Docker, Python 3.12, Node.js 22+, Poetry 1.8+ -- **Quick setup**: `make build` to get everything ready -- **Configuration**: `make setup-config` to configure your LLM -- **Run locally**: `make run` to start the application +- `agent_context`: AgentContext | None +- `condenser`: CondenserBase | None +- `critic`: CriticBase | None +- `dynamic_context`: str | None + Get the dynamic per-conversation context. + This returns the context that varies between conversations, such as: + - Repository information and skills + - Runtime information (hosts, working directory) + - User-specific secrets and settings + - Conversation instructions + This content should NOT be included in the cached system prompt to enable + cross-conversation cache sharing. Instead, it is sent as a second content + block (without a cache marker) inside the system message. + * Returns: + The dynamic context string, or None if no context is configured. +- `filter_tools_regex`: str | None +- `include_default_tools`: list[str] +- `llm`: LLM +- `mcp_config`: dict[str, Any] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str + Returns the name of the Agent. +- `prompt_dir`: str + Returns the directory where this class’s module file is located. +- `security_policy_filename`: str +- `static_system_message`: str + Compute the static portion of the system message. + This returns only the base system prompt template without any dynamic + per-conversation context. This static portion can be cached and reused + across conversations for better prompt caching efficiency. + * Returns: + The rendered system prompt template without dynamic context. +- `system_message`: str + Return the combined system message (static + dynamic). 
+- `system_prompt_filename`: str
+- `system_prompt_kwargs`: dict[str, object]
+- `tools`: list[Tool]
+- `tools_map`: dict[str, ToolDefinition]
+  Get the initialized tools map.
+  :raises RuntimeError: If the agent has not been initialized.

-*Full details in our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md)*

+#### Methods

-### 3. Find Your First Issue
-Look for beginner-friendly opportunities:
-- Browse [good first issues](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue)
-- Check our [project boards](https://github.com/OpenHands/OpenHands/projects) for organized tasks
-- Ask in [Slack](https://openhands.dev/joinslack) what needs help

+#### get_all_llms()

-### 4. Join the Community
-Connect with other contributors in our [Slack Community](https://openhands.dev/joinslack). You can connect with OpenHands contributors, maintainers, and more!

+Recursively yield unique base-class LLM objects reachable from self.

-## 📋 How to Contribute Code

+- Returns actual object references (not copies).
+- De-dupes by id(LLM).
+- Cycle-safe via a visited set for all traversed objects.
+- Only yields objects whose type is exactly LLM (no subclasses).
+- Does not handle dataclasses.

-### Understanding the Codebase
-Get familiar with our architecture:
-- **[Frontend](https://github.com/OpenHands/OpenHands/tree/main/frontend/README.md)** - React application
-- **[Backend](https://github.com/OpenHands/OpenHands/tree/main/openhands/README.md)** - Python core
-- **[Agents](https://github.com/OpenHands/OpenHands/tree/main/openhands/agenthub/README.md)** - AI agent implementations
-- **[Runtime](https://github.com/OpenHands/OpenHands/tree/main/openhands/runtime/README.md)** - Execution environments
-- **[Evaluation](https://github.com/OpenHands/benchmarks)** - Testing and benchmarks

+#### init_state()

-### Pull Request Process
-We welcome all pull requests! 
Here's how we evaluate them: +Initialize the empty conversation state to prepare the agent for user +messages. -#### Small Improvements -- Quick review and approval for obvious improvements -- Make sure CI tests pass -- Include clear description of changes +Typically this involves adding system message -#### Core Agent Changes -We're more careful with agent changes since they affect user experience: -- **Accuracy** - Does it make the agent better at solving problems? -- **Efficiency** - Does it improve speed or reduce resource usage? -- **Code Quality** - Is the code maintainable and well-tested? +NOTE: state will be mutated in-place. -*Discuss major changes in [GitHub issues](https://github.com/OpenHands/OpenHands/issues) or [Slack](https://openhands.dev/joinslack) first!* +#### model_dump_succint() -### Pull Request Guidelines -We recommend the following for smooth reviews but they're not required. Just know that the more you follow these guidelines, the more likely you'll get your PR reviewed faster and reduce the quantity of revisions. +Like model_dump, but excludes None fields by default. -**Title Format:** -- `feat: Add new agent capability` -- `fix: Resolve memory leak in runtime` -- `docs: Update installation guide` -- `style: Fix code formatting` -- `refactor: Simplify authentication logic` -- `test: Add unit tests for parser` +#### model_post_init() -**Description:** -- Explain what the PR does and why -- Link to related issues -- Include screenshots for UI changes -- Add changelog entry for user-facing changes +This function is meant to behave like a BaseModel method to initialise private attributes. -## License +It takes context as an argument since that’s what pydantic-core passes when calling it. -OpenHands is released under the **MIT License**, which means: +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. 
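The behavior described for `model_dump_succint` above — dump, but drop None-valued fields — can be approximated on a plain dict. This is a sketch of the idea, not the SDK implementation:

```python
# Sketch of "dump, but drop None fields" as described for model_dump_succint().
def dump_succinct(data: dict) -> dict:
    return {key: value for key, value in data.items() if value is not None}

dumped = dump_succinct({"llm": "claude", "condenser": None, "critic": None})
print(dumped)  # {'llm': 'claude'}
```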
-### You Can: -- **Use** OpenHands for any purpose, including commercial projects -- **Modify** the code to fit your needs -- **Share** your modifications -- **Distribute** or sell copies of OpenHands +#### abstractmethod step() -### You Must: -- **Include** the original copyright notice and license text -- **Preserve** the license in any substantial portions you use +Taking a step in the conversation. -### No Warranty: -- OpenHands is provided "as is" without warranty -- Contributors are not liable for any damages +Typically this involves: +1. Making a LLM call +2. Executing the tool +3. Updating the conversation state with -*Full license text: [LICENSE](https://github.com/OpenHands/OpenHands/blob/main/LICENSE)* + LLM calls (role=”assistant”) and tool results (role=”tool”) -**Special Note:** Content in the `enterprise/` directory has a separate license. See `enterprise/LICENSE` for details. +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step -## Ready to make your first contribution? +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. -1. **⭐ Star** our [GitHub repository](https://github.com/OpenHands/OpenHands) -2. **🔧 Set up** your development environment using our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md) -3. **💬 Join** our [Slack community](https://openhands.dev/joinslack) to meet other contributors -4. **🎯 Find** a [good first issue](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) to work on -5. **📝 Read** our [Code of Conduct](https://github.com/OpenHands/OpenHands/blob/main/CODE_OF_CONDUCT.md) +NOTE: state will be mutated in-place. -## Need Help? 
+#### Deprecated +Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and +[`dynamic_context`](#class-dynamic_context) for per-conversation content. This separation +enables cross-conversation prompt caching. Will be removed in 1.16.0. -Don't hesitate to ask for help: -- **Slack**: [Join our community](https://openhands.dev/joinslack) for real-time support -- **GitHub Issues**: [Open an issue](https://github.com/OpenHands/OpenHands/issues) for bugs or feature requests -- **Email**: Contact us at [contact@openhands.dev](mailto:contact@openhands.dev) +#### WARNING +Using this property DISABLES cross-conversation prompt caching because +it combines static and dynamic content into a single string. Use +[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately +to enable caching. ---- +#### Deprecated +Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. -Thank you for considering contributing to OpenHands! Together, we're building tools that will democratize AI-powered software development and make it accessible to developers everywhere. Every contribution, no matter how small, helps us move closer to that vision. +#### verify() -Welcome to the community! 🎉 +Verify that we can resume this agent from persisted state. +We do not merge configuration between persisted and runtime Agent +instances. Instead, we verify compatibility requirements and then +continue with the runtime-provided Agent. -# FAQs -Source: https://docs.openhands.dev/overview/faqs +Compatibility requirements: +- Agent class/type must match. +- Tools must match exactly (same tool names). 
-## Getting Started +Tools are part of the system prompt and cannot be changed mid-conversation. +To use different tools, start a new conversation or use conversation forking +(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). -### I'm new to OpenHands. Where should I start? +All other configuration (LLM, agent_context, condenser, etc.) can be +freely changed between sessions. -1. **Quick start**: Use [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) to get started quickly with - [GitHub](/openhands/usage/cloud/github-installation), [GitLab](/openhands/usage/cloud/gitlab-installation), - [Bitbucket](/openhands/usage/cloud/bitbucket-installation), - and [Slack](/openhands/usage/cloud/slack-installation) integrations. -2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/openhands/usage/run-openhands/local-setup). -3. **First steps**: Read over the [first projects guidelines](/overview/first-projects) and - [prompting best practices](/openhands/usage/tips/prompting-best-practices) to learn the basics. +* Parameters: + * `persisted` – The agent loaded from persisted state. + * `events` – Unused, kept for API compatibility. +* Returns: + This runtime agent (self) if verification passes. +* Raises: + `ValueError` – If agent class or tools don’t match. -### Can I use OpenHands for production workloads? +### openhands.sdk.conversation +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md -OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant -deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability. 
+### class BaseConversation -If you're interested in running OpenHands in a multi-tenant environment, please [contact us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) about our enterprise deployment options. +Bases: `ABC` - -Using OpenHands for work? We'd love to chat! Fill out -[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) -to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide -input on our product roadmap. - +Abstract base class for conversation implementations. -## Safety and Security +This class defines the interface that all conversation implementations must follow. +Conversations manage the interaction between users and agents, handling message +exchange, execution control, and state management. -### It's doing stuff without asking, is that safe? -**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container -(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration: +#### Properties -**What's protected:** -- Your host system files and programs (unless you mount them using [this feature](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)) -- Host system resources -- Other containers and processes +- `confirmation_policy_active`: bool +- `conversation_stats`: ConversationStats +- `id`: UUID +- `is_confirmation_mode_active`: bool + Check if confirmation mode is active. + Returns True if BOTH conditions are met: + 1. The conversation state has a security analyzer set (not None) + 2. The confirmation policy is active +- `state`: ConversationStateProtocol -**Potential risks to consider:** -- The agent can access the internet from within the container. -- If you provide credentials (API keys, tokens), the agent can use them. 
-- Mounted files and directories can be modified or deleted. -- Network requests can be made to external services. +#### Methods -For detailed security information, see our [Runtime Architecture](/openhands/usage/architecture/runtime), -[Security Configuration](/openhands/usage/advanced/configuration-options#security-configuration), -and [Hardened Docker Installation](/openhands/usage/sandboxes/docker#hardened-docker-installation) documentation. +#### __init__() -## File Storage and Access +Initialize the base conversation with span tracking. -### Where are my files stored? +#### abstractmethod ask_agent() -Your files are stored in different locations depending on how you've configured OpenHands: +Ask the agent a simple, stateless question and get a direct LLM response. -**Default behavior (no file mounting):** -- Files created by the agent are stored inside the runtime Docker container. -- These files are temporary and will be lost when the container is removed. -- The agent works in the `/workspace` directory inside the runtime container. +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. -**When you mount your local filesystem (following [this](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)):** -- Your local files are mounted into the container's `/workspace` directory. -- Changes made by the agent are reflected in your local filesystem. -- Files persist after the container is stopped. +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent - -Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory. 
- 
+#### abstractmethod close()

-## Development Tools and Environment

+#### static compose_callbacks()

-### How do I get the dev tools I need?

+Compose multiple callbacks into a single callback function.

-OpenHands comes with a basic runtime environment that includes Python and Node.js.
-It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment.

+* Parameters:
+  `callbacks` – An iterable of callback functions
+* Returns:
+  A single callback function that calls all provided callbacks

-If you would like to set things up more systematically, you can:
-- **Use setup.sh**: Add a [setup.sh](/openhands/usage/customization/repository#setup-script) file to
-  your repository, which will be run every time the agent starts.
-- **Use a custom sandbox**: Use a [custom docker image](/openhands/usage/advanced/custom-sandbox-guide) to initialize the sandbox.

+#### abstractmethod condense()

-### Something's not working. Where can I get help?

+Force condensation of the conversation history.

-1. **Search existing issues**: Check our [GitHub issues](https://github.com/OpenHands/OpenHands/issues) to see if
-   others have encountered the same problem.
-2. **Join our community**: Get help from other users and developers:
-   - [Slack community](https://openhands.dev/joinslack)
-3. **Check our troubleshooting guide**: Common issues and solutions are documented in
-   [Troubleshooting](/openhands/usage/troubleshooting/troubleshooting).
-4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/OpenHands/OpenHands/issues/new)
-   and fill in as much detail as possible.

+This method uses the existing condensation request pattern to trigger
+condensation. It adds a CondensationRequest event to the conversation
+and forces the agent to take a single step to process it.
+The condensation will be applied immediately and will modify the conversation
+state by adding a condensation event to the history. 
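`compose_callbacks`, documented above, folds several callbacks into a single function that invokes each one in turn. A minimal sketch of that behavior (not the SDK source):

```python
# Sketch of the compose_callbacks() behavior documented above.
def compose_callbacks(callbacks):
    callbacks = list(callbacks)  # snapshot the iterable

    def composed(event):
        for callback in callbacks:
            callback(event)

    return composed

seen = []
combined = compose_callbacks([seen.append, lambda event: seen.append(event.upper())])
combined("step")
print(seen)  # ['step', 'STEP']
```

Snapshotting the iterable up front means the composed function behaves the same even if the caller passed a one-shot generator.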
-# First Projects -Source: https://docs.openhands.dev/overview/first-projects +* Raises: + `ValueError` – If no condenser is configured or the condenser doesn’t + handle condensation requests. -Like any tool, it works best when you know how to use it effectively. Whether you're experimenting with a small -script or making changes in a large codebase, this guide will show how to apply OpenHands in different scenarios. +#### abstractmethod execute_tool() -Let’s walk through a natural progression of using OpenHands: -- Try a simple prompt. -- Build a project from scratch. -- Add features to existing code. -- Refactor code. -- Debug and fix bugs. +Execute a tool directly without going through the agent loop. -## First Steps: Hello World +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. -Start with a small task to get familiar with how OpenHands responds to prompts. +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. -Click `New Conversation` and try prompting: -> Write a bash script hello.sh that prints "hello world!" +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop -OpenHands will generate script, set the correct permissions, and even run it for you. 
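A hello.sh satisfying that first prompt might look like the following — this is illustrative; the agent's actual output can differ:

```shell
#!/bin/bash
# hello.sh - prints a fixed greeting, as requested by the prompt above.
echo "hello world!"
```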
+* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor -Now try making small changes: +#### abstractmethod generate_title() -> Modify hello.sh so that it accepts a name as the first argument, but defaults to "world". +Generate a title for the conversation based on the first user message. -You can experiment in any language. For example: - -> Convert hello.sh to a Ruby script, and run it. +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. - - Start small and iterate. This helps you understand how OpenHands interprets and responds to different prompts. - +#### static get_persistence_dir() -## Build Something from Scratch +Get the persistence directory for the conversation. -Agents excel at "greenfield" tasks, where they don’t need context about existing code. -Begin with a simple task and iterate from there. Be specific about what you want and the tech stack. +* Parameters: + * `persistence_base_dir` – Base directory for persistence. Can be a string + path or Path object. + * `conversation_id` – Unique conversation ID. +* Returns: + String path to the conversation-specific persistence directory. + Always returns a normalized string path even if a Path was provided. -Click `New Conversation` and give it a clear goal: +#### abstractmethod pause() -> Build a frontend-only TODO app in React. All state should be stored in localStorage. 
+#### abstractmethod reject_pending_actions() -Once the basics are working, build on it just like you would in a real project: +#### abstractmethod run() -> Allow adding an optional due date to each task. +Execute the agent to process messages and perform actions. -You can also ask OpenHands to help with version control: +This method runs the agent until it finishes processing the current +message or reaches the maximum iteration limit. -> Commit the changes and push them to a new branch called "feature/due-dates". +#### abstractmethod send_message() - - Break your goals into small, manageable tasks.. Keep pushing your changes often. This makes it easier to recover - if something goes off track. - +Send a message to the agent. -## Expand Existing Code +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. -Want to add new functionality to an existing repo? OpenHands can do that too. +#### abstractmethod set_confirmation_policy() - -If you're running OpenHands on your own, first add a -[GitHub token](/openhands/usage/settings/integrations-settings#github-setup), -[GitLab token](/openhands/usage/settings/integrations-settings#gitlab-setup) or -[Bitbucket token](/openhands/usage/settings/integrations-settings#bitbucket-setup). - +Set the confirmation policy for the conversation. -Choose your repository and branch via `Open Repository`, and press `Launch`. +#### abstractmethod set_security_analyzer() -Examples of adding new functionality: +Set the security analyzer for the conversation. -> Add a GitHub action that lints the code in this repository. +#### abstractmethod update_secrets() -> Modify ./backend/api/routes.js to add a new route that returns a list of all tasks. 
+### class Conversation -> Add a new React component to the ./frontend/components directory to display a list of Widgets. -> It should use the existing Widget component. +### class Conversation - - OpenHands can explore the codebase, but giving it context upfront makes it faster and less expensive. - +Bases: `object` -## Refactor Code +Factory class for creating conversation instances with OpenHands agents. -OpenHands does great at refactoring code in small chunks. Rather than rearchitecting the entire codebase, it's more -effective in focused refactoring tasks. Start by launching a conversation with -your repo and branch. Then guide it: +This factory automatically creates either a LocalConversation or RemoteConversation +based on the workspace type provided. LocalConversation runs the agent locally, +while RemoteConversation connects to a remote agent server. -> Rename all the single-letter variables in ./app.go. +* Returns: + LocalConversation if workspace is local, RemoteConversation if workspace + is remote. -> Split the `build_and_deploy_widgets` function into two functions, `build_widgets` and `deploy_widgets` in widget.php. +#### Example -> Break ./api/routes.js into separate files for each route. +```pycon +>>> from openhands.sdk import LLM, Agent, Conversation +>>> from openhands.sdk.plugin import PluginSource +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> agent = Agent(llm=llm, tools=[]) +>>> conversation = Conversation( +... agent=agent, +... workspace="./workspace", +... plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")], +... ) +>>> conversation.send_message("Hello!") +>>> conversation.run() +``` - - Focus on small, meaningful improvements instead of full rewrites. - +### class ConversationExecutionStatus -## Debug and Fix Bugs +Bases: `str`, `Enum` -OpenHands can help debug and fix issues, but it’s most effective when you’ve narrowed things down. 
+Enum representing the current execution state of the conversation. -Give it a clear description of the problem and the file(s) involved: +#### Methods -> The email field in the `/subscribe` endpoint is rejecting .io domains. Fix this. +#### DELETING = 'deleting' -> The `search_widgets` function in ./app.py is doing a case-sensitive search. Make it case-insensitive. +#### ERROR = 'error' -For bug fixing, test-driven development can be really useful. You can ask OpenHands to write a new test and iterate -until the bug is fixed: +#### FINISHED = 'finished' -> The `hello` function crashes on the empty string. Write a test that reproduces this bug, then fix the code so it passes. +#### IDLE = 'idle' - - Be as specific as possible. Include expected behavior, file names, and examples to speed things up. - +#### PAUSED = 'paused' -## Using OpenHands Effectively +#### RUNNING = 'running' -OpenHands can assist with nearly any coding task, but it takes some practice to get the best results. -Keep these tips in mind: -* Keep your tasks small. -* Be clear and specific. -* Provide relevant context. -* Commit and push frequently. +#### STUCK = 'stuck' -See [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) for more tips on how to get the most -out of OpenHands. +#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' +#### is_terminal() -# Introduction -Source: https://docs.openhands.dev/overview/introduction +Check if this status represents a terminal state. -🙌 Welcome to OpenHands, a [community](/overview/community) focused on AI-driven development. We'd love for you to [join us on Slack](https://openhands.dev/joinslack). +Terminal states indicate the run has completed and the agent is no longer +actively processing. These are: FINISHED, ERROR, STUCK. -There are a few ways to work with OpenHands: +Note: IDLE is NOT a terminal state - it’s the initial state of a conversation +before any run has started. 
Including IDLE would cause false positives when +the WebSocket delivers the initial state update during connection. -## OpenHands Software Agent SDK -The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below. +* Returns: + True if this is a terminal status, False otherwise. -Define agents in code, then run them locally, or scale to 1000s of agents in the cloud +### class ConversationState -[Check out the docs](https://docs.openhands.dev/sdk) or [view the source](https://github.com/All-Hands-AI/agent-sdk/) +Bases: `OpenHandsModel` -## OpenHands CLI -The CLI is the easiest way to start using OpenHands. The experience will be familiar to anyone who has worked -with e.g. Claude Code or Codex. You can power it with Claude, GPT, or any other LLM. -[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/cli-mode) or [view the source](https://github.com/OpenHands/OpenHands-CLI) +#### Properties -## OpenHands Local GUI -Use the Local GUI for running agents on your laptop. It comes with a REST API and a single-page React application. -The experience will be familiar to anyone who has used Devin or Jules. +- `activated_knowledge_skills`: list[str] +- `agent`: AgentBase +- `agent_state`: dict[str, Any] +- `blocked_actions`: dict[str, str] +- `blocked_messages`: dict[str, str] +- `confirmation_policy`: ConfirmationPolicyBase +- `env_observation_persistence_dir`: str | None + Directory for persisting environment observation files. 
+- `events`: [EventLog](#class-eventlog)
+- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus)
+- `id`: UUID
+- `max_iterations`: int
+- `persistence_dir`: str | None
+- `secret_registry`: [SecretRegistry](#class-secretregistry)
+- `security_analyzer`: SecurityAnalyzerBase | None
+- `stats`: ConversationStats
+- `stuck_detection`: bool
+- `workspace`: BaseWorkspace

-[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup) or view the source in this repo.

+#### Methods

-## OpenHands Cloud
-This is a commercial deployment of OpenHands GUI, running on hosted infrastructure.

+#### acquire()

-You can try it for free by [signing in with your GitHub account](https://app.all-hands.dev).

+Acquire the lock.

-OpenHands Cloud comes with source-available features and integrations:
-- Deeper integrations with GitHub, GitLab, and Bitbucket
-- Integrations with Slack, Jira, and Linear
-- Multi-user support
-- RBAC and permissions
-- Collaboration features (e.g., conversation sharing)
-- Usage reporting
-- Budgeting enforcement

+* Parameters:
+  * `blocking` – If True, block until lock is acquired. If False, return
+    immediately.
+  * `timeout` – Maximum time to wait for lock (ignored if blocking=False).
+    -1 means wait indefinitely.
+* Returns:
+  True if lock was acquired, False otherwise.

-## OpenHands Enterprise
-Large enterprises can work with us to self-host OpenHands Cloud in their own VPC, via Kubernetes.
-OpenHands Enterprise can also work with the CLI and SDK above.

+#### block_action()

-OpenHands Enterprise is source-available--you can see all the source code here in the enterprise/ directory,
-but you'll need to purchase a license if you want to run it for more than one month.

+Persistently record a hook-blocked action.

-Enterprise contracts also come with extended support and access to our research team. 
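The blocking/timeout semantics documented for `acquire()` above mirror Python's standard `threading.Lock`. A quick illustration using the stdlib lock (not the SDK state object):

```python
# The blocking/timeout semantics described above, shown on a stdlib lock.
import threading

lock = threading.Lock()
print(lock.acquire(blocking=True, timeout=-1))  # True: timeout=-1 waits indefinitely
print(lock.acquire(blocking=False))             # False: already held, returns at once
lock.release()
```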
+#### block_message() -Learn more at [openhands.dev/enterprise](https://openhands.dev/enterprise) +Persistently record a hook-blocked user message. -## Everything Else +#### classmethod create() -Check out our [Product Roadmap](https://github.com/orgs/openhands/projects/1), and feel free to -[open up an issue](https://github.com/OpenHands/OpenHands/issues) if there's something you'd like to see! +Create a new conversation state or resume from persistence. -You might also be interested in our [evaluation infrastructure](https://github.com/OpenHands/benchmarks), our [chrome extension](https://github.com/OpenHands/openhands-chrome-extension/), or our [Theory-of-Mind module](https://github.com/OpenHands/ToM-SWE). +This factory method handles both new conversation creation and resumption +from persisted state. -All our work is available under the MIT license, except for the `enterprise/` directory in this repository (see the [enterprise license](https://github.com/OpenHands/OpenHands/blob/main/enterprise/LICENSE) for details). -The core `openhands` and `agent-server` Docker images are fully MIT-licensed as well. +New conversation: +The provided Agent is used directly. Pydantic validation happens via the +cls() constructor. -If you need help with anything, or just want to chat, [come find us on Slack](https://openhands.dev/joinslack). +Restored conversation: +The provided Agent is validated against the persisted agent using +agent.load(). Tools must match (they may have been used in conversation +history), but all other configuration can be freely changed: LLM, +agent_context, condenser, system prompts, etc. 
+* Parameters: + * `id` – Unique conversation identifier + * `agent` – The Agent to use (tools must match persisted on restore) + * `workspace` – Working directory for agent operations + * `persistence_dir` – Directory for persisting state and events + * `max_iterations` – Maximum iterations per run + * `stuck_detection` – Whether to enable stuck detection + * `cipher` – Optional cipher for encrypting/decrypting secrets in + persisted state. If provided, secrets are encrypted when + saving and decrypted when loading. If not provided, secrets + are redacted (lost) on serialization. +* Returns: + ConversationState ready for use +* Raises: + * `ValueError` – If conversation ID or tools mismatch on restore + * `ValidationError` – If agent or other fields fail Pydantic validation -# Model Context Protocol (MCP) -Source: https://docs.openhands.dev/overview/model-context-protocol +#### static get_unmatched_actions() -Model Context Protocol (MCP) is an open standard that allows OpenHands to communicate with external tool servers, extending the agent's capabilities with custom tools, specialized data processing, external API access, and more. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). +Find actions in the event history that don’t have matching observations. -## How MCP Works +This method identifies ActionEvents that don’t have corresponding +ObservationEvents or UserRejectObservations, which typically indicates +actions that are pending confirmation or execution. -When OpenHands starts, it: +* Parameters: + `events` – List of events to search through +* Returns: + List of ActionEvent objects that don’t have corresponding observations, + in chronological order -1. Reads the MCP configuration -2. Connects to configured servers (SSE, SHTTP, or stdio) -3. Registers tools provided by these servers with the agent -4. 
Routes tool calls to appropriate MCP servers during execution +#### locked() -## MCP Support Matrix +Return True if the lock is currently held by any thread. -| Platform | Support Level | Configuration Method | Documentation | -|----------|---------------|---------------------|---------------| -| **CLI** | ✅ Full Support | `~/.openhands/mcp.json` file | [CLI MCP Servers](/openhands/usage/cli/mcp-servers) | -| **SDK** | ✅ Full Support | Programmatic configuration | [SDK MCP Guide](/sdk/guides/mcp) | -| **Local GUI** | ✅ Full Support | Settings UI + config files | [Local GUI](/openhands/usage/run-openhands/local-setup) | -| **OpenHands Cloud** | ✅ Full Support | Cloud UI settings | [Cloud GUI](/openhands/usage/cloud/cloud-ui) | +#### model_config = (configuration object) -## Platform-Specific Differences +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. - - - - Configuration via `~/.openhands/mcp.json` file - - Real-time status monitoring with `/mcp` command - - Supports all MCP transport protocols (SSE, SHTTP, stdio) - - Manual configuration required - - - - Programmatic configuration in code - - Full control over MCP server lifecycle - - Dynamic server registration and management - - Integration with custom tool systems - - - - Visual configuration through Settings UI - - File-based configuration backup - - Real-time server status display - - Supports all transport protocols - - - - Cloud-based configuration management - - Managed MCP server hosting options - - Team-wide configuration sharing - - Enterprise security features - - +#### model_post_init() -## Getting Started with MCP +This function is meant to behave like a BaseModel method to initialise private attributes. 
-- **For detailed configuration**: See [MCP Settings](/openhands/usage/settings/mcp-settings) -- **For SDK integration**: See [SDK MCP Guide](/sdk/guides/mcp) -- **For architecture details**: See [MCP Architecture](/sdk/arch/mcp) +It takes context as an argument since that’s what pydantic-core passes when calling it. +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -# Quick Start -Source: https://docs.openhands.dev/overview/quickstart +#### owned() -Get started with OpenHands in minutes. Choose the option that works best for you. +Return True if the lock is currently held by the calling thread. - - - **Recommended** +#### pop_blocked_action() - The fastest way to get started. No setup required—just sign in and start coding. +Remove and return a hook-blocked action reason, if present. - - Free usage of MiniMax M2.5 for a limited time - - No installation needed - - Managed infrastructure - - - Use OpenHands from your terminal. Perfect for automation and scripting. +#### pop_blocked_message() - - IDE integrations available - - Headless mode for CI/CD - - Lightweight installation - - - Run OpenHands locally with a web-based interface. Bring your own LLM and API key. +Remove and return a hook-blocked message reason, if present. - - Full control over your environment - - Works offline - - Docker-based setup - - +#### release() +Release the lock. -# Overview -Source: https://docs.openhands.dev/overview/skills +* Raises: + `RuntimeError` – If the current thread doesn’t own the lock. -Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. They provide consistent practices across projects and can be triggered automatically based on keywords or context. +#### set_on_state_change() - -OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers for automatic activation. 
See the [SDK Skills Guide](/sdk/guides/skill) for details on the SKILL.md format. - +Set a callback to be called when state changes. -## Official Skill Registry +* Parameters: + `callback` – A function that takes an Event (ConversationStateUpdateEvent) + or None to remove the callback -The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). This repository contains community-shared skills that can be used by all OpenHands agents. You can browse available skills, contribute your own, and learn from examples created by the community. +### class ConversationVisualizerBase -## How Skills Work +Bases: `ABC` -Skills inject additional context and rules into the agent's behavior. +Base class for conversation visualizers. -At a high level, OpenHands supports two loading models: +This abstract base class defines the interface that all conversation visualizers +must implement. Visualizers can be created before the Conversation is initialized +and will be configured with the conversation state automatically. -- **Always-on context** (e.g., `AGENTS.md`) that is injected into the system prompt at conversation start. -- **On-demand skills** that are either: - - **triggered by the user** (keyword matches), or - - **invoked by the agent** (the agent decides to look up the full skill content). +The typical usage pattern: +1. Create a visualizer instance: -## Permanent agent context (recommended) + viz = MyVisualizer() +1. Pass it to Conversation: conv = Conversation(agent, visualizer=viz) +2. Conversation automatically calls viz.initialize(state) to attach the state -For repository-wide, always-on instructions, prefer a root-level `AGENTS.md` file. 
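The visualizer lifecycle described in this section — construct a visualizer (or pass the bare class), hand it to the conversation, which calls `initialize(state)` and then `on_event()` per event — can be sketched as a small ABC. This is a hedged illustration of the documented contract only; `MiniVisualizerBase`, `RecordingVisualizer`, and `run_with_visualizer` are invented names, not SDK classes:

```python
from abc import ABC, abstractmethod

class MiniVisualizerBase(ABC):
    # Sketch of the documented contract: the conversation attaches state
    # via initialize(), and subclasses implement on_event() for display.
    def __init__(self) -> None:
        self._state = None

    def initialize(self, state) -> None:
        # Called by the conversation after state creation; subclasses
        # should not override this (it is marked `final` in the docs).
        self._state = state

    @abstractmethod
    def on_event(self, event) -> None:
        """Visualize a single conversation event."""

class RecordingVisualizer(MiniVisualizerBase):
    def __init__(self) -> None:
        super().__init__()
        self.seen: list = []

    def on_event(self, event) -> None:
        self.seen.append(event)

def run_with_visualizer(events, visualizer_or_class):
    # Mirrors the documented convenience: accept an instance *or* an
    # uninstantiated class that needs no extra constructor args.
    viz = (visualizer_or_class()
           if isinstance(visualizer_or_class, type)
           else visualizer_or_class)
    viz.initialize({"id": "demo"})
    for ev in events:
        viz.on_event(ev)
    return viz
```

Accepting either an instance or a class is the same pattern the docs describe for `Conversation(agent, visualizer=MyVisualizer)` versus `visualizer=viz`.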
+You can also pass the uninstantiated class if you don’t need extra args +: for initialization, and Conversation will create it: + : conv = Conversation(agent, visualizer=MyVisualizer) -We also support model-specific variants: -- `GEMINI.md` for Gemini -- `CLAUDE.md` for Claude +Conversation will then calls MyVisualizer() followed by initialize(state) -## Triggered and optional skills -To add optional skills that are loaded on demand: +#### Properties -- **AgentSkills standard (recommended for progressive disclosure)**: create one directory per skill and add a `SKILL.md` file. -- **Legacy/OpenHands format (simple)**: put markdown files in `.agents/skills/*.md` at the repository root. +- `conversation_stats`: ConversationStats | None + Get conversation stats from the state. - -Loaded skills take up space in the context window. On-demand skills help keep the system prompt smaller because the agent sees a summary first and reads the full content only when needed. - +#### Methods -### Example Repository Structure +#### __init__() -``` -some-repository/ -├── AGENTS.md # Permanent repository guidelines (recommended) -└── .agents/ - └── skills/ - ├── rot13-encryption/ # AgentSkills standard (progressive disclosure) - │ ├── SKILL.md - │ ├── scripts/ - │ │ └── rot13.sh - │ └── references/ - │ └── README.md - ├── another-agentskill/ # AgentSkills standard (progressive disclosure) - │ ├── SKILL.md - │ └── scripts/ - │ └── placeholder.sh - └── legacy_trigger_this.md # Legacy/OpenHands format (keyword-triggered) -``` +Initialize the visualizer base. -## Skill Loading Precedence +#### create_sub_visualizer() -For project location, paths are relative to the repository root; `.agents/skills/` is a subdirectory of the project directory. -For user home location, paths are relative to the user home: `~/` +Create a visualizer for a sub-agent during delegation. 
-When multiple skills share the same name, OpenHands keeps the first match in this order: +Override this method to support sub-agent visualization in multi-agent +delegation scenarios. The sub-visualizer will be used to display events +from the spawned sub-agent. -1. `.agents/skills/` (recommended) -2. `.openhands/skills/` (deprecated) -3. `.openhands/microagents/` (deprecated) +By default, returns None which means sub-agents will not have visualization. +Subclasses that support delegation (like DelegationVisualizer) should +override this method to create appropriate sub-visualizers. -Project-specific skills take precedence over user skills. +* Parameters: + `agent_id` – The identifier of the sub-agent being spawned +* Returns: + A visualizer instance for the sub-agent, or None if sub-agent + visualization is not supported -## Skill Types +#### final initialize() -Currently supported skill types: +Initialize the visualizer with conversation state. -- **[Permanent Context](/overview/skills/repo)**: Repository-wide guidelines and best practices. We recommend `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`). -- **[Keyword-Triggered Skills](/overview/skills/keyword)**: Guidelines activated by specific keywords in user prompts. -- **[Organization Skills](/overview/skills/org)**: Team or organization-wide standards. -- **[Global Skills](/overview/skills/public)**: Community-shared skills and templates. +This method is called by Conversation after the state is created, +allowing the visualizer to access conversation stats and other +state information. -### Skills Frontmatter Requirements +Subclasses should not override this method, to ensure the state is set. -Each skill file may include frontmatter that provides additional information. 
In some cases, this frontmatter is required: +* Parameters: + `state` – The conversation state object -| Skill Type | Required | -|-------------|----------| -| General Skills | No | -| Keyword-Triggered Skills | Yes | +#### abstractmethod on_event() -## Skills Support Matrix +Handle a conversation event. -| Platform | Support Level | Configuration Method | Implementation | Documentation | -|----------|---------------|---------------------|----------------|---------------| -| **CLI** | ✅ Full Support | `~/.agents/skills/` (user-level) and `.agents/skills/` (repo-level) | File-based markdown | [Skills Overview](/overview/skills) | -| **SDK** | ✅ Full Support | Programmatic `Skill` objects | Code-based configuration | [SDK Skills Guide](/sdk/guides/skill) | -| **Local GUI** | ✅ Full Support | `.agents/skills/` + UI | File-based with UI management | [Local Setup](/openhands/usage/run-openhands/local-setup) | -| **OpenHands Cloud** | ✅ Full Support | Cloud UI + repository integration | Managed skill library | [Cloud UI](/openhands/usage/cloud/cloud-ui) | +This method is called for each event in the conversation and should +implement the visualization logic. -## Platform-Specific Differences +* Parameters: + `event` – The event to visualize - - - - File-based configuration in two locations: - - `~/.agents/skills/` - User-level skills (all conversations). 
- - `.agents/skills/` - Repository-level skills (current directory) - - Markdown format for skill definitions - - Manual file management required - - Supports both general and keyword-triggered skills - - - - Programmatic `Skill` objects in code - - Dynamic skill creation and management - - Integration with custom workflows - - Full control over skill lifecycle - - - - Visual skill management through UI - - File-based storage with GUI editing - - Real-time skill status display - - Drag-and-drop skill organization - - - - Cloud-based skill library management - - Team-wide skill sharing and templates - - Organization-level skill policies - - Integrated skill marketplace - - +### class DefaultConversationVisualizer -## Learn More +Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) -- **For SDK integration**: See [SDK Skills Guide](/sdk/guides/skill) -- **For architecture details**: See [Skills Architecture](/sdk/arch/skill) -- **For specific skill types**: See [Repository Skills](/overview/skills/repo), [Keyword Skills](/overview/skills/keyword), [Organization Skills](/overview/skills/org), and [Global Skills](/overview/skills/public) +Handles visualization of conversation events with Rich formatting. +Provides Rich-formatted output with semantic dividers and complete content display. -# Keyword-Triggered Skills -Source: https://docs.openhands.dev/overview/skills/keyword +#### Methods -## Usage +#### __init__() -These skills are only loaded when a prompt includes one of the trigger words. +Initialize the visualizer. -## Frontmatter Syntax - -Frontmatter is required for keyword-triggered skills. It must be placed at the top of the file, -above the guidelines. +* Parameters: + * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles + for highlighting keywords in the visualizer. + For example: (configuration object) + * `skip_user_messages` – If True, skip displaying user messages. 
Useful for + scenarios where user input is not relevant to show. -Enclose the frontmatter in triple dashes (---) and include the following fields: +#### on_event() -| Field | Description | Required | Default | -|------------|--------------------------------------------------|----------|------------------| -| `triggers` | A list of keywords that activate the skill. | Yes | None | +Main event handler that displays events with Rich formatting. +### class EventLog -## Example +Bases: [`EventsListBase`](#class-eventslistbase) -Keyword-triggered skill file example located at `.agents/skills/yummy.md`: -``` ---- -triggers: -- yummyhappy -- happyyummy ---- +Persistent event log with locking for concurrent writes. -The user has said the magic word. Respond with "That was delicious!" -``` +This class provides thread-safe and process-safe event storage using +the FileStore’s locking mechanism. Events are persisted to disk and +can be accessed by index or event ID. -[See examples of keyword-triggered skills in the official OpenHands Skills Registry](https://github.com/OpenHands/extensions) +#### Methods +#### NOTE +For LocalFileStore, file locking via flock() does NOT work reliably +on NFS mounts or network filesystems. Users deploying with shared +storage should use alternative coordination mechanisms. -# Organization and User Skills -Source: https://docs.openhands.dev/overview/skills/org +#### __init__() -## Usage +#### append() -These skills can be [any type of skill](/overview/skills#skill-types) and will be loaded -accordingly. However, they are applied to all repositories belonging to the organization or user. +Append an event with locking for thread/process safety. -Add a `.agents` repository under the organization or user and create a `skills` directory and place the -skills in that directory. +* Raises: + * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. + * `ValueError` – If an event with the same ID already exists. 
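The trigger mechanism above — a skill with a `triggers` list in its frontmatter is loaded only when the prompt mentions one of those keywords — can be sketched in a few lines. `skill_is_triggered` is a hypothetical helper illustrating the documented behavior; the real loader's exact matching rules may differ:

```python
def skill_is_triggered(triggers: list[str], prompt: str) -> bool:
    # Load the skill only when the prompt contains at least one trigger
    # word (case-insensitive substring match, as an assumption).
    text = prompt.lower()
    return any(trigger.lower() in text for trigger in triggers)
```

With the `yummy.md` example's triggers, `skill_is_triggered(["yummyhappy", "happyyummy"], prompt)` would decide whether the "magic word" skill is injected for a given prompt.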
-For GitLab organizations, use `openhands-config` as the repository name instead of `.agents`, since GitLab doesn't support repository names starting with non-alphanumeric characters. +#### get_id() -## Example +Return the event_id for a given index. -General skill file example for organization `Great-Co` located inside the `.agents` repository: -`skills/org-skill.md`: -``` -* Use type hints and error boundaries; validate inputs at system boundaries and fail with meaningful error messages. -* Document interfaces and public APIs; use implementation comments only for non-obvious logic. -* Follow the same naming convention for variables, classes, constants, etc. already used in each repository. -``` +#### get_index() -For GitLab organizations, the same skill would be located inside the `openhands-config` repository. +Return the integer index for a given event_id. -## User Skills When Running Openhands on Your Own +### class EventsListBase - - This works with CLI, headless and development modes. It does not work out of the box when running OpenHands using the docker command. - +Bases: `Sequence`[`Event`], `ABC` -When running OpenHands on your own, you can place skills in the `~/.agents/skills` folder on your local -system and OpenHands will always load it for all your conversations. Repo-level overrides live in `.agents/skills`. +Abstract base class for event lists that can be appended to. +This provides a common interface for both local EventLog and remote +RemoteEventsList implementations, avoiding circular imports in protocols. -# Global Skills -Source: https://docs.openhands.dev/overview/skills/public +#### Methods -## Global Skill Registry +#### abstractmethod append() -The official global skill registry is hosted at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). This repository contains community-shared skills that can be used by all OpenHands users. +Add a new event to the list. 
-## Contributing a Global Skill +### class LocalConversation -You can create global skills and share with the community by opening a pull request to the official skill registry. +Bases: [`BaseConversation`](#class-baseconversation) -See the [OpenHands Skill Registry](https://github.com/OpenHands/extensions) for specific instructions on how to contribute a global skill. -### Global Skills Best Practices +#### Properties -- **Clear Scope**: Keep the skill focused on a specific domain or task. -- **Explicit Instructions**: Provide clear, unambiguous guidelines. -- **Useful Examples**: Include practical examples of common use cases. -- **Safety First**: Include necessary warnings and constraints. -- **Integration Awareness**: Consider how the skill interacts with other components. +- `agent`: AgentBase +- `delete_on_close`: bool = True +- `id`: UUID + Get the unique ID of the conversation. +- `llm_registry`: LLMRegistry +- `max_iteration_per_run`: int +- `resolved_plugins`: list[ResolvedPluginSource] | None + Get the resolved plugin sources after plugins are loaded. + Returns None if plugins haven’t been loaded yet, or if no plugins + were specified. Use this for persistence to ensure conversation + resume uses the exact same plugin versions. +- `state`: [ConversationState](#class-conversationstate) + Get the conversation state. + It returns a protocol that has a subset of ConversationState methods + and properties. We will have the ability to access the same properties + of ConversationState on a remote conversation object. + But we won’t be able to access methods that mutate the state. +- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None + Get the stuck detector instance if enabled. +- `workspace`: LocalWorkspace -### Steps to Contribute a Global Skill +#### Methods -#### 1. Plan the Global Skill +#### __init__() -Before creating a global skill, consider: +Initialize the conversation. -- What specific problem or use case will it address? 
-- What unique capabilities or knowledge should it have? -- What trigger words make sense for activating it? -- What constraints or guidelines should it follow? +* Parameters: + * `agent` – The agent to use for the conversation. + * `workspace` – Working directory for agent operations and tool execution. + Can be a string path, Path object, or LocalWorkspace instance. + * `plugins` – Optional list of plugins to load. Each plugin is specified + with a source (github:owner/repo, git URL, or local path), + optional ref (branch/tag/commit), and optional repo_path for + monorepos. Plugins are loaded in order with these merge + semantics: skills override by name (last wins), MCP config + override by key (last wins), hooks concatenate (all run). + * `persistence_dir` – Directory for persisting conversation state and events. + Can be a string path or Path object. + * `conversation_id` – Optional ID for the conversation. If provided, will + be used to identify the conversation. The user might want to + suffix their persistent filestore with this ID. + * `callbacks` – Optional list of callback functions to handle events + * `token_callbacks` – Optional list of callbacks invoked for streaming deltas + * `hook_config` – Optional hook configuration to auto-wire session hooks. + If plugins are loaded, their hooks are combined with this config. + * `max_iteration_per_run` – Maximum number of iterations per run + * `visualizer` – -#### 2. Create File + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `stuck_detection` – Whether to enable stuck detection + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. 
Values are integers + representing the number of repetitions before triggering. + * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted + state. If provided, secrets are encrypted when saving and + decrypted when loading. If not provided, secrets are redacted + (lost) on serialization. -Create a new Markdown file with a descriptive name in the official skill registry: -[github.com/OpenHands/extensions](https://github.com/OpenHands/extensions) +#### ask_agent() -#### 3. Testing the Global Skill +Ask the agent a simple, stateless question and get a direct LLM response. -- Test the agent with various prompts. -- Verify trigger words activate the agent correctly. -- Ensure instructions are clear and comprehensive. -- Check for potential conflicts and overlaps with existing agents. +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. -#### 4. Submission Process +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent -Submit a pull request with: +#### close() -- The new skill file. -- Updated documentation if needed. -- Description of the agent's purpose and capabilities. +Close the conversation and clean up all tool executors. +#### condense() -# General Skills -Source: https://docs.openhands.dev/overview/skills/repo +Synchronously force condense the conversation history. -## Usage +If the agent is currently running, condense() will wait for the +ongoing step to finish before proceeding. -These skills are always loaded as part of the context. +Raises ValueError if no compatible condenser exists. -## Frontmatter Syntax +#### property conversation_stats -The frontmatter for this type of skill is optional. 
+#### execute_tool() -Frontmatter should be enclosed in triple dashes (---) and may include the following fields: +Execute a tool directly without going through the agent loop. -| Field | Description | Required | Default | -|-----------|-----------------------------------------|----------|----------------| -| `agent` | The agent this skill applies to | No | 'CodeActAgent' | +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. -## Creating a Repository Agent +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. -To create an effective repository agent, you can ask OpenHands to analyze your repository with a prompt like: +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop -``` -Please browse the repository, look at the documentation and relevant code, and understand the purpose of this repository. +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor -Specifically, I want you to create an `AGENTS.md` file at the repository root. This file should contain succinct information that summarizes: -1. The purpose of this repository -2. The general setup of this repo -3. 
A brief description of the structure of this repo +#### generate_title() -Read all the GitHub workflows under .github/ of the repository (if this folder exists) to understand the CI checks (e.g., linter, pre-commit), and include those in the `AGENTS.md` file. -``` +Generate a title for the conversation based on the first user message. -This approach helps OpenHands capture repository context efficiently, reducing the need for repeated searches during conversations and ensuring more accurate solutions. +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses self.agent.llm. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. -## Example Content +#### pause() -An `AGENTS.md` file should include: +Pause agent execution. -``` -# Repository Purpose -This project is a TODO application that allows users to track TODO items. +This method can be called from any thread to request that the agent +pause execution. The pause will take effect at the next iteration +of the run loop (between agent steps). -# Setup Instructions -To set it up, you can run `npm run build`. +Note: If called during an LLM completion, the pause will not take +effect until the current LLM call completes. -# Repository Structure -- `/src`: Core application code -- `/tests`: Test suite -- `/docs`: Documentation -- `/.github`: CI/CD workflows +#### reject_pending_actions() -# CI/CD Workflows -- `lint.yml`: Runs ESLint on all JavaScript files -- `test.yml`: Runs the test suite on pull requests +Reject all pending actions from the agent. -# Development Guidelines -Always make sure the tests are passing before committing changes. You can run the tests by running `npm run test`. -``` +This is a non-invasive method to reject actions between run() calls. +Also clears the agent_waiting_for_confirmation flag. 
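`reject_pending_actions()` and the `agent_waiting_for_confirmation` flag describe a small state machine: under confirmation mode, proposed actions are held until they are either executed on the next `run()` (implicit confirmation) or rejected in between. A minimal sketch of that bookkeeping — `ConfirmationQueue` is an invented name, not SDK code:

```python
class ConfirmationQueue:
    # Sketch of confirmation-mode bookkeeping: proposed actions wait for
    # the next run() (implicit confirmation) or rejection in between.
    def __init__(self) -> None:
        self.pending: list[str] = []
        self.executed: list[str] = []
        self.rejected: list[str] = []
        self.waiting_for_confirmation = False

    def propose(self, action: str) -> None:
        # First run(): actions are created but not executed.
        self.pending.append(action)
        self.waiting_for_confirmation = True

    def run(self) -> None:
        # Second run(): executes whatever is still pending.
        self.executed.extend(self.pending)
        self.pending.clear()
        self.waiting_for_confirmation = False

    def reject_pending_actions(self) -> None:
        # Non-invasive rejection between run() calls; also clears the
        # waiting flag, mirroring the documented behavior.
        self.rejected.extend(self.pending)
        self.pending.clear()
        self.waiting_for_confirmation = False
```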
-[See more examples of general skills at OpenHands Skills registry.](https://github.com/OpenHands/extensions) +#### run() +Runs the conversation until the agent finishes. -# Software Agent SDK -Source: https://docs.openhands.dev/sdk +In confirmation mode: +- First call: creates actions but doesn’t execute them, stops and waits +- Second call: executes pending actions (implicit confirmation) -The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. +In normal mode: +- Creates and executes actions immediately -You can use the OpenHands Software Agent SDK for: +Can be paused between steps -- One-off tasks, like building a README for your repo -- Routine maintenance tasks, like updating dependencies -- Major tasks that involve multiple agents, like refactors and rewrites +#### send_message() -You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). +Send a message to the agent. -Get started with some examples or keep reading to learn more. +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. -## Features +#### set_confirmation_policy() - - - A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. - - - Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. - - - A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. 
- - +Set the confirmation policy and store it in conversation state. -## Why OpenHands Software Agent SDK? +#### set_security_analyzer() -### Emphasis on coding +Set the security analyzer for the conversation. -While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. +#### update_secrets() -While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. +Add secrets to the conversation. -### State-of-the-Art Performance +* Parameters: + `secrets` – Dictionary mapping secret keys to values or no-arg callables. + SecretValue = str | Callable[[], str]. Callables are invoked lazily + when a command references the secret key. -OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. The SDK includes a number of state-of-the-art agentic features developed by our research team, including: +### class RemoteConversation -- Task planning and decomposition -- Automatic context compression -- Security analysis -- Strong agent-computer interfaces +Bases: [`BaseConversation`](#class-baseconversation) -OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. -### Free and Open Source +#### Properties -OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. +- `agent`: AgentBase +- `delete_on_close`: bool = False +- `id`: UUID +- `max_iteration_per_run`: int +- `state`: RemoteState + Access to remote conversation state. 
+- `workspace`: RemoteWorkspace -Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic! +#### Methods -## Get Started +#### __init__() - - - Install the SDK, run your first agent, and explore the guides. - - +Remote conversation proxy that talks to an agent server. -## Learn the SDK +* Parameters: + * `agent` – Agent configuration (will be sent to the server) + * `workspace` – The working directory for agent operations and tool execution. + * `plugins` – Optional list of plugins to load on the server. Each plugin + is a PluginSource specifying source, ref, and repo_path. + * `conversation_id` – Optional existing conversation id to attach to + * `callbacks` – Optional callbacks to receive events (not yet streamed) + * `max_iteration_per_run` – Max iterations configured on server + * `stuck_detection` – Whether to enable stuck detection on server + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `hook_config` – Optional hook configuration for session hooks + * `visualizer` – - - - Understand the SDK's architecture: agents, tools, workspaces, and more. - - - Explore the complete SDK API and source code. - - + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `secrets` – Optional secrets to initialize the conversation with -## Build with Examples +#### ask_agent() - - - Build local agents with custom tools and capabilities. 
- - - Run agents on remote servers with Docker sandboxing. - - - Automate repository tasks with agent-powered workflows. - - +Ask the agent a simple, stateless question and get a direct LLM response. -## Community +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. - - - Connect with the OpenHands community on Slack. - - - Contribute to the SDK or report issues on GitHub. - - +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent +#### close() -# openhands.sdk.agent -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent +Close the conversation and clean up resources. -### class Agent +Note: We don’t close self._client here because it’s shared with the workspace. +The workspace owns the client and will close it during its own cleanup. +Closing it here would prevent the workspace from making cleanup API calls. -Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) +#### condense() -Main agent implementation for OpenHands. +Force condensation of the conversation history. -The Agent class provides the core functionality for running AI agents that can -interact with tools, process messages, and execute actions. It inherits from -AgentBase and implements the agent execution logic. Critic-related functionality -is provided by CriticMixin. +This method sends a condensation request to the remote agent server. +The server will use the existing condensation request pattern to trigger +condensation if a condenser is configured and handles condensation requests. -#### Example +The condensation will be applied on the server side and will modify the +conversation state by adding a condensation event to the history. 
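Conceptually, applying a condensation removes the forgotten events from the history and, when summary metadata is configured, inserts a summary event at a given offset after the removal. A stdlib-only sketch of that bookkeeping — `apply_condensation` and the dict-based events are illustrative stand-ins, not SDK types:

```python
def apply_condensation(events, forgotten_ids, summary=None, summary_offset=None):
    """Drop forgotten events; insert the summary at the offset after removal."""
    forgotten = set(forgotten_ids)
    kept = [e for e in events if e["id"] not in forgotten]
    if summary is not None and summary_offset is not None:
        kept.insert(summary_offset, {"id": "summary", "text": summary})
    return kept

history = [{"id": i} for i in range(5)]
condensed = apply_condensation(
    history, forgotten_ids=[1, 2, 3],
    summary="3 events summarized", summary_offset=1,
)
```

After the call, events 1–3 are gone and the summary sits at index 1 of the condensed history.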
-```pycon ->>> from openhands.sdk import LLM, Agent, Tool ->>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) ->>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] ->>> agent = Agent(llm=llm, tools=tools) -``` +* Raises: + `HTTPError` – If the server returns an error (e.g., no condenser configured). +#### property conversation_stats -#### Properties +#### execute_tool() -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Execute a tool directly without going through the agent loop. -#### Methods +Note: This method is not yet supported for RemoteConversation. +Tool execution for remote conversations happens on the server side +during the normal agent loop. -#### init_state() +* Parameters: + * `tool_name` – The name of the tool to execute + * `action` – The action to pass to the tool executor +* Raises: + `NotImplementedError` – Always, as this feature is not yet supported + for remote conversations. -Initialize conversation state. +#### generate_title() -Invariants enforced by this method: -- If a SystemPromptEvent is already present, it must be within the first 3 +Generate a title for the conversation based on the first user message. - events (index 0 or 1 in practice; index 2 is included in the scan window - to detect a user message appearing before the system prompt). -- A user MessageEvent should not appear before the SystemPromptEvent. +* Parameters: + * `llm` – Optional LLM to use for title generation. If provided, its usage_id + will be sent to the server. If not provided, uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. -These invariants keep event ordering predictable for downstream components -(condenser, UI, etc.) and also prevent accidentally materializing the full -event history during initialization. 
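The init_state ordering invariants described above can be expressed as a small predicate. This is an illustrative sketch over simplified dict events, not the SDK's actual validation code:

```python
def check_prompt_ordering(events):
    """System prompt must appear in the first 3 events, before any user message."""
    window = events[:3]
    sys_idx = next((i for i, e in enumerate(window)
                    if e["kind"] == "system_prompt"), None)
    user_idx = next((i for i, e in enumerate(window)
                     if e["kind"] == "message" and e.get("source") == "user"), None)
    if sys_idx is None:
        # No system prompt in the scan window: a user message must not precede it.
        return user_idx is None
    return user_idx is None or user_idx > sys_idx

valid = check_prompt_ordering(
    [{"kind": "system_prompt"}, {"kind": "message", "source": "user"}])
invalid = check_prompt_ordering(
    [{"kind": "message", "source": "user"}, {"kind": "system_prompt"}])
```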
+#### pause() -#### model_post_init() +#### reject_pending_actions() -This function is meant to behave like a BaseModel method to initialise private attributes. +#### run() -It takes context as an argument since that’s what pydantic-core passes when calling it. +Trigger a run on the server. * Parameters: - * `self` – The BaseModel instance. - * `context` – The context. + * `blocking` – If True (default), wait for the run to complete by polling + the server. If False, return immediately after triggering the run. + * `poll_interval` – Time in seconds between status polls (only used when + blocking=True). Default is 1.0 second. + * `timeout` – Maximum time in seconds to wait for the run to complete + (only used when blocking=True). Default is 3600 seconds. +* Raises: + `ConversationRunError` – If the run fails or times out. -#### step() +#### send_message() -Taking a step in the conversation. +Send a message to the agent. -Typically this involves: -1. Making a LLM call -2. Executing the tool -3. Updating the conversation state with +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. - LLM calls (role=”assistant”) and tool results (role=”tool”) +#### set_confirmation_policy() -4.1 If conversation is finished, set state.execution_status to FINISHED -4.2 Otherwise, just return, Conversation will kick off the next step +Set the confirmation policy for the conversation. -If the underlying LLM supports streaming, partial deltas are forwarded to -`on_token` before the full response is returned. +#### set_security_analyzer() -NOTE: state will be mutated in-place. +Set the security analyzer for the remote conversation. 
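A blocking run against a remote agent server reduces to triggering the run and then polling status until a terminal state or a timeout, as the `blocking`, `poll_interval`, and `timeout` parameters suggest. A simplified, stdlib-only sketch of such a loop — `poll_until_done` and `get_status` are hypothetical stand-ins, not SDK APIs:

```python
import time

TERMINAL = {"finished", "error", "stuck"}

def poll_until_done(get_status, poll_interval=1.0, timeout=3600.0):
    """Poll get_status() until a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not complete within timeout")

statuses = iter(["running", "running", "finished"])
result = poll_until_done(lambda: next(statuses), poll_interval=0.0)
```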
-### class AgentBase +#### property stuck_detector -Bases: `DiscriminatedUnionMixin`, `ABC` +Stuck detector for compatibility. +Not implemented for remote conversations. -Abstract base class for OpenHands agents. +#### update_secrets() -Agents are stateless and should be fully defined by their configuration. -This base class provides the common interface and functionality that all -agent implementations must follow. +### class SecretRegistry + +Bases: `OpenHandsModel` + +Manages secrets and injects them into bash commands when needed. + +The secret registry stores a mapping of secret keys to SecretSources +that retrieve the actual secret values. When a bash command is about to be +executed, it scans the command for any secret keys and injects the corresponding +environment variables. + +Secret sources will redact / encrypt their sensitive values as appropriate when +serializing, depending on the content of the context. If a context is present +and contains a ‘cipher’ object, this is used for encryption. If it contains a +boolean ‘expose_secrets’ flag set to True, secrets are dunped in plain text. +Otherwise secrets are redacted. + +Additionally, it tracks the latest exported values to enable consistent masking +even when callable secrets fail on subsequent calls. #### Properties -- `agent_context`: AgentContext | None -- `condenser`: CondenserBase | None -- `critic`: CriticBase | None -- `dynamic_context`: str | None - Get the dynamic per-conversation context. - This returns the context that varies between conversations, such as: - - Repository information and skills - - Runtime information (hosts, working directory) - - User-specific secrets and settings - - Conversation instructions - This content should NOT be included in the cached system prompt to enable - cross-conversation cache sharing. Instead, it is sent as a second content - block (without a cache marker) inside the system message. 
- * Returns: - The dynamic context string, or None if no context is configured. -- `filter_tools_regex`: str | None -- `include_default_tools`: list[str] -- `llm`: LLM -- `mcp_config`: dict[str, Any] -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `name`: str - Returns the name of the Agent. -- `prompt_dir`: str - Returns the directory where this class’s module file is located. -- `security_policy_filename`: str -- `static_system_message`: str - Compute the static portion of the system message. - This returns only the base system prompt template without any dynamic - per-conversation context. This static portion can be cached and reused - across conversations for better prompt caching efficiency. - * Returns: - The rendered system prompt template without dynamic context. -- `system_message`: str - Return the combined system message (static + dynamic). -- `system_prompt_filename`: str -- `system_prompt_kwargs`: dict[str, object] -- `tools`: list[Tool] -- `tools_map`: dictstr, [ToolDefinition] - Get the initialized tools map. - :raises RuntimeError: If the agent has not been initialized. +- `secret_sources`: dict[str, SecretSource] #### Methods -#### get_all_llms() +#### find_secrets_in_text() -Recursively yield unique base-class LLM objects reachable from self. +Find all secret keys mentioned in the given text. -- Returns actual object references (not copies). -- De-dupes by id(LLM). -- Cycle-safe via a visited set for all traversed objects. -- Only yields objects whose type is exactly LLM (no subclasses). -- Does not handle dataclasses. +* Parameters: + `text` – The text to search for secret keys +* Returns: + Set of secret keys found in the text -#### init_state() +#### get_secrets_as_env_vars() -Initialize the empty conversation state to prepare the agent for user -messages. +Get secrets that should be exported as environment variables for a command. 
-Typically this involves adding system message +* Parameters: + `command` – The bash command to check for secret references +* Returns: + Dictionary of environment variables to export (key -> value) -NOTE: state will be mutated in-place. +#### mask_secrets_in_output() -#### model_dump_succint() +Mask secret values in the given text. -Like model_dump, but excludes None fields by default. +This method uses both the current exported values and attempts to get +fresh values from callables to ensure comprehensive masking. + +* Parameters: + `text` – The text to mask secrets in +* Returns: + Text with secret values replaced by `` + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. #### model_post_init() @@ -11566,2158 +11563,1897 @@ It takes context as an argument since that’s what pydantic-core passes when ca * `self` – The BaseModel instance. * `context` – The context. -#### abstractmethod step() - -Taking a step in the conversation. - -Typically this involves: -1. Making a LLM call -2. Executing the tool -3. Updating the conversation state with - - LLM calls (role=”assistant”) and tool results (role=”tool”) - -4.1 If conversation is finished, set state.execution_status to FINISHED -4.2 Otherwise, just return, Conversation will kick off the next step +#### update_secrets() -If the underlying LLM supports streaming, partial deltas are forwarded to -`on_token` before the full response is returned. +Add or update secrets in the manager. -NOTE: state will be mutated in-place. +* Parameters: + `secrets` – Dictionary mapping secret keys to either string values + or callable functions that return string values -#### Deprecated -Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and -[`dynamic_context`](#class-dynamic_context) for per-conversation content. 
This separation -enables cross-conversation prompt caching. Will be removed in 1.16.0. +### class StuckDetector -#### WARNING -Using this property DISABLES cross-conversation prompt caching because -it combines static and dynamic content into a single string. Use -[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately -to enable caching. +Bases: `object` -#### Deprecated -Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. +Detects when an agent is stuck in repetitive or unproductive patterns. -#### verify() +This detector analyzes the conversation history to identify various stuck patterns: +1. Repeating action-observation cycles +2. Repeating action-error cycles +3. Agent monologue (repeated messages without user input) +4. Repeating alternating action-observation patterns +5. Context window errors indicating memory issues -Verify that we can resume this agent from persisted state. -We do not merge configuration between persisted and runtime Agent -instances. Instead, we verify compatibility requirements and then -continue with the runtime-provided Agent. +#### Properties -Compatibility requirements: -- Agent class/type must match. -- Tools must match exactly (same tool names). +- `action_error_threshold`: int +- `action_observation_threshold`: int +- `alternating_pattern_threshold`: int +- `monologue_threshold`: int +- `state`: [ConversationState](#class-conversationstate) +- `thresholds`: StuckDetectionThresholds -Tools are part of the system prompt and cannot be changed mid-conversation. 
-To use different tools, start a new conversation or use conversation forking -(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). +#### Methods -All other configuration (LLM, agent_context, condenser, etc.) can be -freely changed between sessions. +#### __init__() -* Parameters: - * `persisted` – The agent loaded from persisted state. - * `events` – Unused, kept for API compatibility. -* Returns: - This runtime agent (self) if verification passes. -* Raises: - `ValueError` – If agent class or tools don’t match. +#### is_stuck() +Check if the agent is currently stuck. -# openhands.sdk.conversation -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation +Note: To avoid materializing potentially large file-backed event histories, +only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. +If a user message exists within this window, only events after it are checked. +Otherwise, all events in the window are analyzed. -### class BaseConversation +#### __init__() -Bases: `ABC` +### openhands.sdk.event +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md -Abstract base class for conversation implementations. +### class ActionEvent -This class defines the interface that all conversation implementations must follow. -Conversations manage the interaction between users and agents, handling message -exchange, execution control, and state management. +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) #### Properties -- `confirmation_policy_active`: bool -- `conversation_stats`: ConversationStats -- `id`: UUID -- `is_confirmation_mode_active`: bool - Check if confirmation mode is active. - Returns True if BOTH conditions are met: - 1. The conversation state has a security analyzer set (not None) - 2. 
The confirmation policy is active -- `state`: ConversationStateProtocol +- `action`: Action | None +- `critic_result`: CriticResult | None +- `llm_response_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str | None +- `responses_reasoning_item`: ReasoningItemModel | None +- `security_risk`: SecurityRisk +- `source`: Literal['agent', 'user', 'environment'] +- `summary`: str | None +- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] +- `thought`: Sequence[TextContent] +- `tool_call`: MessageToolCall +- `tool_call_id`: str +- `tool_name`: str +- `visualize`: Text + Return Rich Text representation of this action event. #### Methods -#### __init__() +#### to_llm_message() -Initialize the base conversation with span tracking. +Individual message - may be incomplete for multi-action batches -#### abstractmethod ask_agent() +### class AgentErrorEvent -Ask the agent a simple, stateless question and get a direct LLM response. +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -This bypasses the normal conversation flow and does not modify, persist, -or become part of the conversation state. The request is not remembered by -the main agent, no events are recorded, and execution status is untouched. -It is also thread-safe and may be called while conversation.run() is -executing in another thread. +Error triggered by the agent. -* Parameters: - `question` – A simple string question to ask the agent -* Returns: - A string response from the agent +Note: This event should not contain model “thought” or “reasoning_content”. It +represents an error produced by the agent/scaffold, not model output. -#### abstractmethod close() -#### static compose_callbacks() +#### Properties -Compose multiple callbacks into a single callback function. 
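The callback composition described here — fold an iterable of callbacks into one function that forwards each event to every callback in turn — can be sketched in a few lines (illustrative, not the SDK source):

```python
def compose_callbacks(callbacks):
    """Return a single callback that invokes every provided callback."""
    callbacks = list(callbacks)

    def composed(event):
        for cb in callbacks:
            cb(event)

    return composed

seen = []
combined = compose_callbacks([seen.append, lambda e: seen.append(e.upper())])
combined("hello")
```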
+- `error`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this agent error event. -* Parameters: - `callbacks` – An iterable of callback functions -* Returns: - A single callback function that calls all provided callbacks +#### Methods -#### abstractmethod condense() +#### to_llm_message() -Force condensation of the conversation history. +### class Condensation -This method uses the existing condensation request pattern to trigger -condensation. It adds a CondensationRequest event to the conversation -and forces the agent to take a single step to process it. +Bases: [`Event`](#class-event) -The condensation will be applied immediately and will modify the conversation -state by adding a condensation event to the history. +This action indicates a condensation of the conversation history is happening. -* Raises: - `ValueError` – If no condenser is configured or the condenser doesn’t - handle condensation requests. -#### abstractmethod execute_tool() +#### Properties -Execute a tool directly without going through the agent loop. +- `forgotten_event_ids`: list[[EventID](#class-eventid)] +- `has_summary_metadata`: bool + Checks if both summary and summary_offset are present. +- `llm_response_id`: [EventID](#class-eventid) +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str | None +- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) + Generates a CondensationSummaryEvent. + Since summary events are not part of the main event store and are generated + dynamically, this property ensures the created event has a unique and consistent + ID based on the condensation event’s ID. 
+ * Raises: + `ValueError` – If no summary is present. +- `summary_offset`: int | None +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. -This method allows executing tools before or outside of the normal -conversation.run() flow. It handles agent initialization automatically, -so tools can be executed before the first run() call. +#### Methods -Note: This method bypasses the agent loop, including confirmation -policies and security analyzer checks. Callers are responsible for -applying any safeguards before executing potentially destructive tools. +#### apply() -This is useful for: -- Pre-run setup operations (e.g., indexing repositories) -- Manual tool execution for environment setup -- Testing tool behavior outside the agent loop +Applies the condensation to a list of events. -* Parameters: - * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) - * `action` – The action to pass to the tool executor -* Returns: - The observation returned by the tool execution -* Raises: - * `KeyError` – If the tool is not found in the agent’s tools - * `NotImplementedError` – If the tool has no executor +This method removes events that are marked to be forgotten and returns a new +list of events. If the summary metadata is present (both summary and offset), +the corresponding CondensationSummaryEvent will be inserted at the specified +offset _after_ the forgotten events have been removed. -#### abstractmethod generate_title() +### class CondensationRequest -Generate a title for the conversation based on the first user message. +Bases: [`Event`](#class-event) -* Parameters: - * `llm` – Optional LLM to use for title generation. If not provided, - uses the agent’s LLM. - * `max_length` – Maximum length of the generated title. -* Returns: - A generated title for the conversation. 
-* Raises: - `ValueError` – If no user messages are found in the conversation. +This action is used to request a condensation of the conversation history. -#### static get_persistence_dir() -Get the persistence directory for the conversation. +#### Properties -* Parameters: - * `persistence_base_dir` – Base directory for persistence. Can be a string - path or Path object. - * `conversation_id` – Unique conversation ID. -* Returns: - String path to the conversation-specific persistence directory. - Always returns a normalized string path even if a Path was provided. +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. -#### abstractmethod pause() +#### Methods -#### abstractmethod reject_pending_actions() +#### action -#### abstractmethod run() +The action type, namely ActionType.CONDENSATION_REQUEST. -Execute the agent to process messages and perform actions. +* Type: + str -This method runs the agent until it finishes processing the current -message or reaches the maximum iteration limit. +### class CondensationSummaryEvent -#### abstractmethod send_message() +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -Send a message to the agent. +This event represents a summary generated by a condenser. -* Parameters: - * `message` – Either a string (which will be converted to a user message) - or a Message object - * `sender` – Optional identifier of the sender. Can be used to track - message origin in multi-agent scenarios. For example, when - one agent delegates to another, the sender can be set to - identify which agent is sending the message. 
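`get_persistence_dir`, as documented earlier in this section, joins the base directory with the conversation id and always returns a normalized string path, even when given a Path. A stdlib sketch of the same behavior (not the SDK source):

```python
import os
from pathlib import Path
from uuid import UUID

def get_persistence_dir(persistence_base_dir, conversation_id):
    """Return '<base>/<conversation_id>' as a normalized string path."""
    base = os.fspath(persistence_base_dir)  # accepts str or Path
    return os.path.normpath(os.path.join(base, str(conversation_id)))

cid = UUID("12345678-1234-5678-1234-567812345678")
d = get_persistence_dir(Path("./state/"), cid)
```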
-#### abstractmethod set_confirmation_policy() +#### Properties -Set the confirmation policy for the conversation. +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str + The summary text. -#### abstractmethod set_security_analyzer() +#### Methods -Set the security analyzer for the conversation. +#### to_llm_message() -#### abstractmethod update_secrets() +### class ConversationStateUpdateEvent -### class Conversation +Bases: [`Event`](#class-event) -### class Conversation +Event that contains conversation state updates. -Bases: `object` +This event is sent via websocket whenever the conversation state changes, +allowing remote clients to stay in sync without making REST API calls. -Factory class for creating conversation instances with OpenHands agents. +All fields are serialized versions of the corresponding ConversationState fields +to ensure compatibility with websocket transmission. -This factory automatically creates either a LocalConversation or RemoteConversation -based on the workspace type provided. LocalConversation runs the agent locally, -while RemoteConversation connects to a remote agent server. -* Returns: - LocalConversation if workspace is local, RemoteConversation if workspace - is remote. +#### Properties -#### Example +- `key`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `value`: Any -```pycon ->>> from openhands.sdk import LLM, Agent, Conversation ->>> from openhands.sdk.plugin import PluginSource ->>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) ->>> agent = Agent(llm=llm, tools=[]) ->>> conversation = Conversation( -... agent=agent, -... workspace="./workspace", -... 
plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")], -... ) ->>> conversation.send_message("Hello!") ->>> conversation.run() -``` +#### Methods -### class ConversationExecutionStatus +#### classmethod from_conversation_state() -Bases: `str`, `Enum` +Create a state update event from a ConversationState object. -Enum representing the current execution state of the conversation. +This creates an event containing a snapshot of important state fields. -#### Methods +* Parameters: + * `state` – The ConversationState to serialize + * `conversation_id` – The conversation ID for the event +* Returns: + A ConversationStateUpdateEvent with serialized state data -#### DELETING = 'deleting' +#### classmethod validate_key() -#### ERROR = 'error' +#### classmethod validate_value() -#### FINISHED = 'finished' +### class Event -#### IDLE = 'idle' +Bases: `DiscriminatedUnionMixin`, `ABC` -#### PAUSED = 'paused' +Base class for all events. -#### RUNNING = 'running' -#### STUCK = 'stuck' +#### Properties -#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' +- `id`: str +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `timestamp`: str +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. +### class LLMCompletionLogEvent -#### is_terminal() +Bases: [`Event`](#class-event) -Check if this status represents a terminal state. +Event containing LLM completion log data. -Terminal states indicate the run has completed and the agent is no longer -actively processing. These are: FINISHED, ERROR, STUCK. 
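The terminal-state check is simple membership over the status values listed above. A sketch using a plain Enum that mirrors (but does not import) the SDK enum:

```python
from enum import Enum

class ExecStatus(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"
    ERROR = "error"
    STUCK = "stuck"

    def is_terminal(self):
        # Only these three statuses mean the run has completed.
        return self in (ExecStatus.FINISHED, ExecStatus.ERROR, ExecStatus.STUCK)

terminal = [s.value for s in ExecStatus if s.is_terminal()]
```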
+When an LLM is configured with log_completions=True in a remote conversation, +this event streams the completion log data back to the client through WebSocket +instead of writing it to a file inside the Docker container. -Note: IDLE is NOT a terminal state - it’s the initial state of a conversation -before any run has started. Including IDLE would cause false positives when -the WebSocket delivers the initial state update during connection. -* Returns: - True if this is a terminal status, False otherwise. +#### Properties -### class ConversationState +- `filename`: str +- `log_data`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_name`: str +- `source`: Literal['agent', 'user', 'environment'] +- `usage_id`: str +### class LLMConvertibleEvent -Bases: `OpenHandsModel` +Bases: [`Event`](#class-event), `ABC` + +Base class for events that can be converted to LLM messages. #### Properties -- `activated_knowledge_skills`: list[str] -- `agent`: AgentBase -- `agent_state`: dict[str, Any] -- `blocked_actions`: dict[str, str] -- `blocked_messages`: dict[str, str] -- `confirmation_policy`: ConfirmationPolicyBase -- `env_observation_persistence_dir`: str | None - Directory for persisting environment observation files. -- `events`: [EventLog](#class-eventlog) -- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) -- `id`: UUID -- `max_iterations`: int -- `persistence_dir`: str | None -- `secret_registry`: [SecretRegistry](#class-secretregistry) -- `security_analyzer`: SecurityAnalyzerBase | None -- `stats`: ConversationStats -- `stuck_detection`: bool -- `workspace`: BaseWorkspace +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. #### Methods -#### acquire() - -Acquire the lock. 
- -* Parameters: - * `blocking` – If True, block until lock is acquired. If False, return - immediately. - * `timeout` – Maximum time to wait for lock (ignored if blocking=False). - -1 means wait indefinitely. -* Returns: - True if lock was acquired, False otherwise. - -#### block_action() - -Persistently record a hook-blocked action. - -#### block_message() - -Persistently record a hook-blocked user message. - -#### classmethod create() - -Create a new conversation state or resume from persistence. - -This factory method handles both new conversation creation and resumption -from persisted state. - -New conversation: -The provided Agent is used directly. Pydantic validation happens via the -cls() constructor. - -Restored conversation: -The provided Agent is validated against the persisted agent using -agent.load(). Tools must match (they may have been used in conversation -history), but all other configuration can be freely changed: LLM, -agent_context, condenser, system prompts, etc. +#### static events_to_messages() -* Parameters: - * `id` – Unique conversation identifier - * `agent` – The Agent to use (tools must match persisted on restore) - * `workspace` – Working directory for agent operations - * `persistence_dir` – Directory for persisting state and events - * `max_iterations` – Maximum iterations per run - * `stuck_detection` – Whether to enable stuck detection - * `cipher` – Optional cipher for encrypting/decrypting secrets in - persisted state. If provided, secrets are encrypted when - saving and decrypted when loading. If not provided, secrets - are redacted (lost) on serialization. 
-* Returns:
-    ConversationState ready for use
-* Raises:
-    * `ValueError` – If conversation ID or tools mismatch on restore
-    * `ValidationError` – If agent or other fields fail Pydantic validation

+Convert event stream to LLM message stream, handling multi-action batches

-#### static get_unmatched_actions()

+#### abstractmethod to_llm_message()

-Find actions in the event history that don’t have matching observations.

+### class MessageEvent

-This method identifies ActionEvents that don’t have corresponding
-ObservationEvents or UserRejectObservations, which typically indicates
-actions that are pending confirmation or execution.

+Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)

-* Parameters:
-  `events` – List of events to search through
-* Returns:
-  List of ActionEvent objects that don’t have corresponding observations,
-  in chronological order

+Message from either agent or user.

-#### locked()

+This was originally the “MessageAction”, but it is not supposed to be a tool call.

-Return True if the lock is currently held by any thread.

-#### model_config = (configuration object)

+#### Properties

-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

+- `activated_skills`: list[str]
+- `critic_result`: CriticResult | None
+- `extended_content`: list[TextContent]
+- `llm_message`: Message
+- `llm_response_id`: str | None
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `reasoning_content`: str
+- `sender`: str | None
+- `source`: Literal['agent', 'user', 'environment']
+- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock]
+  Return the Anthropic thinking blocks from the LLM message.
+- `visualize`: Text
+  Return Rich Text representation of this message event.
-#### model_post_init() +#### Methods -This function is meant to behave like a BaseModel method to initialise private attributes. +#### to_llm_message() -It takes context as an argument since that’s what pydantic-core passes when calling it. +### class ObservationBaseEvent -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -#### owned() +Base class for anything as a response to a tool call. -Return True if the lock is currently held by the calling thread. +Examples include tool execution, error, user reject. -#### pop_blocked_action() -Remove and return a hook-blocked action reason, if present. +#### Properties -#### pop_blocked_message() +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `tool_call_id`: str +- `tool_name`: str +### class ObservationEvent -Remove and return a hook-blocked message reason, if present. +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -#### release() -Release the lock. +#### Properties -* Raises: - `RuntimeError` – If the current thread doesn’t own the lock. +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `observation`: Observation +- `visualize`: Text + Return Rich Text representation of this observation event. -#### set_on_state_change() +#### Methods -Set a callback to be called when state changes. +#### to_llm_message() -* Parameters: - `callback` – A function that takes an Event (ConversationStateUpdateEvent) - or None to remove the callback +### class PauseEvent -### class ConversationVisualizerBase +Bases: [`Event`](#class-event) -Bases: `ABC` +Event indicating that the agent execution was paused by user request. 
-Base class for conversation visualizers. -This abstract base class defines the interface that all conversation visualizers -must implement. Visualizers can be created before the Conversation is initialized -and will be configured with the conversation state automatically. +#### Properties -The typical usage pattern: -1. Create a visualizer instance: +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this pause event. +### class SystemPromptEvent - viz = MyVisualizer() -1. Pass it to Conversation: conv = Conversation(agent, visualizer=viz) -2. Conversation automatically calls viz.initialize(state) to attach the state +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -You can also pass the uninstantiated class if you don’t need extra args -: for initialization, and Conversation will create it: - : conv = Conversation(agent, visualizer=MyVisualizer) +System prompt added by the agent. -Conversation will then calls MyVisualizer() followed by initialize(state) +The system prompt can optionally include dynamic context that varies between +conversations. When `dynamic_context` is provided, it is included as a +second content block in the same system message. Cache markers are NOT +applied here - they are applied by `LLM._apply_prompt_caching()` when +caching is enabled, ensuring provider-specific cache control is only added +when appropriate. #### Properties -- `conversation_stats`: ConversationStats | None - Get conversation stats from the state. +- `dynamic_context`: TextContent | None +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `source`: Literal['agent', 'user', 'environment'] +- `system_prompt`: TextContent +- `tools`: list[ToolDefinition] +- `visualize`: Text + Return Rich Text representation of this system prompt event. #### Methods -#### __init__() - -Initialize the visualizer base. - -#### create_sub_visualizer() - -Create a visualizer for a sub-agent during delegation. - -Override this method to support sub-agent visualization in multi-agent -delegation scenarios. The sub-visualizer will be used to display events -from the spawned sub-agent. - -By default, returns None which means sub-agents will not have visualization. -Subclasses that support delegation (like DelegationVisualizer) should -override this method to create appropriate sub-visualizers. - -* Parameters: - `agent_id` – The identifier of the sub-agent being spawned -* Returns: - A visualizer instance for the sub-agent, or None if sub-agent - visualization is not supported +#### system_prompt -#### final initialize() +The static system prompt text (cacheable across conversations) -Initialize the visualizer with conversation state. +* Type: + openhands.sdk.llm.message.TextContent -This method is called by Conversation after the state is created, -allowing the visualizer to access conversation stats and other -state information. +#### tools -Subclasses should not override this method, to ensure the state is set. +List of available tools -* Parameters: - `state` – The conversation state object +* Type: + list[openhands.sdk.tool.tool.ToolDefinition] -#### abstractmethod on_event() +#### dynamic_context -Handle a conversation event. +Optional per-conversation context (hosts, repo info, etc.) +Sent as a second TextContent block inside the system message. -This method is called for each event in the conversation and should -implement the visualization logic. 
+* Type: + openhands.sdk.llm.message.TextContent | None -* Parameters: - `event` – The event to visualize +#### to_llm_message() -### class DefaultConversationVisualizer +Convert to a single system LLM message. -Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) +When `dynamic_context` is present the message contains two content +blocks: the static prompt followed by the dynamic context. Cache markers +are NOT applied here - they are applied by `LLM._apply_prompt_caching()` +when caching is enabled, which marks the static block (index 0) and leaves +the dynamic block (index 1) unmarked for cross-conversation cache sharing. -Handles visualization of conversation events with Rich formatting. +### class TokenEvent -Provides Rich-formatted output with semantic dividers and complete content display. +Bases: [`Event`](#class-event) -#### Methods +Event from VLLM representing token IDs used in LLM interaction. -#### __init__() -Initialize the visualizer. +#### Properties -* Parameters: - * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles - for highlighting keywords in the visualizer. - For example: (configuration object) - * `skip_user_messages` – If True, skip displaying user messages. Useful for - scenarios where user input is not relevant to show. +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `prompt_token_ids`: list[int] +- `response_token_ids`: list[int] +- `source`: Literal['agent', 'user', 'environment'] +### class UserRejectObservation -#### on_event() +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -Main event handler that displays events with Rich formatting. +Observation when an action is rejected by user or hook. 
-### class EventLog +This event is emitted when: +- User rejects an action during confirmation mode (rejection_source=”user”) +- A PreToolUse hook blocks an action (rejection_source=”hook”) -Bases: [`EventsListBase`](#class-eventslistbase) -Persistent event log with locking for concurrent writes. +#### Properties -This class provides thread-safe and process-safe event storage using -the FileStore’s locking mechanism. Events are persisted to disk and -can be accessed by index or event ID. +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `rejection_reason`: str +- `rejection_source`: Literal['user', 'hook'] +- `visualize`: Text + Return Rich Text representation of this user rejection event. #### Methods -#### NOTE -For LocalFileStore, file locking via flock() does NOT work reliably -on NFS mounts or network filesystems. Users deploying with shared -storage should use alternative coordination mechanisms. - -#### __init__() - -#### append() - -Append an event with locking for thread/process safety. - -* Raises: - * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. - * `ValueError` – If an event with the same ID already exists. - -#### get_id() - -Return the event_id for a given index. - -#### get_index() - -Return the integer index for a given event_id. - -### class EventsListBase - -Bases: `Sequence`[`Event`], `ABC` - -Abstract base class for event lists that can be appended to. - -This provides a common interface for both local EventLog and remote -RemoteEventsList implementations, avoiding circular imports in protocols. - -#### Methods +#### to_llm_message() -#### abstractmethod append() +### openhands.sdk.llm +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md -Add a new event to the list. 
+### class CredentialStore -### class LocalConversation +Bases: `object` -Bases: [`BaseConversation`](#class-baseconversation) +Store and retrieve OAuth credentials for LLM providers. #### Properties -- `agent`: AgentBase -- `delete_on_close`: bool = True -- `id`: UUID - Get the unique ID of the conversation. -- `llm_registry`: LLMRegistry -- `max_iteration_per_run`: int -- `resolved_plugins`: list[ResolvedPluginSource] | None - Get the resolved plugin sources after plugins are loaded. - Returns None if plugins haven’t been loaded yet, or if no plugins - were specified. Use this for persistence to ensure conversation - resume uses the exact same plugin versions. -- `state`: [ConversationState](#class-conversationstate) - Get the conversation state. - It returns a protocol that has a subset of ConversationState methods - and properties. We will have the ability to access the same properties - of ConversationState on a remote conversation object. - But we won’t be able to access methods that mutate the state. -- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None - Get the stuck detector instance if enabled. -- `workspace`: LocalWorkspace +- `credentials_dir`: Path + Get the credentials directory, creating it if necessary. #### Methods #### __init__() -Initialize the conversation. +Initialize the credential store. * Parameters: - * `agent` – The agent to use for the conversation. - * `workspace` – Working directory for agent operations and tool execution. - Can be a string path, Path object, or LocalWorkspace instance. - * `plugins` – Optional list of plugins to load. Each plugin is specified - with a source (github:owner/repo, git URL, or local path), - optional ref (branch/tag/commit), and optional repo_path for - monorepos. Plugins are loaded in order with these merge - semantics: skills override by name (last wins), MCP config - override by key (last wins), hooks concatenate (all run). 
- * `persistence_dir` – Directory for persisting conversation state and events. - Can be a string path or Path object. - * `conversation_id` – Optional ID for the conversation. If provided, will - be used to identify the conversation. The user might want to - suffix their persistent filestore with this ID. - * `callbacks` – Optional list of callback functions to handle events - * `token_callbacks` – Optional list of callbacks invoked for streaming deltas - * `hook_config` – Optional hook configuration to auto-wire session hooks. - If plugins are loaded, their hooks are combined with this config. - * `max_iteration_per_run` – Maximum number of iterations per run - * `visualizer` – - - Visualization configuration. Can be: - - ConversationVisualizerBase subclass: Class to instantiate - > (default: ConversationVisualizer) - - ConversationVisualizerBase instance: Use custom visualizer - - None: No visualization - * `stuck_detection` – Whether to enable stuck detection - * `stuck_detection_thresholds` – Optional configuration for stuck detection - thresholds. Can be a StuckDetectionThresholds instance or - a dict with keys: ‘action_observation’, ‘action_error’, - ‘monologue’, ‘alternating_pattern’. Values are integers - representing the number of repetitions before triggering. - * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted - state. If provided, secrets are encrypted when saving and - decrypted when loading. If not provided, secrets are redacted - (lost) on serialization. - -#### ask_agent() + `credentials_dir` – Optional custom directory for storing credentials. + Defaults to ~/.local/share/openhands/auth/ -Ask the agent a simple, stateless question and get a direct LLM response. +#### delete() -This bypasses the normal conversation flow and does not modify, persist, -or become part of the conversation state. The request is not remembered by -the main agent, no events are recorded, and execution status is untouched. 
-It is also thread-safe and may be called while conversation.run() is -executing in another thread. +Delete stored credentials for a vendor. * Parameters: - `question` – A simple string question to ask the agent + `vendor` – The vendor/provider name * Returns: - A string response from the agent - -#### close() - -Close the conversation and clean up all tool executors. - -#### condense() - -Synchronously force condense the conversation history. - -If the agent is currently running, condense() will wait for the -ongoing step to finish before proceeding. - -Raises ValueError if no compatible condenser exists. - -#### property conversation_stats + True if credentials were deleted, False if they didn’t exist -#### execute_tool() +#### get() -Execute a tool directly without going through the agent loop. +Get stored credentials for a vendor. -This method allows executing tools before or outside of the normal -conversation.run() flow. It handles agent initialization automatically, -so tools can be executed before the first run() call. +* Parameters: + `vendor` – The vendor/provider name (e.g., ‘openai’) +* Returns: + OAuthCredentials if found and valid, None otherwise -Note: This method bypasses the agent loop, including confirmation -policies and security analyzer checks. Callers are responsible for -applying any safeguards before executing potentially destructive tools. +#### save() -This is useful for: -- Pre-run setup operations (e.g., indexing repositories) -- Manual tool execution for environment setup -- Testing tool behavior outside the agent loop +Save credentials for a vendor. 
* Parameters: - * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) - * `action` – The action to pass to the tool executor -* Returns: - The observation returned by the tool execution -* Raises: - * `KeyError` – If the tool is not found in the agent’s tools - * `NotImplementedError` – If the tool has no executor + `credentials` – The OAuth credentials to save -#### generate_title() +#### update_tokens() -Generate a title for the conversation based on the first user message. +Update tokens for an existing credential. * Parameters: - * `llm` – Optional LLM to use for title generation. If not provided, - uses self.agent.llm. - * `max_length` – Maximum length of the generated title. + * `vendor` – The vendor/provider name + * `access_token` – New access token + * `refresh_token` – New refresh token (if provided) + * `expires_in` – Token expiry in seconds * Returns: - A generated title for the conversation. -* Raises: - `ValueError` – If no user messages are found in the conversation. + Updated credentials, or None if no existing credentials found -#### pause() +### class ImageContent -Pause agent execution. +Bases: `BaseContent` -This method can be called from any thread to request that the agent -pause execution. The pause will take effect at the next iteration -of the run loop (between agent steps). -Note: If called during an LLM completion, the pause will not take -effect until the current LLM call completes. +#### Properties -#### reject_pending_actions() +- `image_urls`: list[str] +- `type`: Literal['image'] -Reject all pending actions from the agent. +#### Methods -This is a non-invasive method to reject actions between run() calls. -Also clears the agent_waiting_for_confirmation flag. +#### model_config = (configuration object) -#### run() +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Runs the conversation until the agent finishes. 
+#### to_llm_dict() -In confirmation mode: -- First call: creates actions but doesn’t execute them, stops and waits -- Second call: executes pending actions (implicit confirmation) +Convert to LLM API format. -In normal mode: -- Creates and executes actions immediately +### class LLM -Can be paused between steps +Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` -#### send_message() +Language model interface for OpenHands agents. -Send a message to the agent. +The LLM class provides a unified interface for interacting with various +language models through the litellm library. It handles model configuration, +API authentication, +retry logic, and tool calling capabilities. -* Parameters: - * `message` – Either a string (which will be converted to a user message) - or a Message object - * `sender` – Optional identifier of the sender. Can be used to track - message origin in multi-agent scenarios. For example, when - one agent delegates to another, the sender can be set to - identify which agent is sending the message. +#### Example -#### set_confirmation_policy() +```pycon +>>> from openhands.sdk import LLM +>>> from pydantic import SecretStr +>>> llm = LLM( +... model="claude-sonnet-4-20250514", +... api_key=SecretStr("your-api-key"), +... usage_id="my-agent" +... ) +>>> # Use with agent or conversation +``` -Set the confirmation policy and store it in conversation state. -#### set_security_analyzer() +#### Properties -Set the security analyzer for the conversation. 
+- `api_key`: str | SecretStr | None +- `api_version`: str | None +- `aws_access_key_id`: str | SecretStr | None +- `aws_region_name`: str | None +- `aws_secret_access_key`: str | SecretStr | None +- `base_url`: str | None +- `caching_prompt`: bool +- `custom_tokenizer`: str | None +- `disable_stop_word`: bool | None +- `disable_vision`: bool | None +- `drop_params`: bool +- `enable_encrypted_reasoning`: bool +- `extended_thinking_budget`: int | None +- `extra_headers`: dict[str, str] | None +- `force_string_serializer`: bool | None +- `input_cost_per_token`: float | None +- `is_subscription`: bool + Check if this LLM uses subscription-based authentication. + Returns True when the LLM was created via LLM.subscription_login(), + which uses the ChatGPT subscription Codex backend rather than the + standard OpenAI API. + * Returns: + True if using subscription-based transport, False otherwise. + * Return type: + bool +- `litellm_extra_body`: dict[str, Any] +- `log_completions`: bool +- `log_completions_folder`: str +- `max_input_tokens`: int | None +- `max_message_chars`: int +- `max_output_tokens`: int | None +- `metrics`: [Metrics](#class-metrics) + Get usage metrics for this LLM instance. + * Returns: + Metrics object containing token usage, costs, and other statistics. +- `model`: str +- `model_canonical_name`: str | None +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_info`: dict | None + Returns the model info dictionary. 
+- `modify_params`: bool +- `native_tool_calling`: bool +- `num_retries`: int +- `ollama_base_url`: str | None +- `openrouter_app_name`: str +- `openrouter_site_url`: str +- `output_cost_per_token`: float | None +- `prompt_cache_retention`: str | None +- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None +- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None +- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None] +- `retry_max_wait`: int +- `retry_min_wait`: int +- `retry_multiplier`: float +- `safety_settings`: list[dict[str, str]] | None +- `seed`: int | None +- `stream`: bool +- `telemetry`: Telemetry + Get telemetry handler for this LLM instance. + * Returns: + Telemetry object for managing logging and metrics callbacks. +- `temperature`: float | None +- `timeout`: int | None +- `top_k`: float | None +- `top_p`: float | None +- `usage_id`: str -#### update_secrets() +#### Methods -Add secrets to the conversation. +#### completion() + +Generate a completion from the language model. + +This is the method for getting responses from the model via Completion API. +It handles message formatting, tool calling, and response processing. * Parameters: - `secrets` – Dictionary mapping secret keys to values or no-arg callables. - SecretValue = str | Callable[[], str]. Callables are invoked lazily - when a command references the secret key. + * `messages` – List of conversation messages + * `tools` – Optional list of tools available to the model + * `_return_metrics` – Whether to return usage metrics + * `add_security_risk_prediction` – Add security_risk field to tool schemas + * `on_token` – Optional callback for streaming tokens + kwargs* – Additional arguments passed to the LLM API +* Returns: + LLMResponse containing the model’s response and metadata. -### class RemoteConversation +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. 
-Bases: [`BaseConversation`](#class-baseconversation) +* Raises: + `ValueError` – If streaming is requested (not supported). +#### format_messages_for_llm() -#### Properties +Formats Message objects for LLM consumption. -- `agent`: AgentBase -- `delete_on_close`: bool = False -- `id`: UUID -- `max_iteration_per_run`: int -- `state`: RemoteState - Access to remote conversation state. -- `workspace`: RemoteWorkspace +#### format_messages_for_responses() -#### Methods +Prepare (instructions, input[]) for the OpenAI Responses API. -#### __init__() +- Skips prompt caching flags and string serializer concerns +- Uses Message.to_responses_value to get either instructions (system) + or input items (others) +- Concatenates system instructions into a single instructions string +- For subscription mode, system prompts are prepended to user content -Remote conversation proxy that talks to an agent server. +#### get_token_count() -* Parameters: - * `agent` – Agent configuration (will be sent to the server) - * `workspace` – The working directory for agent operations and tool execution. - * `plugins` – Optional list of plugins to load on the server. Each plugin - is a PluginSource specifying source, ref, and repo_path. - * `conversation_id` – Optional existing conversation id to attach to - * `callbacks` – Optional callbacks to receive events (not yet streamed) - * `max_iteration_per_run` – Max iterations configured on server - * `stuck_detection` – Whether to enable stuck detection on server - * `stuck_detection_thresholds` – Optional configuration for stuck detection - thresholds. Can be a StuckDetectionThresholds instance or - a dict with keys: ‘action_observation’, ‘action_error’, - ‘monologue’, ‘alternating_pattern’. Values are integers - representing the number of repetitions before triggering. - * `hook_config` – Optional hook configuration for session hooks - * `visualizer` – +#### is_caching_prompt_active() - Visualization configuration. 
Can be: - - ConversationVisualizerBase subclass: Class to instantiate - > (default: ConversationVisualizer) - - ConversationVisualizerBase instance: Use custom visualizer - - None: No visualization - * `secrets` – Optional secrets to initialize the conversation with +Check if prompt caching is supported and enabled for current model. -#### ask_agent() +* Returns: + True if prompt caching is supported and enabled for the given + : model. +* Return type: + boolean -Ask the agent a simple, stateless question and get a direct LLM response. +#### classmethod load_from_env() -This bypasses the normal conversation flow and does not modify, persist, -or become part of the conversation state. The request is not remembered by -the main agent, no events are recorded, and execution status is untouched. -It is also thread-safe and may be called while conversation.run() is -executing in another thread. +#### classmethod load_from_json() + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. * Parameters: - `question` – A simple string question to ask the agent -* Returns: - A string response from the agent + * `self` – The BaseModel instance. + * `context` – The context. -#### close() +#### reset_metrics() -Close the conversation and clean up resources. +Reset metrics and telemetry to fresh instances. -Note: We don’t close self._client here because it’s shared with the workspace. -The workspace owns the client and will close it during its own cleanup. -Closing it here would prevent the workspace from making cleanup API calls. +This is used by the LLMRegistry to ensure each registered LLM has +independent metrics, preventing metrics from being shared between +LLMs that were created via model_copy(). 
-#### condense() +When an LLM is copied (e.g., to create a condenser LLM from an agent LLM), +Pydantic’s model_copy() does a shallow copy of private attributes by default, +causing the original and copied LLM to share the same Metrics object. +This method allows the registry to fix this by resetting metrics to None, +which will be lazily recreated when accessed. -Force condensation of the conversation history. +#### responses() -This method sends a condensation request to the remote agent server. -The server will use the existing condensation request pattern to trigger -condensation if a condenser is configured and handles condensation requests. +Alternative invocation path using OpenAI Responses API via LiteLLM. -The condensation will be applied on the server side and will modify the -conversation state by adding a condensation event to the history. +Maps Message[] -> (instructions, input[]) and returns LLMResponse. -* Raises: - `HTTPError` – If the server returns an error (e.g., no condenser configured). +* Parameters: + * `messages` – List of conversation messages + * `tools` – Optional list of tools available to the model + * `include` – Optional list of fields to include in response + * `store` – Whether to store the conversation + * `_return_metrics` – Whether to return usage metrics + * `add_security_risk_prediction` – Add security_risk field to tool schemas + * `on_token` – Optional callback for streaming deltas + kwargs* – Additional arguments passed to the API -#### property conversation_stats +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. -#### execute_tool() +#### restore_metrics() -Execute a tool directly without going through the agent loop. +#### classmethod subscription_login() -Note: This method is not yet supported for RemoteConversation. -Tool execution for remote conversations happens on the server side -during the normal agent loop. 
+Authenticate with a subscription service and return an LLM instance. -* Parameters: - * `tool_name` – The name of the tool to execute - * `action` – The action to pass to the tool executor -* Raises: - `NotImplementedError` – Always, as this feature is not yet supported - for remote conversations. +This method provides subscription-based access to LLM models that are +available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather +than API credits. It handles credential caching, token refresh, and +the OAuth login flow. -#### generate_title() +Currently supported vendors: +- “openai”: ChatGPT Plus/Pro subscription for Codex models -Generate a title for the conversation based on the first user message. +Supported OpenAI models: +- gpt-5.1-codex-max +- gpt-5.1-codex-mini +- gpt-5.2 +- gpt-5.2-codex * Parameters: - * `llm` – Optional LLM to use for title generation. If provided, its usage_id - will be sent to the server. If not provided, uses the agent’s LLM. - * `max_length` – Maximum length of the generated title. + * `vendor` – The vendor/provider. Currently only “openai” is supported. + * `model` – The model to use. Must be supported by the vendor’s + subscription service. + * `force_login` – If True, always perform a fresh login even if valid + credentials exist. + * `open_browser` – Whether to automatically open the browser for the + OAuth login flow. + llm_kwargs* – Additional arguments to pass to the LLM constructor. * Returns: - A generated title for the conversation. + An LLM instance configured for subscription-based access. +* Raises: + * `ValueError` – If the vendor or model is not supported. + * `RuntimeError` – If authentication fails. -#### pause() +#### uses_responses_api() -#### reject_pending_actions() +Whether this model uses the OpenAI Responses API path. -#### run() +#### vision_is_active() -Trigger a run on the server. +### class LLMProfileStore + +Bases: `object` + +Standalone utility for persisting LLM configurations. 
+ +#### Methods + +#### __init__() + +Initialize the profile store. * Parameters: - * `blocking` – If True (default), wait for the run to complete by polling - the server. If False, return immediately after triggering the run. - * `poll_interval` – Time in seconds between status polls (only used when - blocking=True). Default is 1.0 second. - * `timeout` – Maximum time in seconds to wait for the run to complete - (only used when blocking=True). Default is 3600 seconds. -* Raises: - `ConversationRunError` – If the run fails or times out. + `base_dir` – Path to the directory where the profiles are stored. + If None is provided, the default directory is used, i.e., + ~/.openhands/profiles. -#### send_message() +#### delete() -Send a message to the agent. +Delete an existing profile. + +If the profile is not present in the profile directory, it does nothing. * Parameters: - * `message` – Either a string (which will be converted to a user message) - or a Message object - * `sender` – Optional identifier of the sender. Can be used to track - message origin in multi-agent scenarios. For example, when - one agent delegates to another, the sender can be set to - identify which agent is sending the message. + `name` – Name of the profile to delete. +* Raises: + `TimeoutError` – If the lock cannot be acquired. -#### set_confirmation_policy() +#### list() -Set the confirmation policy for the conversation. +Returns a list of all profiles stored. -#### set_security_analyzer() +* Returns: + List of profile filenames (e.g., [“default.json”, “gpt4.json”]). -Set the security analyzer for the remote conversation. +#### load() -#### property stuck_detector +Load an LLM instance from the given profile name. -Stuck detector for compatibility. -Not implemented for remote conversations. +* Parameters: + `name` – Name of the profile to load. +* Returns: + An LLM instance constructed from the profile configuration. +* Raises: + * `FileNotFoundError` – If the profile name does not exist. 
+ * `ValueError` – If the profile file is corrupted or invalid.
+ * `TimeoutError` – If the lock cannot be acquired.

-#### update_secrets()
+#### save()

-### class SecretRegistry
+Save a profile to the profile directory.

-Bases: `OpenHandsModel`
+Note that if a profile name already exists, it will be overwritten.

-Manages secrets and injects them into bash commands when needed.
+* Parameters:
+ * `name` – Name of the profile to save.
+ * `llm` – LLM instance to save.
+ * `include_secrets` – Whether to include the profile secrets. Defaults to False.
+* Raises:
+ `TimeoutError` – If the lock cannot be acquired.

-The secret registry stores a mapping of secret keys to SecretSources
-that retrieve the actual secret values. When a bash command is about to be
-executed, it scans the command for any secret keys and injects the corresponding
-environment variables.
+### class LLMRegistry

-Secret sources will redact / encrypt their sensitive values as appropriate when
-serializing, depending on the content of the context. If a context is present
-and contains a ‘cipher’ object, this is used for encryption. If it contains a
-boolean ‘expose_secrets’ flag set to True, secrets are dunped in plain text.
-Otherwise secrets are redacted.
+Bases: `object`

-Additionally, it tracks the latest exported values to enable consistent masking
-even when callable secrets fail on subsequent calls.
+A minimal LLM registry for managing LLM instances by usage ID.
+
+This registry provides a simple way to manage multiple LLM instances,
+avoiding the need to recreate LLMs with the same configuration.
+
+The registry also ensures that each registered LLM has independent metrics,
+preventing metrics from being shared between LLMs that were created via
+model_copy(). This is important for scenarios like creating a condenser LLM
+from an agent LLM, where each should track its own usage independently.
#### Properties -- `secret_sources`: dict[str, SecretSource] +- `registry_id`: str +- `retry_listener`: Callable[[int, int], None] | None +- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None +- `usage_to_llm`: MappingProxyType + Access the internal usage-ID-to-LLM mapping (read-only view). #### Methods -#### find_secrets_in_text() +#### __init__() -Find all secret keys mentioned in the given text. +Initialize the LLM registry. * Parameters: - `text` – The text to search for secret keys -* Returns: - Set of secret keys found in the text + `retry_listener` – Optional callback for retry events. -#### get_secrets_as_env_vars() +#### add() -Get secrets that should be exported as environment variables for a command. +Add an LLM instance to the registry. -* Parameters: - `command` – The bash command to check for secret references -* Returns: - Dictionary of environment variables to export (key -> value) +This method ensures that the LLM has independent metrics before +registering it. If the LLM’s metrics are shared with another +registered LLM (e.g., due to model_copy()), fresh metrics will +be created automatically. -#### mask_secrets_in_output() +* Parameters: + `llm` – The LLM instance to register. +* Raises: + `ValueError` – If llm.usage_id already exists in the registry. -Mask secret values in the given text. +#### get() -This method uses both the current exported values and attempts to get -fresh values from callables to ensure comprehensive masking. +Get an LLM instance from the registry. * Parameters: - `text` – The text to mask secrets in + `usage_id` – Unique identifier for the LLM usage slot. * Returns: - Text with secret values replaced by `` - -#### model_config = (configuration object) + The LLM instance. +* Raises: + `KeyError` – If usage_id is not found in the registry. -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
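The `add()`/`get()` contract described above can be pictured with a tiny stand-in registry. `MiniLLM` and `MiniRegistry` are invented for illustration and are not SDK classes; they only mirror the duplicate-ID and lookup behavior documented here.

```python
# Illustrative stand-ins, not SDK classes: they mirror the add()/get()
# contract of LLMRegistry (ValueError on duplicate IDs, KeyError on misses).
class MiniLLM:
    def __init__(self, usage_id: str, model: str):
        self.usage_id = usage_id
        self.model = model
        self.metrics: dict = {}  # each registered LLM keeps independent metrics


class MiniRegistry:
    def __init__(self):
        self._usage_to_llm: dict[str, MiniLLM] = {}

    def add(self, llm: MiniLLM) -> None:
        # Mirrors: raises ValueError if llm.usage_id already exists.
        if llm.usage_id in self._usage_to_llm:
            raise ValueError(f"usage_id {llm.usage_id!r} already registered")
        self._usage_to_llm[llm.usage_id] = llm

    def get(self, usage_id: str) -> MiniLLM:
        # Mirrors: raises KeyError if usage_id is not found.
        return self._usage_to_llm[usage_id]


registry = MiniRegistry()
registry.add(MiniLLM("agent", "claude-sonnet-4"))
assert registry.get("agent").model == "claude-sonnet-4"
```

Keying by `usage_id` rather than by model name is what lets two slots (say, an agent LLM and a condenser LLM) share a configuration while tracking usage separately.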
+#### list_usage_ids() -#### model_post_init() +List all registered usage IDs. -This function is meant to behave like a BaseModel method to initialise private attributes. +#### notify() -It takes context as an argument since that’s what pydantic-core passes when calling it. +Notify subscribers of registry events. * Parameters: - * `self` – The BaseModel instance. - * `context` – The context. + `event` – The registry event to notify about. -#### update_secrets() +#### subscribe() -Add or update secrets in the manager. +Subscribe to registry events. * Parameters: - `secrets` – Dictionary mapping secret keys to either string values - or callable functions that return string values + `callback` – Function to call when LLMs are created or updated. -### class StuckDetector +### class LLMResponse -Bases: `object` +Bases: `BaseModel` -Detects when an agent is stuck in repetitive or unproductive patterns. +Result of an LLM completion request. -This detector analyzes the conversation history to identify various stuck patterns: -1. Repeating action-observation cycles -2. Repeating action-error cycles -3. Agent monologue (repeated messages without user input) -4. Repeating alternating action-observation patterns -5. Context window errors indicating memory issues +This type provides a clean interface for LLM completion results, exposing +only OpenHands-native types to consumers while preserving access to the +raw LiteLLM response for internal use. #### Properties -- `action_error_threshold`: int -- `action_observation_threshold`: int -- `alternating_pattern_threshold`: int -- `monologue_threshold`: int -- `state`: [ConversationState](#class-conversationstate) -- `thresholds`: StuckDetectionThresholds +- `id`: str + Get the response ID from the underlying LLM response. + This property provides a clean interface to access the response ID, + supporting both completion mode (ModelResponse) and response API modes + (ResponsesAPIResponse). 
+ * Returns: + The response ID from the LLM response +- `message`: [Message](#class-message) +- `metrics`: [MetricsSnapshot](#class-metricssnapshot) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `raw_response`: ModelResponse | ResponsesAPIResponse #### Methods -#### __init__() +#### message -#### is_stuck() +The completion message converted to OpenHands Message type -Check if the agent is currently stuck. +* Type: + [openhands.sdk.llm.message.Message](#class-message) -Note: To avoid materializing potentially large file-backed event histories, -only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. -If a user message exists within this window, only events after it are checked. -Otherwise, all events in the window are analyzed. +#### metrics -#### __init__() +Snapshot of metrics from the completion request +* Type: + [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) -# openhands.sdk.event -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event +#### raw_response -### class ActionEvent +The original LiteLLM response (ModelResponse or +ResponsesAPIResponse) for internal use -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +* Type: + litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse + +### class Message + +Bases: `BaseModel` #### Properties -- `action`: Action | None -- `critic_result`: CriticResult | None -- `llm_response_id`: str +- `contains_image`: bool +- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] - `model_config`: = (configuration object) Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `name`: str | None - `reasoning_content`: str | None -- `responses_reasoning_item`: ReasoningItemModel | None -- `security_risk`: SecurityRisk -- `source`: Literal['agent', 'user', 'environment'] -- `summary`: str | None -- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] -- `thought`: Sequence[TextContent] -- `tool_call`: MessageToolCall -- `tool_call_id`: str -- `tool_name`: str -- `visualize`: Text - Return Rich Text representation of this action event. +- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None +- `role`: Literal['user', 'system', 'assistant', 'tool'] +- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] +- `tool_call_id`: str | None +- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None #### Methods -#### to_llm_message() +#### classmethod from_llm_chat_message() -Individual message - may be incomplete for multi-action batches - -### class AgentErrorEvent - -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) - -Error triggered by the agent. - -Note: This event should not contain model “thought” or “reasoning_content”. It -represents an error produced by the agent/scaffold, not model output. +Convert a LiteLLMMessage (Chat Completions) to our Message class. +Provider-agnostic mapping for reasoning: +- Prefer message.reasoning_content if present (LiteLLM normalized field) +- Extract thinking_blocks from content array (Anthropic-specific) -#### Properties +#### classmethod from_llm_responses_output() -- `error`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `visualize`: Text - Return Rich Text representation of this agent error event. +Convert OpenAI Responses API output items into a single assistant Message. 
-#### Methods +Policy (non-stream): +- Collect assistant text by concatenating output_text parts from message items +- Normalize function_call items to MessageToolCall list -#### to_llm_message() +#### to_chat_dict() -### class Condensation +Serialize message for OpenAI Chat Completions. -Bases: [`Event`](#class-event) +* Parameters: + * `cache_enabled` – Whether prompt caching is active. + * `vision_enabled` – Whether vision/image processing is enabled. + * `function_calling_enabled` – Whether native function calling is enabled. + * `force_string_serializer` – Force string serializer instead of list format. + * `send_reasoning_content` – Whether to include reasoning_content in output. -This action indicates a condensation of the conversation history is happening. +Chooses the appropriate content serializer and then injects threading keys: +- Assistant tool call turn: role == “assistant” and self.tool_calls +- Tool result turn: role == “tool” and self.tool_call_id (with name) +#### to_responses_dict() -#### Properties +Serialize message for OpenAI Responses (input parameter). -- `forgotten_event_ids`: list[[EventID](#class-eventid)] -- `has_summary_metadata`: bool - Checks if both summary and summary_offset are present. -- `llm_response_id`: [EventID](#class-eventid) -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `summary`: str | None -- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) - Generates a CondensationSummaryEvent. - Since summary events are not part of the main event store and are generated - dynamically, this property ensures the created event has a unique and consistent - ID based on the condensation event’s ID. - * Raises: - `ValueError` – If no summary is present. -- `summary_offset`: int | None -- `visualize`: Text - Return Rich Text representation of this event. 
- This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. +Produces a list of “input” items for the Responses API: +- system: returns [], system content is expected in ‘instructions’ +- user: one ‘message’ item with content parts -> input_text / input_image +(when vision enabled) +- assistant: emits prior assistant content as input_text, +and function_call items for tool_calls +- tool: emits function_call_output items (one per TextContent) +with matching call_id -#### Methods +#### to_responses_value() -#### apply() +Return serialized form. -Applies the condensation to a list of events. +Either an instructions string (for system) or input items (for other roles). -This method removes events that are marked to be forgotten and returns a new -list of events. If the summary metadata is present (both summary and offset), -the corresponding CondensationSummaryEvent will be inserted at the specified -offset _after_ the forgotten events have been removed. +### class MessageToolCall -### class CondensationRequest +Bases: `BaseModel` -Bases: [`Event`](#class-event) +Transport-agnostic tool call representation. -This action is used to request a condensation of the conversation history. +One canonical id is used for linking across actions/observations and +for Responses function_call_output call_id. #### Properties -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `visualize`: Text - Return Rich Text representation of this event. - This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. 
+- `arguments`: str +- `id`: str +- `name`: str +- `origin`: Literal['completion', 'responses'] +- `costs`: list[Cost] +- `response_latencies`: list[ResponseLatency] +- `token_usages`: list[TokenUsage] #### Methods -#### action - -The action type, namely ActionType.CONDENSATION_REQUEST. - -* Type: - str - -### class CondensationSummaryEvent +#### classmethod from_chat_tool_call() -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +Create a MessageToolCall from a Chat Completions tool call. -This event represents a summary generated by a condenser. +#### classmethod from_responses_function_call() +Create a MessageToolCall from a typed OpenAI Responses function_call item. -#### Properties +Note: OpenAI Responses function_call.arguments is already a JSON string. -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `summary`: str - The summary text. +#### model_config = (configuration object) -#### Methods +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### to_llm_message() +#### to_chat_dict() -### class ConversationStateUpdateEvent +Serialize to OpenAI Chat Completions tool_calls format. -Bases: [`Event`](#class-event) +#### to_responses_dict() -Event that contains conversation state updates. +Serialize to OpenAI Responses ‘function_call’ input item format. -This event is sent via websocket whenever the conversation state changes, -allowing remote clients to stay in sync without making REST API calls. +#### add_cost() -All fields are serialized versions of the corresponding ConversationState fields -to ensure compatibility with websocket transmission. +#### add_response_latency() +#### add_token_usage() -#### Properties +Add a single usage record. 
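The `add_token_usage()`, `deep_copy()`, `diff()` and `merge()` operations described here combine into a baseline pattern for scoping usage to a sub-task such as a delegate: snapshot before delegating, then diff afterwards. A stdlib-only sketch of that pattern; `MiniMetrics` and its plain counters are illustrative stand-ins, not the SDK's `Metrics` class.

```python
from dataclasses import dataclass


@dataclass
class MiniMetrics:
    # Plain counters standing in for the richer Metrics records.
    accumulated_cost: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add_token_usage(self, prompt: int, completion: int, cost: float) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion
        self.accumulated_cost += cost

    def deep_copy(self) -> "MiniMetrics":
        return MiniMetrics(self.accumulated_cost, self.prompt_tokens, self.completion_tokens)

    def diff(self, baseline: "MiniMetrics") -> "MiniMetrics":
        # Only what happened since `baseline` was captured.
        return MiniMetrics(
            self.accumulated_cost - baseline.accumulated_cost,
            self.prompt_tokens - baseline.prompt_tokens,
            self.completion_tokens - baseline.completion_tokens,
        )

    def merge(self, other: "MiniMetrics") -> None:
        # Fold another metrics object into this one.
        self.add_token_usage(other.prompt_tokens, other.completion_tokens, other.accumulated_cost)


m = MiniMetrics()
m.add_token_usage(prompt=100, completion=20, cost=0.01)
baseline = m.deep_copy()            # snapshot before delegating
m.add_token_usage(prompt=50, completion=10, cost=0.005)
delegate_usage = m.diff(baseline)   # only the delegate's share
assert delegate_usage.prompt_tokens == 50
```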
-- `key`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `value`: Any +#### deep_copy() -#### Methods +Create a deep copy of the Metrics object. -#### classmethod from_conversation_state() +#### diff() -Create a state update event from a ConversationState object. +Calculate the difference between current metrics and a baseline. -This creates an event containing a snapshot of important state fields. +This is useful for tracking metrics for specific operations like delegates. * Parameters: - * `state` – The ConversationState to serialize - * `conversation_id` – The conversation ID for the event + `baseline` – A metrics object representing the baseline state * Returns: - A ConversationStateUpdateEvent with serialized state data + A new Metrics object containing only the differences since the baseline -#### classmethod validate_key() +#### get() -#### classmethod validate_value() +Return the metrics in a dictionary. -### class Event +#### get_snapshot() -Bases: `DiscriminatedUnionMixin`, `ABC` +Get a snapshot of the current metrics without the detailed lists. -Base class for all events. +#### initialize_accumulated_token_usage() +#### log() -#### Properties +Log the metrics. -- `id`: str -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `timestamp`: str -- `visualize`: Text - Return Rich Text representation of this event. - This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. -### class LLMCompletionLogEvent +#### merge() -Bases: [`Event`](#class-event) +Merge ‘other’ metrics into this one. -Event containing LLM completion log data. 
+#### model_config = (configuration object) -When an LLM is configured with log_completions=True in a remote conversation, -this event streams the completion log data back to the client through WebSocket -instead of writing it to a file inside the Docker container. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +#### classmethod validate_accumulated_cost() -#### Properties +### class MetricsSnapshot -- `filename`: str -- `log_data`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `model_name`: str -- `source`: Literal['agent', 'user', 'environment'] -- `usage_id`: str -### class LLMConvertibleEvent +Bases: `BaseModel` -Bases: [`Event`](#class-event), `ABC` +A snapshot of metrics at a point in time. -Base class for events that can be converted to LLM messages. +Does not include lists of individual costs, latencies, or token usages. #### Properties -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `accumulated_cost`: float +- `accumulated_token_usage`: TokenUsage | None +- `max_budget_per_task`: float | None +- `model_name`: str #### Methods -#### static events_to_messages() - -Convert event stream to LLM message stream, handling multi-action batches - -#### abstractmethod to_llm_message() +#### model_config = (configuration object) -### class MessageEvent +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +### class OAuthCredentials -Message from either agent or user. +Bases: `BaseModel` -This is originally the “MessageAction”, but it suppose not to be tool call. +OAuth credentials for subscription-based LLM access. 
#### Properties -- `activated_skills`: list[str] -- `critic_result`: CriticResult | None -- `extended_content`: list[TextContent] -- `llm_message`: Message -- `llm_response_id`: str | None -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `reasoning_content`: str -- `sender`: str | None -- `source`: Literal['agent', 'user', 'environment'] -- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock] - Return the Anthropic thinking blocks from the LLM message. -- `visualize`: Text - Return Rich Text representation of this message event. +- `access_token`: str +- `expires_at`: int +- `refresh_token`: str +- `type`: Literal['oauth'] +- `vendor`: str #### Methods -#### to_llm_message() - -### class ObservationBaseEvent - -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +#### is_expired() -Base class for anything as a response to a tool call. +Check if the access token is expired. -Examples include tool execution, error, user reject. +#### model_config = (configuration object) +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### Properties +### class OpenAISubscriptionAuth -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `tool_call_id`: str -- `tool_name`: str -### class ObservationEvent +Bases: `object` -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) +Handle OAuth authentication for OpenAI ChatGPT subscription access. #### Properties -- `action_id`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `observation`: Observation -- `visualize`: Text - Return Rich Text representation of this observation event. +- `vendor`: str + Get the vendor name. #### Methods -#### to_llm_message() - -### class PauseEvent +#### __init__() -Bases: [`Event`](#class-event) +Initialize the OpenAI subscription auth handler. -Event indicating that the agent execution was paused by user request. +* Parameters: + * `credential_store` – Optional custom credential store. + * `oauth_port` – Port for the local OAuth callback server. +#### create_llm() -#### Properties +Create an LLM instance configured for Codex subscription access. -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `visualize`: Text - Return Rich Text representation of this pause event. -### class SystemPromptEvent +* Parameters: + * `model` – The model to use (must be in OPENAI_CODEX_MODELS). + * `credentials` – OAuth credentials to use. If None, uses stored credentials. + * `instructions` – Optional instructions for the Codex model. + llm_kwargs* – Additional arguments to pass to LLM constructor. +* Returns: + An LLM instance configured for Codex access. +* Raises: + `ValueError` – If the model is not supported or no credentials available. -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +#### get_credentials() -System prompt added by the agent. +Get stored credentials if they exist. -The system prompt can optionally include dynamic context that varies between -conversations. When `dynamic_context` is provided, it is included as a -second content block in the same system message. Cache markers are NOT -applied here - they are applied by `LLM._apply_prompt_caching()` when -caching is enabled, ensuring provider-specific cache control is only added -when appropriate. +#### has_valid_credentials() +Check if valid (non-expired) credentials exist. 
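`is_expired()` and `has_valid_credentials()` boil down to comparing the stored `expires_at` timestamp with the current time. A stdlib-only sketch of that check; the field names follow the `OAuthCredentials` model above, but `MiniCredentials` and the 60-second refresh margin are assumptions for illustration, not SDK behavior.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniCredentials:
    access_token: str
    refresh_token: str
    expires_at: int  # Unix timestamp, matching OAuthCredentials.expires_at

    def is_expired(self, margin_seconds: int = 60) -> bool:
        # Treat tokens as expired slightly early so an in-flight request
        # does not race the real expiry; the margin is an assumption.
        return time.time() >= self.expires_at - margin_seconds


def has_valid_credentials(creds: Optional[MiniCredentials]) -> bool:
    return creds is not None and not creds.is_expired()


fresh = MiniCredentials("tok", "ref", expires_at=int(time.time()) + 3600)
stale = MiniCredentials("tok", "ref", expires_at=int(time.time()) - 10)
assert has_valid_credentials(fresh)
assert not has_valid_credentials(stale)
```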
-#### Properties +#### async login() -- `dynamic_context`: TextContent | None -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `system_prompt`: TextContent -- `tools`: list[ToolDefinition] -- `visualize`: Text - Return Rich Text representation of this system prompt event. +Perform OAuth login flow. -#### Methods +This starts a local HTTP server to handle the OAuth callback, +opens the browser for user authentication, and waits for the +callback with the authorization code. -#### system_prompt +* Parameters: + `open_browser` – Whether to automatically open the browser. +* Returns: + The obtained OAuth credentials. +* Raises: + `RuntimeError` – If the OAuth flow fails or times out. -The static system prompt text (cacheable across conversations) +#### logout() -* Type: - openhands.sdk.llm.message.TextContent +Remove stored credentials. -#### tools +* Returns: + True if credentials were removed, False if none existed. -List of available tools +#### async refresh_if_needed() -* Type: - list[openhands.sdk.tool.tool.ToolDefinition] +Refresh credentials if they are expired. -#### dynamic_context +* Returns: + Updated credentials, or None if no credentials exist. +* Raises: + `RuntimeError` – If token refresh fails. -Optional per-conversation context (hosts, repo info, etc.) -Sent as a second TextContent block inside the system message. +### class ReasoningItemModel -* Type: - openhands.sdk.llm.message.TextContent | None +Bases: `BaseModel` -#### to_llm_message() +OpenAI Responses reasoning item (non-stream, subset we consume). -Convert to a single system LLM message. +Do not log or render encrypted_content. -When `dynamic_context` is present the message contains two content -blocks: the static prompt followed by the dynamic context. 
Cache markers -are NOT applied here - they are applied by `LLM._apply_prompt_caching()` -when caching is enabled, which marks the static block (index 0) and leaves -the dynamic block (index 1) unmarked for cross-conversation cache sharing. -### class TokenEvent +#### Properties -Bases: [`Event`](#class-event) +- `content`: list[str] | None +- `encrypted_content`: str | None +- `id`: str | None +- `status`: str | None +- `summary`: list[str] -Event from VLLM representing token IDs used in LLM interaction. +#### Methods +#### model_config = (configuration object) -#### Properties +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `prompt_token_ids`: list[int] -- `response_token_ids`: list[int] -- `source`: Literal['agent', 'user', 'environment'] -### class UserRejectObservation +### class RedactedThinkingBlock -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) +Bases: `BaseModel` -Observation when an action is rejected by user or hook. +Redacted thinking block for previous responses without extended thinking. -This event is emitted when: -- User rejects an action during confirmation mode (rejection_source=”user”) -- A PreToolUse hook blocks an action (rejection_source=”hook”) +This is used as a placeholder for assistant messages that were generated +before extended thinking was enabled. #### Properties -- `action_id`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `rejection_reason`: str -- `rejection_source`: Literal['user', 'hook'] -- `visualize`: Text - Return Rich Text representation of this user rejection event. 
+- `data`: str +- `type`: Literal['redacted_thinking'] #### Methods -#### to_llm_message() +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +### class RegistryEvent -# openhands.sdk.llm -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm +Bases: `BaseModel` -### class CredentialStore -Bases: `object` +#### Properties -Store and retrieve OAuth credentials for LLM providers. +- `llm`: [LLM](#class-llm) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +### class RouterLLM + +Bases: [`LLM`](#class-llm) + +Base class for multiple LLM acting as a unified LLM. +This class provides a foundation for implementing model routing by +inheriting from LLM, allowing routers to work with multiple underlying +LLM models while presenting a unified LLM interface to consumers. +Key features: +- Works with multiple LLMs configured via llms_for_routing +- Delegates all other operations/properties to the selected LLM +- Provides routing interface through select_llm() method #### Properties -- `credentials_dir`: Path - Get the credentials directory, creating it if necessary. +- `active_llm`: [LLM](#class-llm) | None +- `llms_for_routing`: dict[str, [LLM](#class-llm)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `router_name`: str #### Methods -#### __init__() +#### completion() -Initialize the credential store. +This method intercepts completion calls and routes them to the appropriate +underlying LLM based on the routing logic implemented in select_llm(). * Parameters: - `credentials_dir` – Optional custom directory for storing credentials. 
- Defaults to ~/.local/share/openhands/auth/ - -#### delete() + * `messages` – List of conversation messages + * `tools` – Optional list of tools available to the model + * `return_metrics` – Whether to return usage metrics + * `add_security_risk_prediction` – Add security_risk field to tool schemas + * `on_token` – Optional callback for streaming tokens + kwargs* – Additional arguments passed to the LLM API -Delete stored credentials for a vendor. +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. -* Parameters: - `vendor` – The vendor/provider name -* Returns: - True if credentials were deleted, False if they didn’t exist +#### model_post_init() -#### get() +This function is meant to behave like a BaseModel method to initialise private attributes. -Get stored credentials for a vendor. +It takes context as an argument since that’s what pydantic-core passes when calling it. * Parameters: - `vendor` – The vendor/provider name (e.g., ‘openai’) -* Returns: - OAuthCredentials if found and valid, None otherwise + * `self` – The BaseModel instance. + * `context` – The context. -#### save() +#### abstractmethod select_llm() -Save credentials for a vendor. +Select which LLM to use based on messages and events. + +This method implements the core routing logic for the RouterLLM. +Subclasses should analyze the provided messages to determine which +LLM from llms_for_routing is most appropriate for handling the request. * Parameters: - `credentials` – The OAuth credentials to save + `messages` – List of messages in the conversation that can be used + to inform the routing decision. +* Returns: + The key/name of the LLM to use from llms_for_routing dictionary. -#### update_tokens() +#### classmethod set_placeholder_model() -Update tokens for an existing credential. +Guarantee model exists before LLM base validation runs. 
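`select_llm()` is the extension point: a subclass inspects the conversation and returns a key into `llms_for_routing`, and `completion()` delegates to the chosen LLM. A toy illustration of that contract; the keyword heuristic, `ToyRouter` class, and string-valued "LLMs" are invented for the example and do not reflect the SDK's `RouterLLM` implementation.

```python
# Toy router: picks a "strong" model for hard-looking requests and a
# "cheap" model otherwise. Stand-in class, not the SDK's RouterLLM.
class ToyRouter:
    def __init__(self, llms_for_routing: dict[str, str]):
        self.llms_for_routing = llms_for_routing  # key -> model name

    def select_llm(self, messages: list[str]) -> str:
        # Routing heuristic (an assumption): keyword match on the messages.
        text = " ".join(messages).lower()
        if any(word in text for word in ("prove", "debug", "refactor")):
            return "strong"
        return "cheap"

    def completion(self, messages: list[str]) -> str:
        key = self.select_llm(messages)     # routing decision
        return self.llms_for_routing[key]   # delegate to the selected LLM


router = ToyRouter({"strong": "claude-sonnet-4", "cheap": "gpt-4o-mini"})
assert router.completion(["please debug this test"]) == "claude-sonnet-4"
assert router.completion(["say hi"]) == "gpt-4o-mini"
```

In the real class the mapping holds `LLM` instances and the interception happens inside `completion()`, but the shape of the contract is the same: routing returns a key, never an answer.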
-* Parameters: - * `vendor` – The vendor/provider name - * `access_token` – New access token - * `refresh_token` – New refresh token (if provided) - * `expires_in` – Token expiry in seconds -* Returns: - Updated credentials, or None if no existing credentials found +#### classmethod validate_llms_not_empty() -### class ImageContent +### class TextContent Bases: `BaseContent` #### Properties -- `image_urls`: list[str] -- `type`: Literal['image'] +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str +- `type`: Literal['text'] #### Methods -#### model_config = (configuration object) - -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. - #### to_llm_dict() Convert to LLM API format. -### class LLM - -Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` - -Language model interface for OpenHands agents. +### class ThinkingBlock -The LLM class provides a unified interface for interacting with various -language models through the litellm library. It handles model configuration, -API authentication, -retry logic, and tool calling capabilities. +Bases: `BaseModel` -#### Example +Anthropic thinking block for extended thinking feature. -```pycon ->>> from openhands.sdk import LLM ->>> from pydantic import SecretStr ->>> llm = LLM( -... model="claude-sonnet-4-20250514", -... api_key=SecretStr("your-api-key"), -... usage_id="my-agent" -... ) ->>> # Use with agent or conversation -``` +This represents the raw thinking blocks returned by Anthropic models +when extended thinking is enabled. These blocks must be preserved +and passed back to the API for tool use scenarios. 
#### Properties -- `api_key`: str | SecretStr | None -- `api_version`: str | None -- `aws_access_key_id`: str | SecretStr | None -- `aws_region_name`: str | None -- `aws_secret_access_key`: str | SecretStr | None -- `base_url`: str | None -- `caching_prompt`: bool -- `custom_tokenizer`: str | None -- `disable_stop_word`: bool | None -- `disable_vision`: bool | None -- `drop_params`: bool -- `enable_encrypted_reasoning`: bool -- `extended_thinking_budget`: int | None -- `extra_headers`: dict[str, str] | None -- `force_string_serializer`: bool | None -- `input_cost_per_token`: float | None -- `is_subscription`: bool - Check if this LLM uses subscription-based authentication. - Returns True when the LLM was created via LLM.subscription_login(), - which uses the ChatGPT subscription Codex backend rather than the - standard OpenAI API. - * Returns: - True if using subscription-based transport, False otherwise. - * Return type: - bool -- `litellm_extra_body`: dict[str, Any] -- `log_completions`: bool -- `log_completions_folder`: str -- `max_input_tokens`: int | None -- `max_message_chars`: int -- `max_output_tokens`: int | None -- `metrics`: [Metrics](#class-metrics) - Get usage metrics for this LLM instance. - * Returns: - Metrics object containing token usage, costs, and other statistics. -- `model`: str -- `model_canonical_name`: str | None -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `model_info`: dict | None - Returns the model info dictionary. 
-- `modify_params`: bool -- `native_tool_calling`: bool -- `num_retries`: int -- `ollama_base_url`: str | None -- `openrouter_app_name`: str -- `openrouter_site_url`: str -- `output_cost_per_token`: float | None -- `prompt_cache_retention`: str | None -- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None -- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None -- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None] -- `retry_max_wait`: int -- `retry_min_wait`: int -- `retry_multiplier`: float -- `safety_settings`: list[dict[str, str]] | None -- `seed`: int | None -- `stream`: bool -- `telemetry`: Telemetry - Get telemetry handler for this LLM instance. - * Returns: - Telemetry object for managing logging and metrics callbacks. -- `temperature`: float | None -- `timeout`: int | None -- `top_k`: float | None -- `top_p`: float | None -- `usage_id`: str +- `signature`: str | None +- `thinking`: str +- `type`: Literal['thinking'] #### Methods -#### completion() - -Generate a completion from the language model. - -This is the method for getting responses from the model via Completion API. -It handles message formatting, tool calling, and response processing. - -* Parameters: - * `messages` – List of conversation messages - * `tools` – Optional list of tools available to the model - * `_return_metrics` – Whether to return usage metrics - * `add_security_risk_prediction` – Add security_risk field to tool schemas - * `on_token` – Optional callback for streaming tokens - kwargs* – Additional arguments passed to the LLM API -* Returns: - LLMResponse containing the model’s response and metadata. +#### model_config = (configuration object) -#### NOTE -Summary field is always added to tool schemas for transparency and -explainability of agent actions. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
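The preservation requirement above can be shown with a small sketch: when an assistant turn is replayed to the API (for example, during a tool-use round trip), its thinking blocks must be passed back unchanged. The dict shapes mimic the `type`/`thinking`/`signature` fields documented here; the helper function itself is hypothetical, not SDK code.

```python
# Illustrative sketch: rebuild API content for a replayed assistant turn,
# passing thinking blocks back verbatim (field names mirror ThinkingBlock).
# Hypothetical helper, not SDK code.

def rebuild_content(blocks: list[dict]) -> list[dict]:
    out = []
    for block in blocks:
        if block["type"] in ("thinking", "redacted_thinking"):
            # Preserved untouched, signature included, per the
            # round-trip requirement described above.
            out.append(dict(block))
        elif block["type"] == "text":
            out.append({"type": "text", "text": block["text"]})
    return out


turn = [
    {"type": "thinking", "thinking": "list files first", "signature": "sig-abc"},
    {"type": "text", "text": "Running ls now."},
]
assert rebuild_content(turn)[0]["signature"] == "sig-abc"
```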
-* Raises: - `ValueError` – If streaming is requested (not supported). +### openhands.sdk.security +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md -#### format_messages_for_llm() +### class AlwaysConfirm -Formats Message objects for LLM consumption. +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -#### format_messages_for_responses() +#### Methods -Prepare (instructions, input[]) for the OpenAI Responses API. +#### model_config = (configuration object) -- Skips prompt caching flags and string serializer concerns -- Uses Message.to_responses_value to get either instructions (system) - or input items (others) -- Concatenates system instructions into a single instructions string -- For subscription mode, system prompts are prepended to user content +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### get_token_count() +#### should_confirm() -#### is_caching_prompt_active() +Determine if an action with the given risk level requires confirmation. -Check if prompt caching is supported and enabled for current model. +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. * Returns: - True if prompt caching is supported and enabled for the given - : model. -* Return type: - boolean + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -#### classmethod load_from_env() +### class ConfirmRisky -#### classmethod load_from_json() +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -#### model_post_init() -This function is meant to behave like a BaseModel method to initialise private attributes. 
+#### Properties -It takes context as an argument since that’s what pydantic-core passes when calling it. +- `confirm_unknown`: bool +- `threshold`: [SecurityRisk](#class-securityrisk) -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. +#### Methods -#### reset_metrics() +#### model_config = (configuration object) -Reset metrics and telemetry to fresh instances. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -This is used by the LLMRegistry to ensure each registered LLM has -independent metrics, preventing metrics from being shared between -LLMs that were created via model_copy(). +#### should_confirm() -When an LLM is copied (e.g., to create a condenser LLM from an agent LLM), -Pydantic’s model_copy() does a shallow copy of private attributes by default, -causing the original and copied LLM to share the same Metrics object. -This method allows the registry to fix this by resetting metrics to None, -which will be lazily recreated when accessed. +Determine if an action with the given risk level requires confirmation. -#### responses() +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. -Alternative invocation path using OpenAI Responses API via LiteLLM. +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -Maps Message[] -> (instructions, input[]) and returns LLMResponse. 
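The threshold semantics above can be sketched in isolation: confirm any action whose risk meets the threshold, and optionally anything UNKNOWN. The numeric ranks below are an assumption of this sketch (the SDK compares levels via `SecurityRisk.is_riskier()`), and the enum is a stand-in for the real `SecurityRisk`.

```python
from enum import Enum

# Standalone sketch of ConfirmRisky.should_confirm(). The LOW < MEDIUM < HIGH
# ordering mirrors the SecurityRisk docs; the numeric ranks are this sketch's
# assumption, not SDK values.

class Risk(Enum):
    UNKNOWN = "UNKNOWN"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

_RANK = {Risk.LOW: 0, Risk.MEDIUM: 1, Risk.HIGH: 2}

def should_confirm(risk: Risk, threshold: Risk = Risk.MEDIUM,
                   confirm_unknown: bool = True) -> bool:
    """Require confirmation when risk reaches the threshold (or is UNKNOWN)."""
    if risk is Risk.UNKNOWN:
        return confirm_unknown
    return _RANK[risk] >= _RANK[threshold]


assert should_confirm(Risk.HIGH) is True
assert should_confirm(Risk.LOW) is False
assert should_confirm(Risk.UNKNOWN, confirm_unknown=False) is False
```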
+#### classmethod validate_threshold() -* Parameters: - * `messages` – List of conversation messages - * `tools` – Optional list of tools available to the model - * `include` – Optional list of fields to include in response - * `store` – Whether to store the conversation - * `_return_metrics` – Whether to return usage metrics - * `add_security_risk_prediction` – Add security_risk field to tool schemas - * `on_token` – Optional callback for streaming deltas - kwargs* – Additional arguments passed to the API +### class ConfirmationPolicyBase -#### NOTE -Summary field is always added to tool schemas for transparency and -explainability of agent actions. +Bases: `DiscriminatedUnionMixin`, `ABC` -#### restore_metrics() +#### Methods -#### classmethod subscription_login() +#### model_config = (configuration object) -Authenticate with a subscription service and return an LLM instance. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -This method provides subscription-based access to LLM models that are -available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather -than API credits. It handles credential caching, token refresh, and -the OAuth login flow. +#### abstractmethod should_confirm() -Currently supported vendors: -- “openai”: ChatGPT Plus/Pro subscription for Codex models +Determine if an action with the given risk level requires confirmation. -Supported OpenAI models: -- gpt-5.1-codex-max -- gpt-5.1-codex-mini -- gpt-5.2 -- gpt-5.2-codex +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. * Parameters: - * `vendor` – The vendor/provider. Currently only “openai” is supported. - * `model` – The model to use. Must be supported by the vendor’s - subscription service. - * `force_login` – If True, always perform a fresh login even if valid - credentials exist. 
- * `open_browser` – Whether to automatically open the browser for the - OAuth login flow. - llm_kwargs* – Additional arguments to pass to the LLM constructor. + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. * Returns: - An LLM instance configured for subscription-based access. -* Raises: - * `ValueError` – If the vendor or model is not supported. - * `RuntimeError` – If authentication fails. + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -#### uses_responses_api() +### class GraySwanAnalyzer -Whether this model uses the OpenAI Responses API path. +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) -#### vision_is_active() +Security analyzer using GraySwan’s Cygnal API for AI safety monitoring. -### class LLMProfileStore +This analyzer sends conversation history and pending actions to the GraySwan +Cygnal API for security analysis. The API returns a violation score which is +mapped to SecurityRisk levels. -Bases: `object` +Environment Variables: +: GRAYSWAN_API_KEY: Required API key for GraySwan authentication + GRAYSWAN_POLICY_ID: Optional policy ID for custom GraySwan policy -Standalone utility for persisting LLM configurations. +#### Example -#### Methods +```pycon +>>> from openhands.sdk.security.grayswan import GraySwanAnalyzer +>>> analyzer = GraySwanAnalyzer() +>>> risk = analyzer.security_risk(action_event) +``` -#### __init__() -Initialize the profile store. +#### Properties -* Parameters: - `base_dir` – Path to the directory where the profiles are stored. - If None is provided, the default directory is used, i.e., - ~/.openhands/profiles. +- `api_key`: SecretStr | None +- `api_url`: str +- `history_limit`: int +- `low_threshold`: float +- `max_message_chars`: int +- `medium_threshold`: float +- `policy_id`: str | None +- `timeout`: float -#### delete() +#### Methods -Delete an existing profile. 
+#### close() -If the profile is not present in the profile directory, it does nothing. +Clean up resources. -* Parameters: - `name` – Name of the profile to delete. -* Raises: - `TimeoutError` – If the lock cannot be acquired. +#### model_config = (configuration object) -#### list() +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Returns a list of all profiles stored. +#### model_post_init() -* Returns: - List of profile filenames (e.g., [“default.json”, “gpt4.json”]). +Initialize the analyzer after model creation. -#### load() +#### security_risk() -Load an LLM instance from the given profile name. +Analyze action for security risks using GraySwan API. + +This method converts the conversation history and the pending action +to OpenAI message format and sends them to the GraySwan Cygnal API +for security analysis. * Parameters: - `name` – Name of the profile to load. + `action` – The ActionEvent to analyze * Returns: - An LLM instance constructed from the profile configuration. -* Raises: - * `FileNotFoundError` – If the profile name does not exist. - * `ValueError` – If the profile file is corrupted or invalid. - * `TimeoutError` – If the lock cannot be acquired. - -#### save() + SecurityRisk level based on GraySwan analysis -Save a profile to the profile directory. +#### set_events() -Note that if a profile name already exists, it will be overwritten. +Set the events for context when analyzing actions. * Parameters: - * `name` – Name of the profile to save. - * `llm` – LLM instance to save - * `include_secrets` – Whether to include the profile secrets. Defaults to False. -* Raises: - `TimeoutError` – If the lock cannot be acquired. - -### class LLMRegistry + `events` – Sequence of events to use as context for security analysis -Bases: `object` +#### validate_thresholds() -A minimal LLM registry for managing LLM instances by usage ID. +Validate that thresholds are properly ordered. 
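The score-to-risk mapping implied by `low_threshold` and `medium_threshold` above can be sketched as follows. The exact cutoff semantics (inclusive vs. exclusive, and the default values) are assumptions of this sketch, not taken from the SDK.

```python
# Sketch of mapping a GraySwan-style violation score to a risk label using
# low/medium thresholds. Cutoff semantics and defaults are assumed here.

def validate_thresholds(low: float, medium: float) -> None:
    # Mirrors the ordering check described by validate_thresholds() above.
    if not (0.0 <= low <= medium <= 1.0):
        raise ValueError("expected 0 <= low_threshold <= medium_threshold <= 1")

def score_to_risk(score: float, low: float = 0.3, medium: float = 0.6) -> str:
    validate_thresholds(low, medium)
    if score < low:
        return "LOW"
    if score < medium:
        return "MEDIUM"
    return "HIGH"


assert score_to_risk(0.1) == "LOW"
assert score_to_risk(0.45) == "MEDIUM"
assert score_to_risk(0.9) == "HIGH"
```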
-This registry provides a simple way to manage multiple LLM instances, -avoiding the need to recreate LLMs with the same configuration. +### class LLMSecurityAnalyzer -The registry also ensures that each registered LLM has independent metrics, -preventing metrics from being shared between LLMs that were created via -model_copy(). This is important for scenarios like creating a condenser LLM -from an agent LLM, where each should track its own usage independently. +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) +LLM-based security analyzer. -#### Properties +This analyzer respects the security_risk attribute that can be set by the LLM +when generating actions, similar to OpenHands’ LLMRiskAnalyzer. -- `registry_id`: str -- `retry_listener`: Callable[[int, int], None] | None -- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None -- `usage_to_llm`: MappingProxyType - Access the internal usage-ID-to-LLM mapping (read-only view). +It provides a lightweight security analysis approach that leverages the LLM’s +understanding of action context and potential risks. #### Methods -#### __init__() +#### model_config = (configuration object) -Initialize the LLM registry. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -* Parameters: - `retry_listener` – Optional callback for retry events. +#### security_risk() -#### add() +Evaluate security risk based on LLM-provided assessment. -Add an LLM instance to the registry. +This method checks if the action has a security_risk attribute set by the LLM +and returns it. The LLM may not always provide this attribute but it defaults to +UNKNOWN if not explicitly set. -This method ensures that the LLM has independent metrics before -registering it. If the LLM’s metrics are shared with another -registered LLM (e.g., due to model_copy()), fresh metrics will -be created automatically. 
+### class NeverConfirm -* Parameters: - `llm` – The LLM instance to register. -* Raises: - `ValueError` – If llm.usage_id already exists in the registry. +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -#### get() +#### Methods -Get an LLM instance from the registry. +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. * Parameters: - `usage_id` – Unique identifier for the LLM usage slot. + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. * Returns: - The LLM instance. -* Raises: - `KeyError` – If usage_id is not found in the registry. + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -#### list_usage_ids() +### class SecurityAnalyzerBase -List all registered usage IDs. +Bases: `DiscriminatedUnionMixin`, `ABC` -#### notify() +Abstract base class for security analyzers. -Notify subscribers of registry events. +Security analyzers evaluate the risk of actions before they are executed +and can influence the conversation flow based on security policies. -* Parameters: - `event` – The registry event to notify about. +This is adapted from OpenHands SecurityAnalyzer but designed to work +with the agent-sdk’s conversation-based architecture. -#### subscribe() +#### Methods -Subscribe to registry events. +#### analyze_event() -* Parameters: - `callback` – Function to call when LLMs are created or updated. +Analyze an event for security risks. -### class LLMResponse +This is a convenience method that checks if the event is an action +and calls security_risk() if it is. 
Non-action events return None. -Bases: `BaseModel` +* Parameters: + `event` – The event to analyze +* Returns: + ActionSecurityRisk if event is an action, None otherwise -Result of an LLM completion request. +#### analyze_pending_actions() -This type provides a clean interface for LLM completion results, exposing -only OpenHands-native types to consumers while preserving access to the -raw LiteLLM response for internal use. +Analyze all pending actions in a conversation. +This method gets all unmatched actions from the conversation state +and analyzes each one for security risks. -#### Properties +* Parameters: + `conversation` – The conversation to analyze +* Returns: + List of tuples containing (action, risk_level) for each pending action -- `id`: str - Get the response ID from the underlying LLM response. - This property provides a clean interface to access the response ID, - supporting both completion mode (ModelResponse) and response API modes - (ResponsesAPIResponse). - * Returns: - The response ID from the LLM response -- `message`: [Message](#class-message) -- `metrics`: [MetricsSnapshot](#class-metricssnapshot) -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `raw_response`: ModelResponse | ResponsesAPIResponse +#### model_config = (configuration object) -#### Methods +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### message +#### abstractmethod security_risk() -The completion message converted to OpenHands Message type +Evaluate the security risk of an ActionEvent. -* Type: - [openhands.sdk.llm.message.Message](#class-message) +This is the core method that analyzes an ActionEvent and returns its risk level. +Implementations should examine the action’s content, context, and potential +impact to determine the appropriate risk level. 
-#### metrics +* Parameters: + `action` – The ActionEvent to analyze for security risks +* Returns: + ActionSecurityRisk enum indicating the risk level -Snapshot of metrics from the completion request +#### should_require_confirmation() -* Type: - [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) +Determine if an action should require user confirmation. -#### raw_response +This implements the default confirmation logic based on risk level +and confirmation mode settings. -The original LiteLLM response (ModelResponse or -ResponsesAPIResponse) for internal use +* Parameters: + * `risk` – The security risk level of the action + * `confirmation_mode` – Whether confirmation mode is enabled +* Returns: + True if confirmation is required, False otherwise -* Type: - litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse +### class SecurityRisk -### class Message +Bases: `str`, `Enum` -Bases: `BaseModel` +Security risk levels for actions. + +Based on OpenHands security risk levels but adapted for agent-sdk. +Integer values allow for easy comparison and ordering. #### Properties -- `contains_image`: bool -- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `name`: str | None -- `reasoning_content`: str | None -- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None -- `role`: Literal['user', 'system', 'assistant', 'tool'] -- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] -- `tool_call_id`: str | None -- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None +- `description`: str + Get a human-readable description of the risk level. +- `visualize`: Text + Return Rich Text representation of this risk level. 
#### Methods -#### classmethod from_llm_chat_message() +#### HIGH = 'HIGH' -Convert a LiteLLMMessage (Chat Completions) to our Message class. +#### LOW = 'LOW' -Provider-agnostic mapping for reasoning: -- Prefer message.reasoning_content if present (LiteLLM normalized field) -- Extract thinking_blocks from content array (Anthropic-specific) +#### MEDIUM = 'MEDIUM' -#### classmethod from_llm_responses_output() +#### UNKNOWN = 'UNKNOWN' -Convert OpenAI Responses API output items into a single assistant Message. +#### get_color() -Policy (non-stream): -- Collect assistant text by concatenating output_text parts from message items -- Normalize function_call items to MessageToolCall list +Get the color for displaying this risk level in Rich text. -#### to_chat_dict() +#### is_riskier() -Serialize message for OpenAI Chat Completions. +Check if this risk level is riskier than another. -* Parameters: - * `cache_enabled` – Whether prompt caching is active. - * `vision_enabled` – Whether vision/image processing is enabled. - * `function_calling_enabled` – Whether native function calling is enabled. - * `force_string_serializer` – Force string serializer instead of list format. - * `send_reasoning_content` – Whether to include reasoning_content in output. +Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is +less risky than HIGH. UNKNOWN is not comparable to any other level. -Chooses the appropriate content serializer and then injects threading keys: -- Assistant tool call turn: role == “assistant” and self.tool_calls -- Tool result turn: role == “tool” and self.tool_call_id (with name) +To make this act like a standard well-ordered domain, we reflexively consider +risk levels to be riskier than themselves. That is: -#### to_responses_dict() + for risk_level in list(SecurityRisk): + : assert risk_level.is_riskier(risk_level) -Serialize message for OpenAI Responses (input parameter). 
+ # More concretely: + assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH) + assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM) + assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW) -Produces a list of “input” items for the Responses API: -- system: returns [], system content is expected in ‘instructions’ -- user: one ‘message’ item with content parts -> input_text / input_image -(when vision enabled) -- assistant: emits prior assistant content as input_text, -and function_call items for tool_calls -- tool: emits function_call_output items (one per TextContent) -with matching call_id +This can be disabled by setting the reflexive parameter to False. -#### to_responses_value() +* Parameters: + other ([SecurityRisk*](#class-securityrisk)) – The other risk level to compare against. + reflexive (bool*) – Whether the relationship is reflexive. +* Raises: + `ValueError` – If either risk level is UNKNOWN. -Return serialized form. +### openhands.sdk.tool +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md -Either an instructions string (for system) or input items (for other roles). +### class Action -### class MessageToolCall +Bases: `Schema`, `ABC` -Bases: `BaseModel` +Base schema for input action. -Transport-agnostic tool call representation. -One canonical id is used for linking across actions/observations and -for Responses function_call_output call_id. +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `visualize`: Text + Return Rich Text representation of this action. + This method can be overridden by subclasses to customize visualization. + The base implementation displays all action fields systematically. +### class ExecutableTool + +Bases: `Protocol` + +Protocol for tools that are guaranteed to have a non-None executor. 
+ +This eliminates the need for runtime None checks and type narrowing +when working with tools that are known to be executable. #### Properties -- `arguments`: str -- `id`: str +- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any] - `name`: str -- `origin`: Literal['completion', 'responses'] -- `costs`: list[Cost] -- `response_latencies`: list[ResponseLatency] -- `token_usages`: list[TokenUsage] #### Methods -#### classmethod from_chat_tool_call() +#### __init__() -Create a MessageToolCall from a Chat Completions tool call. +### class FinishTool -#### classmethod from_responses_function_call() +Bases: `ToolDefinition[FinishAction, FinishObservation]` -Create a MessageToolCall from a typed OpenAI Responses function_call item. +Tool for signaling the completion of a task or conversation. -Note: OpenAI Responses function_call.arguments is already a JSON string. -#### model_config = (configuration object) +#### Properties -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### to_chat_dict() +#### Methods -Serialize to OpenAI Chat Completions tool_calls format. +#### classmethod create() -#### to_responses_dict() +Create FinishTool instance. -Serialize to OpenAI Responses ‘function_call’ input item format. +* Parameters: + * `conv_state` – Optional conversation state (not used by FinishTool). + params* – Additional parameters (none supported). +* Returns: + A sequence containing a single FinishTool instance. +* Raises: + `ValueError` – If any parameters are provided. -#### add_cost() +#### name = 'finish' -#### add_response_latency() +### class Observation -#### add_token_usage() +Bases: `Schema`, `ABC` -Add a single usage record. +Base schema for output observation. -#### deep_copy() -Create a deep copy of the Metrics object. 
+#### Properties -#### diff() +- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]n' +- `content`: list[TextContent | ImageContent] +- `is_error`: bool +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str + Extract all text content from the observation. + * Returns: + Concatenated text from all TextContent items in content. +- `to_llm_content`: Sequence[TextContent | ImageContent] + Default content formatting for converting observation to LLM readable content. + Subclasses can override to provide richer content (e.g., images, diffs). +- `visualize`: Text + Return Rich Text representation of this observation. + Subclasses can override for custom visualization; by default we show the + same text that would be sent to the LLM. -Calculate the difference between current metrics and a baseline. +#### Methods -This is useful for tracking metrics for specific operations like delegates. +#### classmethod from_text() + +Utility to create an Observation from a simple text string. * Parameters: - `baseline` – A metrics object representing the baseline state + * `text` – The text content to include in the observation. + * `is_error` – Whether this observation represents an error. + kwargs* – Additional fields for the observation subclass. * Returns: - A new Metrics object containing only the differences since the baseline - -#### get() + An Observation instance with the text wrapped in a TextContent. -Return the metrics in a dictionary. +### class ThinkTool -#### get_snapshot() +Bases: `ToolDefinition[ThinkAction, ThinkObservation]` -Get a snapshot of the current metrics without the detailed lists. +Tool for logging thoughts without making changes. -#### initialize_accumulated_token_usage() -#### log() +#### Properties -Log the metrics. 
+- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### merge() +#### Methods -Merge ‘other’ metrics into this one. +#### classmethod create() -#### model_config = (configuration object) +Create ThinkTool instance. -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +* Parameters: + * `conv_state` – Optional conversation state (not used by ThinkTool). + params* – Additional parameters (none supported). +* Returns: + A sequence containing a single ThinkTool instance. +* Raises: + `ValueError` – If any parameters are provided. -#### classmethod validate_accumulated_cost() +#### name = 'think' -### class MetricsSnapshot +### class Tool Bases: `BaseModel` -A snapshot of metrics at a point in time. +Defines a tool to be initialized for the agent. -Does not include lists of individual costs, latencies, or token usages. +This is only used in agent-sdk for type schema for server use. #### Properties -- `accumulated_cost`: float -- `accumulated_token_usage`: TokenUsage | None -- `max_budget_per_task`: float | None -- `model_name`: str +- `name`: str +- `params`: dict[str, Any] #### Methods @@ -13725,309 +13461,424 @@ Does not include lists of individual costs, latencies, or token usages. Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### class OAuthCredentials +#### classmethod validate_name() + +Validate that name is not empty. + +#### classmethod validate_params() + +Convert None params to empty dict. + +### class ToolAnnotations Bases: `BaseModel` -OAuth credentials for subscription-based LLM access. +Annotations to provide hints about the tool’s behavior. 
+Based on Model Context Protocol (MCP) spec:
+[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838)

-#### Properties

-- `access_token`: str
-- `expires_at`: int
-- `refresh_token`: str
-- `type`: Literal['oauth']
-- `vendor`: str
+#### Properties

-#### Methods
+- `destructiveHint`: bool
+- `idempotentHint`: bool
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `openWorldHint`: bool
+- `readOnlyHint`: bool
+- `title`: str | None
+### class ToolDefinition

-#### is_expired()
+Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic`

-Check if the access token is expired.
+Base class for all tool implementations.

-#### model_config = (configuration object)
+This class serves as a base for the discriminated union of all tool types.
+All tools must inherit from this class and implement the .create() method for
+proper initialization with executors and parameters.

-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+Features:
+- Normalize input/output schemas (class or dict) into both model+schema.
+- Validate inputs before execute.
+- Coerce outputs only if an output model is defined; else return vanilla JSON.
+- Export MCP tool description.

-### class OpenAISubscriptionAuth
+#### Examples

-Bases: `object`
+Simple tool with no parameters:
+
+```python
+class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
+    @classmethod
+    def create(cls, conv_state=None, **params):
+        return [cls(name="finish", ..., executor=FinishExecutor())]
+```

-Handle OAuth authentication for OpenAI ChatGPT subscription access.

+Complex tool with initialization parameters:
+
+```python
+class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
+    @classmethod
+    def create(cls, conv_state, **params):
+        executor = TerminalExecutor(
+            working_dir=conv_state.workspace.working_dir,
+            **params,
+        )
+        return [cls(name="terminal", ..., executor=executor)]
+```

#### Properties

-- `vendor`: str
- Get the vendor name.
+- `action_type`: type[[Action](#class-action)]
+- `annotations`: [ToolAnnotations](#class-toolannotations) | None
+- `description`: str
+- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
+- `meta`: dict[str, Any] | None
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+ Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `name`: ClassVar[str] = ''
+- `observation_type`: type[[Observation](#class-observation)] | None
+- `title`: str

#### Methods

-#### __init__()
+#### action_from_arguments()

-Initialize the OpenAI subscription auth handler.
+Create an action from parsed arguments.
+
+This method can be overridden by subclasses to provide custom logic
+for creating actions from arguments (e.g., for MCP tools).

* Parameters:
- * `credential_store` – Optional custom credential store.
- * `oauth_port` – Port for the local OAuth callback server.
+ `arguments` – The parsed arguments from the tool call.
+* Returns:
+ The action instance created from the arguments.

-#### create_llm()
+#### as_executable()

-Create an LLM instance configured for Codex subscription access.
+Return this tool as an ExecutableTool, ensuring it has an executor.
+
+This method eliminates the need for runtime None checks by guaranteeing
+that the returned tool has a non-None executor.

-* Parameters:
- * `model` – The model to use (must be in OPENAI_CODEX_MODELS).
- * `credentials` – OAuth credentials to use. If None, uses stored credentials.
- * `instructions` – Optional instructions for the Codex model.
- * `**llm_kwargs` – Additional arguments to pass to LLM constructor.
* Returns:
- An LLM instance configured for Codex access.
+ This tool instance, typed as ExecutableTool.
* Raises:
- `ValueError` – If the model is not supported or no credentials available.
-
-#### get_credentials()
+ `NotImplementedError` – If the tool has no executor.

-Get stored credentials if they exist.
+#### abstractmethod classmethod create()

-#### has_valid_credentials()
+Create a sequence of Tool instances.

-Check if valid (non-expired) credentials exist.
+This method must be implemented by all subclasses to provide custom
+initialization logic, typically initializing the executor with parameters
+from conv_state and other optional parameters.

-#### async login()
+* Parameters:
+ * `*args` – Variable positional arguments (typically conv_state as first arg).
+ * `**kwargs` – Optional parameters for tool initialization.
+* Returns:
+ A sequence of Tool instances. Even single tools are returned as a sequence
+ to provide a consistent interface and eliminate union return types.

-Perform OAuth login flow.
+#### classmethod resolve_kind()

-This starts a local HTTP server to handle the OAuth callback,
-opens the browser for user authentication, and waits for the
-callback with the authorization code.
+Resolve a kind string to its corresponding tool class.

* Parameters:
- `open_browser` – Whether to automatically open the browser.
+ `kind` – The name of the tool class to resolve
* Returns:
- The obtained OAuth credentials.
+ The tool class corresponding to the kind
* Raises:
- `RuntimeError` – If the OAuth flow fails or times out.
+ `ValueError` – If the kind is unknown

-#### logout()
+#### set_executor()

-Remove stored credentials.
+Create a new Tool instance with the given executor.

-* Returns:
- True if credentials were removed, False if none existed.
+#### to_mcp_tool()

-#### async refresh_if_needed()
+Convert a Tool to an MCP tool definition.

-Refresh credentials if they are expired.
+Allow overriding input/output schemas (usually by subclasses).

-* Returns:
- Updated credentials, or None if no credentials exist.
-* Raises:
- `RuntimeError` – If token refresh fails.
+* Parameters:
+ * `input_schema` – Optionally override the input schema.
+ * `output_schema` – Optionally override the output schema. -### class ReasoningItemModel +#### to_openai_tool() -Bases: `BaseModel` +Convert a Tool to an OpenAI tool. -OpenAI Responses reasoning item (non-stream, subset we consume). +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + to the action schema for LLM to predict. This is useful for + tools that may have safety risks, so the LLM can reason about + the risk level before calling the tool. + * `action_type` – Optionally override the action_type to use for the schema. + This is useful for MCPTool to use a dynamically created action type + based on the tool’s input schema. -Do not log or render encrypted_content. +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. +#### to_responses_tool() -#### Properties +Convert a Tool to a Responses API function tool (LiteLLM typed). -- `content`: list[str] | None -- `encrypted_content`: str | None -- `id`: str | None -- `status`: str | None -- `summary`: list[str] +For Responses API, function tools expect top-level keys: +(JSON configuration object) -#### Methods +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + * `action_type` – Optional override for the action type -#### model_config = (configuration object) +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +### class ToolExecutor -### class RedactedThinkingBlock +Bases: `ABC`, `Generic` -Bases: `BaseModel` +Executor function type for a Tool. -Redacted thinking block for previous responses without extended thinking. +#### Methods -This is used as a placeholder for assistant messages that were generated -before extended thinking was enabled. +#### close() +Close the executor and clean up resources. 
-#### Properties +Default implementation does nothing. Subclasses should override +this method to perform cleanup (e.g., closing connections, +terminating processes, etc.). -- `data`: str -- `type`: Literal['redacted_thinking'] +### openhands.sdk.utils +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md -#### Methods +Utility functions for the OpenHands SDK. -#### model_config = (configuration object) +### deprecated() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Return a decorator that deprecates a callable with explicit metadata. -### class RegistryEvent +Use this helper when you can annotate a function, method, or property with +@deprecated(…). It transparently forwards to `deprecation.deprecated()` +while filling in the SDK’s current version metadata unless custom values are +supplied. -Bases: `BaseModel` +### maybe_truncate() +Truncate the middle of content if it exceeds the specified length. -#### Properties +Keeps the head and tail of the content to preserve context at both ends. +Optionally saves the full content to a file for later investigation. -- `llm`: [LLM](#class-llm) -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### class RouterLLM +* Parameters: + * `content` – The text content to potentially truncate + * `truncate_after` – Maximum length before truncation. 
If None, no truncation occurs
+ * `truncate_notice` – Notice to insert in the middle when content is truncated
+ * `save_dir` – Working directory to save full content file in
+ * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”)
+* Returns:
+ Original content if under limit, or truncated content with head and tail
+ preserved and reference to saved file if applicable

-Bases: [`LLM`](#class-llm)
+### sanitize_openhands_mentions()

-Base class for multiple LLM acting as a unified LLM.
-This class provides a foundation for implementing model routing by
-inheriting from LLM, allowing routers to work with multiple underlying
-LLM models while presenting a unified LLM interface to consumers.
-Key features:
-- Works with multiple LLMs configured via llms_for_routing
-- Delegates all other operations/properties to the selected LLM
-- Provides routing interface through select_llm() method
+Sanitize @OpenHands mentions in text to prevent self-mention loops.
+This function inserts a zero-width joiner (ZWJ) after the @ symbol in
+@OpenHands mentions, making them non-clickable in GitHub comments while
+preserving readability. The original case of the mention is preserved.

-#### Properties
+* Parameters:
+ `text` – The text to sanitize
+* Returns:
+ Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@‍OpenHands”)

-- `active_llm`: [LLM](#class-llm) | None
-- `llms_for_routing`: dict[str, [LLM](#class-llm)]
-- `model_config`: = (configuration object)
- Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `router_name`: str
+### Examples

-#### Methods
+```pycon
+>>> sanitize_openhands_mentions("Thanks @OpenHands for the help!")
+'Thanks @\u200dOpenHands for the help!'
+>>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS")
+'Check @\u200dopenhands and @\u200dOPENHANDS'
+>>> sanitize_openhands_mentions("No mention here")
+'No mention here'
+```

-#### completion()
+### sanitized_env()

-This method intercepts completion calls and routes them to the appropriate
-underlying LLM based on the routing logic implemented in select_llm().
+Return a copy of env with sanitized values.

-* Parameters:
- * `messages` – List of conversation messages
- * `tools` – Optional list of tools available to the model
- * `return_metrics` – Whether to return usage metrics
- * `add_security_risk_prediction` – Add security_risk field to tool schemas
- * `on_token` – Optional callback for streaming tokens
- * `**kwargs` – Additional arguments passed to the LLM API
+PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored
+libraries win. This function restores the original value so that subprocess
+will not use them.

-#### NOTE
-Summary field is always added to tool schemas for transparency and
-explainability of agent actions.
+### warn_deprecated()

-#### model_post_init()
+Emit a deprecation warning for dynamic access to a legacy feature.

-This function is meant to behave like a BaseModel method to initialise private attributes.
+Prefer this helper when a decorator is not practical, e.g. attribute accessors,
+data migrations, or other runtime paths that must conditionally warn. Provide
+explicit version metadata so the SDK reports consistent messages and upgrades
+to `deprecation.UnsupportedWarning` after the removal threshold.

-It takes context as an argument since that’s what pydantic-core passes when calling it.
+### openhands.sdk.workspace
+Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md

-* Parameters:
- * `self` – The BaseModel instance.
- * `context` – The context.
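The workspace classes that follow all share one contract: enter the workspace as a context manager, then call methods such as `execute_command` and `read_file` on it. A minimal, self-contained toy makes that shape concrete; `ToyWorkspace` and its canned results are illustrative inventions for this sketch, not the SDK API:

```python
class ToyWorkspace:
    """Illustrative stand-in for the workspace contract described below."""

    def __init__(self, working_dir, files):
        self.working_dir = working_dir
        self._files = files   # path -> contents, standing in for a real filesystem
        self._open = False

    def __enter__(self):
        self._open = True     # a real workspace might start a sandbox here
        return self

    def __exit__(self, *exc_info):
        self._open = False    # ...and tear it down here
        return False

    def execute_command(self, command):
        assert self._open, "workspace must be entered before use"
        # Canned result shaped like the documented CommandResult fields.
        return {"command": command, "exit_code": 0,
                "stdout": "hello\n", "stderr": "", "timeout_occurred": False}

    def read_file(self, path):
        assert self._open, "workspace must be entered before use"
        return self._files[path]


with ToyWorkspace("/workspace", {"example.txt": "content"}) as ws:
    result = ws.execute_command("echo 'hello'")
    content = ws.read_file("example.txt")
```

Real implementations differ only in where the work happens: `LocalWorkspace` runs against the host filesystem, `RemoteWorkspace` against an agent server.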
+### class BaseWorkspace -#### abstractmethod select_llm() +Bases: `DiscriminatedUnionMixin`, `ABC` -Select which LLM to use based on messages and events. +Abstract base class for workspace implementations. -This method implements the core routing logic for the RouterLLM. -Subclasses should analyze the provided messages to determine which -LLM from llms_for_routing is most appropriate for handling the request. +Workspaces provide a sandboxed environment where agents can execute commands, +read/write files, and perform other operations. All workspace implementations +support the context manager protocol for safe resource management. -* Parameters: - `messages` – List of messages in the conversation that can be used - to inform the routing decision. -* Returns: - The key/name of the LLM to use from llms_for_routing dictionary. +#### Example -#### classmethod set_placeholder_model() +```pycon +>>> with workspace: +... result = workspace.execute_command("echo 'hello'") +... content = workspace.read_file("example.txt") +``` -Guarantee model exists before LLM base validation runs. -#### classmethod validate_llms_not_empty() +#### Properties -### class TextContent +- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. Path objects are automatically converted to strings.')] -Bases: `BaseContent` +#### Methods +#### abstractmethod execute_command() -#### Properties +Execute a bash command on the system. -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `text`: str -- `type`: Literal['text'] +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory for the command (optional) + * `timeout` – Timeout in seconds (defaults to 30.0) +* Returns: + Result containing stdout, stderr, exit_code, and other + : metadata +* Return type: + [CommandResult](#class-commandresult) +* Raises: + `Exception` – If command execution fails -#### Methods +#### abstractmethod file_download() -#### to_llm_dict() +Download a file from the system. -Convert to LLM API format. +* Parameters: + * `source_path` – Path to the source file on the system + * `destination_path` – Path where the file should be downloaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file download fails -### class ThinkingBlock +#### abstractmethod file_upload() -Bases: `BaseModel` +Upload a file to the system. -Anthropic thinking block for extended thinking feature. +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be uploaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file upload fails -This represents the raw thinking blocks returned by Anthropic models -when extended thinking is enabled. These blocks must be preserved -and passed back to the API for tool use scenarios. +#### abstractmethod git_changes() +Get the git changes for the repository at the path given. -#### Properties +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed -- `signature`: str | None -- `thinking`: str -- `type`: Literal['thinking'] +#### abstractmethod git_diff() -#### Methods +Get the git diff for the file at the path given. 
+ +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed #### model_config = (configuration object) Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +#### pause() -# openhands.sdk.security -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security - -### class AlwaysConfirm - -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +Pause the workspace to conserve resources. -#### Methods +For local workspaces, this is a no-op. +For container-based workspaces, this pauses the container. -#### model_config = (configuration object) +* Raises: + `NotImplementedError` – If the workspace type does not support pausing. -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +#### resume() -#### should_confirm() +Resume a paused workspace. -Determine if an action with the given risk level requires confirmation. +For local workspaces, this is a no-op. +For container-based workspaces, this resumes the container. -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. +* Raises: + `NotImplementedError` – If the workspace type does not support resuming. -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. +### class CommandResult -### class ConfirmRisky +Bases: `BaseModel` -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +Result of executing a command in the workspace. 
#### Properties -- `confirm_unknown`: bool -- `threshold`: [SecurityRisk](#class-securityrisk) +- `command`: str +- `exit_code`: int +- `stderr`: str +- `stdout`: str +- `timeout_occurred`: bool #### Methods @@ -14035,25 +13886,20 @@ Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### should_confirm() - -Determine if an action with the given risk level requires confirmation. +### class FileOperationResult -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. +Bases: `BaseModel` -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. +Result of a file upload or download operation. -#### classmethod validate_threshold() -### class ConfirmationPolicyBase +#### Properties -Bases: `DiscriminatedUnionMixin`, `ABC` +- `destination_path`: str +- `error`: str | None +- `file_size`: int | None +- `source_path`: str +- `success`: bool #### Methods @@ -14061,5434 +13907,4559 @@ Bases: `DiscriminatedUnionMixin`, `ABC` Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### abstractmethod should_confirm() - -Determine if an action with the given risk level requires confirmation. - -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. - -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. 
-* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. - -### class GraySwanAnalyzer - -Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) +### class LocalWorkspace -Security analyzer using GraySwan’s Cygnal API for AI safety monitoring. +Bases: [`BaseWorkspace`](#class-baseworkspace) -This analyzer sends conversation history and pending actions to the GraySwan -Cygnal API for security analysis. The API returns a violation score which is -mapped to SecurityRisk levels. +Local workspace implementation that operates on the host filesystem. -Environment Variables: -: GRAYSWAN_API_KEY: Required API key for GraySwan authentication - GRAYSWAN_POLICY_ID: Optional policy ID for custom GraySwan policy +LocalWorkspace provides direct access to the local filesystem and command execution +environment. It’s suitable for development and testing scenarios where the agent +should operate directly on the host system. #### Example ```pycon ->>> from openhands.sdk.security.grayswan import GraySwanAnalyzer ->>> analyzer = GraySwanAnalyzer() ->>> risk = analyzer.security_risk(action_event) +>>> workspace = LocalWorkspace(working_dir="/path/to/project") +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") ``` +#### Methods -#### Properties - -- `api_key`: SecretStr | None -- `api_url`: str -- `history_limit`: int -- `low_threshold`: float -- `max_message_chars`: int -- `medium_threshold`: float -- `policy_id`: str | None -- `timeout`: float +#### __init__() -#### Methods +Create a new model by parsing and validating input data from keyword arguments. -#### close() +Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be +validated to form a valid model. -Clean up resources. +self is explicitly positional-only to allow self as a field name. 
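The timeout handling documented for local command execution can be sketched with `subprocess.run`; the helper name `run_with_timeout` and the result-dict shape are assumptions for illustration, not the SDK's shared shell utility:

```python
import subprocess


def run_with_timeout(command: str, timeout: float) -> dict:
    """Toy local command execution mirroring the documented result fields."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return {"command": command, "exit_code": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr,
                "timeout_occurred": False}
    except subprocess.TimeoutExpired:
        # The child is killed; report the timeout instead of raising.
        return {"command": command, "exit_code": -1, "stdout": "",
                "stderr": "", "timeout_occurred": True}


ok = run_with_timeout("echo hi", timeout=5.0)
slow = run_with_timeout("sleep 2", timeout=0.2)
```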
-#### model_config = (configuration object) +#### execute_command() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Execute a bash command locally. -#### model_post_init() +Uses the shared shell execution utility to run commands with proper +timeout handling, output streaming, and error management. -Initialize the analyzer after model creation. +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, command, and + : timeout_occurred +* Return type: + [CommandResult](#class-commandresult) -#### security_risk() +#### file_download() -Analyze action for security risks using GraySwan API. +Download (copy) a file locally. -This method converts the conversation history and the pending action -to OpenAI message format and sends them to the GraySwan Cygnal API -for security analysis. +For local systems, file download is implemented as a file copy operation +using shutil.copy2 to preserve metadata. * Parameters: - `action` – The ActionEvent to analyze + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied * Returns: - SecurityRisk level based on GraySwan analysis - -#### set_events() + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) -Set the events for context when analyzing actions. +#### file_upload() -* Parameters: - `events` – Sequence of events to use as context for security analysis +Upload (copy) a file locally. -#### validate_thresholds() +For local systems, file upload is implemented as a file copy operation +using shutil.copy2 to preserve metadata. -Validate that thresholds are properly ordered. 
+* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied +* Returns: + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) -### class LLMSecurityAnalyzer +#### git_changes() -Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) +Get the git changes for the repository at the path given. -LLM-based security analyzer. +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed -This analyzer respects the security_risk attribute that can be set by the LLM -when generating actions, similar to OpenHands’ LLMRiskAnalyzer. +#### git_diff() -It provides a lightweight security analysis approach that leverages the LLM’s -understanding of action context and potential risks. +Get the git diff for the file at the path given. -#### Methods +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed #### model_config = (configuration object) Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### security_risk() - -Evaluate security risk based on LLM-provided assessment. - -This method checks if the action has a security_risk attribute set by the LLM -and returns it. The LLM may not always provide this attribute but it defaults to -UNKNOWN if not explicitly set. +#### pause() -### class NeverConfirm +Pause the workspace (no-op for local workspaces). -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +Local workspaces have nothing to pause since they operate directly +on the host filesystem. -#### Methods +#### resume() -#### model_config = (configuration object) +Resume the workspace (no-op for local workspaces). 
-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Local workspaces have nothing to resume since they operate directly +on the host filesystem. -#### should_confirm() +### class RemoteWorkspace -Determine if an action with the given risk level requires confirmation. +Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. +Remote workspace implementation that connects to an OpenHands agent server. -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. +RemoteWorkspace provides access to a sandboxed environment running on a remote +OpenHands agent server. This is the recommended approach for production deployments +as it provides better isolation and security. -### class SecurityAnalyzerBase +#### Example -Bases: `DiscriminatedUnionMixin`, `ABC` +```pycon +>>> workspace = RemoteWorkspace( +... host="https://agent-server.example.com", +... working_dir="/workspace" +... ) +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` -Abstract base class for security analyzers. -Security analyzers evaluate the risk of actions before they are executed -and can influence the conversation flow based on security policies. +#### Properties -This is adapted from OpenHands SecurityAnalyzer but designed to work -with the agent-sdk’s conversation-based architecture. +- `alive`: bool + Check if the remote workspace is alive by querying the health endpoint. + * Returns: + True if the health endpoint returns a successful response, False otherwise. 
+- `client`: Client #### Methods -#### analyze_event() +#### execute_command() -Analyze an event for security risks. +Execute a bash command on the remote system. -This is a convenience method that checks if the event is an action -and calls security_risk() if it is. Non-action events return None. +This method starts a bash command via the remote agent server API, +then polls for the output until the command completes. * Parameters: - `event` – The event to analyze + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds * Returns: - ActionSecurityRisk if event is an action, None otherwise + Result with stdout, stderr, exit_code, and other metadata +* Return type: + [CommandResult](#class-commandresult) -#### analyze_pending_actions() +#### file_download() -Analyze all pending actions in a conversation. +Download a file from the remote system. -This method gets all unmatched actions from the conversation state -and analyzes each one for security risks. +Requests the file from the remote system via HTTP API and saves it locally. * Parameters: - `conversation` – The conversation to analyze + * `source_path` – Path to the source file on remote system + * `destination_path` – Path where the file should be saved locally * Returns: - List of tuples containing (action, risk_level) for each pending action - -#### model_config = (configuration object) - -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) -#### abstractmethod security_risk() +#### file_upload() -Evaluate the security risk of an ActionEvent. +Upload a file to the remote system. -This is the core method that analyzes an ActionEvent and returns its risk level. -Implementations should examine the action’s content, context, and potential -impact to determine the appropriate risk level. 
+Reads the local file and sends it to the remote system via HTTP API. * Parameters: - `action` – The ActionEvent to analyze for security risks + * `source_path` – Path to the local source file + * `destination_path` – Path where the file should be uploaded on remote system * Returns: - ActionSecurityRisk enum indicating the risk level - -#### should_require_confirmation() + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) -Determine if an action should require user confirmation. +#### git_changes() -This implements the default confirmation logic based on risk level -and confirmation mode settings. +Get the git changes for the repository at the path given. * Parameters: - * `risk` – The security risk level of the action - * `confirmation_mode` – Whether confirmation mode is enabled + `path` – Path to the git repository * Returns: - True if confirmation is required, False otherwise - -### class SecurityRisk - -Bases: `str`, `Enum` - -Security risk levels for actions. - -Based on OpenHands security risk levels but adapted for agent-sdk. -Integer values allow for easy comparison and ordering. - - -#### Properties - -- `description`: str - Get a human-readable description of the risk level. -- `visualize`: Text - Return Rich Text representation of this risk level. + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed -#### Methods +#### git_diff() -#### HIGH = 'HIGH' +Get the git diff for the file at the path given. -#### LOW = 'LOW' +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed -#### MEDIUM = 'MEDIUM' +#### model_config = (configuration object) -#### UNKNOWN = 'UNKNOWN' +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-#### get_color() +#### model_post_init() -Get the color for displaying this risk level in Rich text. +Override this method to perform additional initialization after __init__ and model_construct. +This is useful if you want to do some validation that requires the entire model to be initialized. -#### is_riskier() +#### reset_client() -Check if this risk level is riskier than another. +Reset the HTTP client to force re-initialization. -Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is -less risky than HIGH. UNKNOWN is not comparable to any other level. +This is useful when connection parameters (host, api_key) have changed +and the client needs to be recreated with new values. -To make this act like a standard well-ordered domain, we reflexively consider -risk levels to be riskier than themselves. That is: +### class Workspace - for risk_level in list(SecurityRisk): - : assert risk_level.is_riskier(risk_level) +### class Workspace - # More concretely: - assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH) - assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM) - assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW) +Bases: `object` -This can be disabled by setting the reflexive parameter to False. +Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace. -* Parameters: - other ([SecurityRisk*](#class-securityrisk)) – The other risk level to compare against. - reflexive (bool*) – Whether the relationship is reflexive. -* Raises: - `ValueError` – If either risk level is UNKNOWN. +Usage: +: - Workspace(working_dir=…) -> LocalWorkspace + - Workspace(working_dir=…, host=”http://…”) -> RemoteWorkspace +### Agent +Source: https://docs.openhands.dev/sdk/arch/agent.md -# openhands.sdk.tool -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool +The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. 
It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture. -### class Action +**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent) -Bases: `Schema`, `ABC` +## Core Responsibilities -Base schema for input action. +The Agent system has four primary responsibilities: +1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history +2. **Tool Orchestration** - Select and execute tools, handle results and errors +3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser) +4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security) -#### Properties +## Architecture -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `visualize`: Text - Return Rich Text representation of this action. - This method can be overridden by subclasses to customize visualization. - The base implementation displays all action fields systematically. -### class ExecutableTool +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%% +flowchart TB + subgraph Input[" "] + Events["Event History"] + Context["Agent Context
Skills + Prompts"] + end + + subgraph Core["Agent Core"] + Condense["Condenser
History compression"] + Reason["LLM Query
Generate actions"] + Security["Security Analyzer
Risk assessment"] + end + + subgraph Execution[" "] + Tools["Tool Executor
Action → Observation"] + Results["Observation Events"] + end + + Events --> Condense + Context -.->|Skills| Reason + Condense --> Reason + Reason --> Security + Security --> Tools + Tools --> Results + Results -.->|Feedback| Events + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Reason primary + class Condense,Security secondary + class Tools tertiary +``` -Bases: `Protocol` +### Key Components -Protocol for tools that are guaranteed to have a non-None executor. +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | +| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | +| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | +| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | +| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | -This eliminates the need for runtime None checks and type narrowing -when working with tools that are known to be executable. 
+## Reasoning-Action Loop +The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: -#### Properties +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% +flowchart TB + Start["step() called"] + Pending{"Pending
actions?"} + ExecutePending["Execute pending actions"] + + HasCondenser{"Has
condenser?"} + Condense["Call condenser.condense()"] + CondenseResult{"Result
type?"} + EmitCondensation["Emit Condensation event"] + UseView["Use View events"] + UseRaw["Use raw events"] + + Query["Query LLM with messages"] + ContextExceeded{"Context
window
exceeded?"} + EmitRequest["Emit CondensationRequest"] + + Parse{"Response
type?"} + CreateActions["Create ActionEvents"] + CreateMessage["Create MessageEvent"] + + Confirmation{"Need
confirmation?"} + SetWaiting["Set WAITING_FOR_CONFIRMATION"] + + Execute["Execute actions"] + Observe["Create ObservationEvents"] + + Return["Return"] + + Start --> Pending + Pending -->|Yes| ExecutePending --> Return + Pending -->|No| HasCondenser + + HasCondenser -->|Yes| Condense + HasCondenser -->|No| UseRaw + Condense --> CondenseResult + CondenseResult -->|Condensation| EmitCondensation --> Return + CondenseResult -->|View| UseView --> Query + UseRaw --> Query + + Query --> ContextExceeded + ContextExceeded -->|Yes| EmitRequest --> Return + ContextExceeded -->|No| Parse + + Parse -->|Tool calls| CreateActions + Parse -->|Message| CreateMessage --> Return + + CreateActions --> Confirmation + Confirmation -->|Yes| SetWaiting --> Return + Confirmation -->|No| Execute + + Execute --> Observe + Observe --> Return + + style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any] -- `name`: str +**Step Execution Flow:** -#### Methods +1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return +2. **Condensation:** If condenser exists: + - Call `condenser.condense()` with current event view + - If returns `View`: use condensed events for LLM query (continue in same step) + - If returns `Condensation`: emit event and return (will be processed next step) +3. **LLM Query:** Query LLM with messages from event history + - If context window exceeded: emit `CondensationRequest` and return +4. **Response Parsing:** Parse LLM response into events + - Tool calls → create `ActionEvent`(s) + - Text message → create `MessageEvent` and return +5. **Confirmation Check:** If actions need user approval: + - Set conversation status to `WAITING_FOR_CONFIRMATION` and return +6. 
**Action Execution:** Execute tools and create `ObservationEvent`(s) -#### __init__() +**Key Characteristics:** +- **Stateless:** Agent holds no mutable state between steps +- **Event-Driven:** Reads from event history, writes new events +- **Interruptible:** Each step is atomic and can be paused/resumed -### class FinishTool +## Agent Context -Bases: `ToolDefinition[FinishAction, FinishObservation]` +The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: -Tool for signaling the completion of a task or conversation. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Context["AgentContext"] + + subgraph Skills["Skills"] + Repo["repo
Always active"] + Knowledge["knowledge
Trigger-based"] + end + SystemAug["System prompt prefix/suffix
Per-conversation"] + System["Prompt template
Per-conversation"] + + subgraph Application["Applied to LLM"] + SysPrompt["System Prompt"] + UserMsg["User Messages"] + end + + Context --> Skills + Context --> SystemAug + Repo --> SysPrompt + Knowledge -.->|When triggered| UserMsg + System --> SysPrompt + SystemAug --> SysPrompt + + style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` +| Skill Type | Activation | Use Case | +|------------|------------|----------| +| **repo** | Always included | Project-specific context, conventions | +| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | -#### Properties +Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### Methods +## Tool Execution -#### classmethod create() +Tools follow a **strict action-observation pattern**: -Create FinishTool instance. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + LLM["LLM generates tool_call"] + Convert["Convert to ActionEvent"] + + Decision{"Confirmation
mode?"} + Defer["Store as pending"] + + Execute["Execute tool"] + Success{"Success?"} + + Obs["ObservationEvent
with result"] + Error["ObservationEvent
with error"] + + LLM --> Convert + Convert --> Decision + + Decision -->|Yes| Defer + Decision -->|No| Execute + + Execute --> Success + Success -->|Yes| Obs + Success -->|No| Error + + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -* Parameters: - * `conv_state` – Optional conversation state (not used by FinishTool). - params* – Additional parameters (none supported). -* Returns: - A sequence containing a single FinishTool instance. -* Raises: - `ValueError` – If any parameters are provided. +**Execution Modes:** -#### name = 'finish' +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | -### class Observation +**Security Integration:** -Bases: `Schema`, `ABC` +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation -Base schema for output observation. 
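The two execution modes in the table above differ only in *when* an action runs. A toy, self-contained model of the deferred path — new actions are parked as pending, then executed at the start of the next step once approved — is sketched below (hypothetical helper, not the SDK's `Agent.step()` implementation):

```python
def process_step(actions, pending, confirmation_mode, execute):
    """Toy model of one agent step.

    If pending actions exist, run them first and return their observations.
    Otherwise either execute the new actions directly, or defer them for
    user approval when confirmation mode is enabled.
    """
    if pending:
        observations = [execute(a) for a in pending]
        pending.clear()
        return observations
    if confirmation_mode:
        pending.extend(actions)  # wait for user approval
        return []
    return [execute(a) for a in actions]


pending = []
# Step 1: confirmation mode defers the action instead of running it.
assert process_step(["rm -rf build"], pending, True, str.upper) == []
# Step 2: once approved, the pending action is executed.
assert process_step([], pending, True, str.upper) == ["RM -RF BUILD"]
```

In direct mode the same call executes immediately, which is why the table describes it as suitable only for trusted environments.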
+## Component Relationships

+### How Agent Interacts

-#### Properties

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    Agent["Agent"]
    Conv["Conversation"]
    LLM["LLM"]
    Tools["Tools"]
    Context["AgentContext"]

    Conv -->|.step calls| Agent
    Agent -->|Reads events| Conv
    Agent -->|Query| LLM
    Agent -->|Execute| Tools
    Context -.->|Skills and Context| Agent
    Agent -.->|New events| Conv

    style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]\n'
- `content`: list[TextContent | ImageContent]
- `is_error`: bool
- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `text`: str
  Extract all text content from the observation.
  * Returns:
    Concatenated text from all TextContent items in content.
- `to_llm_content`: Sequence[TextContent | ImageContent]
  Default content formatting for converting observation to LLM readable content.
  Subclasses can override to provide richer content (e.g., images, diffs).
- `visualize`: Text
  Return Rich Text representation of this observation.
  Subclasses can override for custom visualization; by default we show the
  same text that would be sent to the LLM.

**Relationship Characteristics:**
- **Conversation → Agent**: Orchestrates step execution, provides event history
- **Agent → LLM**: Queries for next actions, receives tool calls or messages
- **Agent → Tools**: Executes actions, receives observations
- **AgentContext → Agent**: Injects skills and prompts into LLM queries

-#### Methods

-#### classmethod from_text()

-Utility to create an Observation from a simple text string.
+- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction -* Parameters: - * `text` – The text content to include in the observation. - * `is_error` – Whether this observation represents an error. - kwargs* – Additional fields for the observation subclass. -* Returns: - An Observation instance with the text wrapped in a TextContent. +### Agent Server Package +Source: https://docs.openhands.dev/sdk/arch/agent-server.md -### class ThinkTool +The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. -Bases: `ToolDefinition[ThinkAction, ThinkObservation]` +**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) -Tool for logging thoughts without making changes. +## Purpose +The Agent Server enables: +- **Remote execution**: Clients interact with agents via HTTP API +- **Multi-user isolation**: Each user gets isolated workspace +- **Container orchestration**: Manages Docker containers for workspaces +- **Centralized management**: Monitor and control all agents +- **Scalability**: Horizontal scaling with multiple servers -#### Properties +## Architecture Overview -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+```mermaid +graph TB + Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] + + API --> Auth[Authentication] + API --> Router[API Router] + + Router --> WS[Workspace Manager] + Router --> Conv[Conversation Handler] + + WS --> Docker[Docker Manager] + Docker --> C1[Container 1
User A] + Docker --> C2[Container 2
User B] + Docker --> C3[Container 3
User C] + + Conv --> Agent[Software Agent SDK] + Agent --> C1 + Agent --> C2 + Agent --> C3 + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style WS fill:#e8f5e8 + style Docker fill:#f3e5f5 + style Agent fill:#fce4ec +``` -#### Methods +### Key Components -#### classmethod create() +**1. FastAPI Server** +- HTTP REST API endpoints +- Authentication and authorization +- Request validation +- WebSocket support for streaming -Create ThinkTool instance. +**2. Workspace Manager** +- Creates and manages Docker containers +- Isolates workspaces per user +- Handles container lifecycle +- Manages resource limits -* Parameters: - * `conv_state` – Optional conversation state (not used by ThinkTool). - params* – Additional parameters (none supported). -* Returns: - A sequence containing a single ThinkTool instance. -* Raises: - `ValueError` – If any parameters are provided. +**3. Conversation Handler** +- Routes requests to appropriate workspace +- Manages conversation state +- Handles concurrent requests +- Supports streaming responses -#### name = 'think' +**4. Docker Manager** +- Interfaces with Docker daemon +- Builds and pulls images +- Creates and destroys containers +- Monitors container health -### class Tool +## Design Decisions -Bases: `BaseModel` +### Why HTTP API? -Defines a tool to be initialized for the agent. +Alternative approaches considered: +- **gRPC**: More efficient but harder for web clients +- **WebSockets only**: Good for streaming but not RESTful +- **HTTP + WebSockets**: Best of both worlds -This is only used in agent-sdk for type schema for server use. +**Decision**: HTTP REST for operations, WebSockets for streaming +- ✅ Works from any client (web, mobile, CLI) +- ✅ Easy to debug (curl, Postman) +- ✅ Standard authentication (API keys, OAuth) +- ✅ Streaming where needed +### Why Container Per User? 
-#### Properties +Alternative approaches: +- **Shared container**: Multiple users in one container +- **Container per session**: New container each conversation +- **Container per user**: One container per user (chosen) -- `name`: str -- `params`: dict[str, Any] +**Decision**: Container per user +- ✅ Strong isolation between users +- ✅ Persistent workspace across sessions +- ✅ Better resource management +- ⚠️ More containers, but worth it for isolation -#### Methods +### Why FastAPI? -#### model_config = (configuration object) +Alternative frameworks: +- **Flask**: Simpler but less type-safe +- **Django**: Too heavyweight +- **FastAPI**: Modern, fast, type-safe (chosen) -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +**Decision**: FastAPI +- ✅ Automatic API documentation (OpenAPI) +- ✅ Type validation with Pydantic +- ✅ Async support for performance +- ✅ WebSocket support built-in -#### classmethod validate_name() +## API Design -Validate that name is not empty. +### Key Endpoints -#### classmethod validate_params() +**Workspace Management** +``` +POST /workspaces Create new workspace +GET /workspaces/{id} Get workspace info +DELETE /workspaces/{id} Delete workspace +POST /workspaces/{id}/execute Execute command +``` -Convert None params to empty dict. +**Conversation Management** +``` +POST /conversations Create conversation +GET /conversations/{id} Get conversation +POST /conversations/{id}/messages Send message +GET /conversations/{id}/stream Stream responses (WebSocket) +``` -### class ToolAnnotations +**Health & Monitoring** +``` +GET /health Server health check +GET /metrics Prometheus metrics +``` -Bases: `BaseModel` +### Authentication -Annotations to provide hints about the tool’s behavior. 
+**API Key Authentication** +```bash +curl -H "Authorization: Bearer YOUR_API_KEY" \ + https://agent-server.example.com/conversations +``` -Based on Model Context Protocol (MCP) spec: -[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838) +**Per-user workspace isolation** +- API key → user ID mapping +- Each user gets separate workspace +- Users can't access each other's workspaces +### Streaming Responses -#### Properties +**WebSocket for real-time updates** +```python +async with websocket_connect(url) as ws: + # Send message + await ws.send_json({"message": "Hello"}) + + # Receive events + async for event in ws: + if event["type"] == "message": + print(event["content"]) +``` -- `destructiveHint`: bool -- `idempotentHint`: bool -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `openWorldHint`: bool -- `readOnlyHint`: bool -- `title`: str | None -### class ToolDefinition +**Why streaming?** +- Real-time feedback to users +- Show agent thinking process +- Better UX for long-running tasks -Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic` +## Deployment Models -Base class for all tool implementations. +### 1. Local Development -This class serves as a base for the discriminated union of all tool types. -All tools must inherit from this class and implement the .create() method for -proper initialization with executors and parameters. +Run server locally for testing: +```bash +# Start server +openhands-agent-server --port 8000 -Features: -- Normalize input/output schemas (class or dict) into both model+schema. -- Validate inputs before execute. 
-- Coerce outputs only if an output model is defined; else return vanilla JSON.
-- Export MCP tool description.

#### Examples

Simple tool with no parameters:

```python
class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
    @classmethod
    def create(cls, conv_state=None, **params):
        return [cls(name="finish", ..., executor=FinishExecutor())]
```

Complex tool with initialization parameters:

```python
class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
    @classmethod
    def create(cls, conv_state, **params):
        executor = TerminalExecutor(
            working_dir=conv_state.workspace.working_dir,
            **params,
        )
        return [cls(name="terminal", ..., executor=executor)]
```

#### Properties

- `action_type`: type[[Action](#class-action)]
- `annotations`: [ToolAnnotations](#class-toolannotations) | None
- `description`: str
- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
- `meta`: dict[str, Any] | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `name`: ClassVar[str] = ''
- `observation_type`: type[[Observation](#class-observation)] | None
- `title`: str

#### Methods

#### action_from_arguments()

Create an action from parsed arguments.

This method can be overridden by subclasses to provide custom logic
for creating actions from arguments (e.g., for MCP tools).

* Parameters:
  `arguments` – The parsed arguments from the tool call.
* Returns:
  The action instance created from the arguments.

#### as_executable()

Return this tool as an ExecutableTool, ensuring it has an executor.

This method eliminates the need for runtime None checks by guaranteeing
that the returned tool has a non-None executor.

* Returns:
  This tool instance, typed as ExecutableTool.
* Raises:
  `NotImplementedError` – If the tool has no executor.

#### abstractmethod classmethod create()

+**Use case**: Development and testing

+### 2. Single-Server Deployment

+Deploy on one server (VPS, EC2, etc.):
+```bash
+# Install
+pip install openhands-agent-server
+
+# Run with systemd/supervisor
+openhands-agent-server \
+  --host 0.0.0.0 \
+  --port 8000 \
+  --workers 4
+```

+**Use case**: Small deployments, prototypes, MVPs

+### 3. Multi-Server Deployment

+Scale horizontally with load balancer:
+```
+                Load Balancer
+                     |
+       +-------------+-------------+
+       |             |             |
+   Server 1      Server 2      Server 3
+   (Agents)      (Agents)      (Agents)
+       |             |             |
+       +-------------+-------------+
+                     |
+              Shared State Store
+            (Database, Redis, etc.)
+```

+**Use case**: Production SaaS, high traffic, need redundancy

+### 4.
Kubernetes Deployment -Create a sequence of Tool instances. +Container orchestration with Kubernetes: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + template: + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 +``` -This method must be implemented by all subclasses to provide custom -initialization logic, typically initializing the executor with parameters -from conv_state and other optional parameters. +**Use case**: Enterprise deployments, auto-scaling, high availability -* Parameters: - args** – Variable positional arguments (typically conv_state as first arg). - kwargs* – Optional parameters for tool initialization. -* Returns: - A sequence of Tool instances. Even single tools are returned as a sequence - to provide a consistent interface and eliminate union return types. +## Resource Management -#### classmethod resolve_kind() +### Container Limits -Resolve a kind string to its corresponding tool class. +Set per-workspace resource limits: +```python +# In server configuration +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "2g", # 2GB RAM + "cpus": "2", # 2 CPU cores + "disk": "10g" # 10GB disk + }, + "timeout": 300, # 5 min timeout +} +``` -* Parameters: - `kind` – The name of the tool class to resolve -* Returns: - The tool class corresponding to the kind -* Raises: - `ValueError` – If the kind is unknown +**Why limit resources?** +- Prevent one user from consuming all resources +- Fair usage across users +- Protect server from runaway processes +- Cost control -#### set_executor() +### Cleanup & Garbage Collection -Create a new Tool instance with the given executor. 
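Idle-workspace cleanup of the kind described in this section can be driven by tracking a last-activity timestamp per workspace and reaping anything idle past a threshold. The helper below is purely illustrative — the agent server's real lifecycle management is internal and not part of its public API:

```python
import time


def workspaces_to_reap(last_activity, idle_timeout=3600.0, now=None):
    """Return workspace ids whose last activity is older than idle_timeout.

    last_activity maps workspace id -> UNIX timestamp of the last request.
    """
    now = time.time() if now is None else now
    return [ws for ws, ts in last_activity.items() if now - ts > idle_timeout]


# Two workspaces: one idle for two hours, one active a minute ago.
activity = {"ws-a": 1000.0, "ws-b": 8140.0}
assert workspaces_to_reap(activity, idle_timeout=3600, now=8200.0) == ["ws-a"]
```

A periodic task (cron, background thread, etc.) would call this and stop the corresponding containers; passing `now` explicitly keeps the policy easy to unit-test.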
+**Container lifecycle**: +- Containers created on first use +- Kept alive between requests (warm) +- Cleaned up after inactivity timeout +- Force cleanup on server shutdown -#### to_mcp_tool() +**Storage management**: +- Old workspaces deleted automatically +- Disk usage monitored +- Alerts when approaching limits -Convert a Tool to an MCP tool definition. +## Security Considerations -Allow overriding input/output schemas (usually by subclasses). +### Multi-Tenant Isolation -* Parameters: - * `input_schema` – Optionally override the input schema. - * `output_schema` – Optionally override the output schema. +**Container isolation**: +- Each user gets separate container +- Containers can't communicate +- Network isolation (optional) +- File system isolation -#### to_openai_tool() +**API isolation**: +- API keys mapped to users +- Users can only access their workspaces +- Server validates all permissions -Convert a Tool to an OpenAI tool. +### Input Validation -* Parameters: - * `add_security_risk_prediction` – Whether to add a security_risk field - to the action schema for LLM to predict. This is useful for - tools that may have safety risks, so the LLM can reason about - the risk level before calling the tool. - * `action_type` – Optionally override the action_type to use for the schema. - This is useful for MCPTool to use a dynamically created action type - based on the tool’s input schema. +**Server validates**: +- API request schemas +- Command injection attempts +- Path traversal attempts +- File size limits -#### NOTE -Summary field is always added to the schema for transparency and -explainability of agent actions. +**Defense in depth**: +- API validation +- Container validation +- Docker security features +- OS-level security -#### to_responses_tool() +### Network Security -Convert a Tool to a Responses API function tool (LiteLLM typed). 
+**Best practices**: +- HTTPS only (TLS certificates) +- Firewall rules (only port 443/8000) +- Rate limiting +- DDoS protection -For Responses API, function tools expect top-level keys: -(JSON configuration object) +**Container networking**: +```python +# Disable network for workspace +WORKSPACE_CONFIG = { + "network_mode": "none" # No network access +} -* Parameters: - * `add_security_risk_prediction` – Whether to add a security_risk field - * `action_type` – Optional override for the action type +# Or allow specific hosts +WORKSPACE_CONFIG = { + "allowed_hosts": ["api.example.com"] +} +``` -#### NOTE -Summary field is always added to the schema for transparency and -explainability of agent actions. +## Monitoring & Observability -### class ToolExecutor +### Health Checks -Bases: `ABC`, `Generic` +```bash +# Simple health check +curl https://agent-server.example.com/health -Executor function type for a Tool. +# Response +{ + "status": "healthy", + "docker": "connected", + "workspaces": 15, + "uptime": 86400 +} +``` -#### Methods +### Metrics -#### close() +**Prometheus metrics**: +- Request count and latency +- Active workspaces +- Container resource usage +- Error rates -Close the executor and clean up resources. +**Logging**: +- Structured JSON logs +- Per-request tracing +- Workspace events +- Error tracking -Default implementation does nothing. Subclasses should override -this method to perform cleanup (e.g., closing connections, -terminating processes, etc.). +### Alerting +**Alert on**: +- Server down +- High error rate +- Resource exhaustion +- Container failures -# openhands.sdk.utils -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils +## Client SDK -Utility functions for the OpenHands SDK. +Python SDK for interacting with Agent Server: -### deprecated() +```python +from openhands.client import AgentServerClient -Return a decorator that deprecates a callable with explicit metadata. 
+client = AgentServerClient( + url="https://agent-server.example.com", + api_key="your-api-key" +) -Use this helper when you can annotate a function, method, or property with -@deprecated(…). It transparently forwards to `deprecation.deprecated()` -while filling in the SDK’s current version metadata unless custom values are -supplied. +# Create conversation +conversation = client.create_conversation() -### maybe_truncate() +# Send message +response = client.send_message( + conversation_id=conversation.id, + message="Hello, agent!" +) -Truncate the middle of content if it exceeds the specified length. +# Stream responses +for event in client.stream_conversation(conversation.id): + if event.type == "message": + print(event.content) +``` -Keeps the head and tail of the content to preserve context at both ends. -Optionally saves the full content to a file for later investigation. +**Client handles**: +- Authentication +- Request/response serialization +- Error handling +- Streaming +- Retries -* Parameters: - * `content` – The text content to potentially truncate - * `truncate_after` – Maximum length before truncation. If None, no truncation occurs - * `truncate_notice` – Notice to insert in the middle when content is truncated - * `save_dir` – Working directory to save full content file in - * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”) -* Returns: - Original content if under limit, or truncated content with head and tail - preserved and reference to saved file if applicable +## Cost Considerations -### sanitize_openhands_mentions() +### Server Costs -Sanitize @OpenHands mentions in text to prevent self-mention loops. +**Compute**: CPU and memory for containers +- Each active workspace = 1 container +- Typically 1-2 GB RAM per workspace +- 0.5-1 CPU core per workspace -This function inserts a zero-width joiner (ZWJ) after the @ symbol in -@OpenHands mentions, making them non-clickable in GitHub comments while -preserving readability. 
The original case of the mention is preserved. +**Storage**: Workspace files and conversation state +- ~1-10 GB per workspace (depends on usage) +- Conversation history in database -* Parameters: - `text` – The text to sanitize -* Returns: - Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@‍OpenHands”) +**Network**: API requests and responses +- Minimal (mostly text) +- Streaming adds bandwidth -### Examples +### Cost Optimization -```pycon ->>> sanitize_openhands_mentions("Thanks @OpenHands for the help!") -'Thanks @u200dOpenHands for the help!' ->>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS") -'Check @u200dopenhands and @u200dOPENHANDS' ->>> sanitize_openhands_mentions("No mention here") -'No mention here' +**1. Idle timeout**: Shutdown containers after inactivity +```python +WORKSPACE_CONFIG = { + "idle_timeout": 3600 # 1 hour +} ``` -### sanitized_env() - -Return a copy of env with sanitized values. - -PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored -libraries win. This function restores the original value so that subprocess -will not use them. - -### warn_deprecated() - -Emit a deprecation warning for dynamic access to a legacy feature. - -Prefer this helper when a decorator is not practical—e.g. attribute accessors, -data migrations, or other runtime paths that must conditionally warn. Provide -explicit version metadata so the SDK reports consistent messages and upgrades -to `deprecation.UnsupportedWarning` after the removal threshold. +**2. Resource limits**: Don't over-provision +```python +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "1g", # Smaller limit + "cpus": "0.5" # Fractional CPU + } +} +``` +**3. Shared resources**: Use single server for multiple low-traffic apps -# openhands.sdk.workspace -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace +**4. 
Auto-scaling**: Scale servers based on demand -### class BaseWorkspace +## When to Use Agent Server -Bases: `DiscriminatedUnionMixin`, `ABC` +### Use Agent Server When: -Abstract base class for workspace implementations. +✅ **Multi-user system**: Web app with many users +✅ **Remote clients**: Mobile app, web frontend +✅ **Centralized management**: Need to monitor all agents +✅ **Workspace isolation**: Users shouldn't interfere +✅ **SaaS product**: Building agent-as-a-service +✅ **Scaling**: Need to handle concurrent users -Workspaces provide a sandboxed environment where agents can execute commands, -read/write files, and perform other operations. All workspace implementations -support the context manager protocol for safe resource management. +**Examples**: +- Chatbot platforms +- Code assistant web apps +- Agent marketplaces +- Enterprise agent deployments -#### Example +### Use Standalone SDK When: -```pycon ->>> with workspace: -... result = workspace.execute_command("echo 'hello'") -... content = workspace.read_file("example.txt") -``` +✅ **Single-user**: Personal tool or script +✅ **Local execution**: Running on your machine +✅ **Full control**: Need programmatic access +✅ **Simpler deployment**: No server management +✅ **Lower latency**: No network overhead +**Examples**: +- CLI tools +- Automation scripts +- Local development +- Desktop applications -#### Properties +### Hybrid Approach -- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. 
Path objects are automatically converted to strings.')] +Use SDK locally but RemoteAPIWorkspace for execution: +- Agent logic in your Python code +- Execution happens on remote server +- Best of both worlds -#### Methods +## Building Custom Agent Server -#### abstractmethod execute_command() +The server is extensible for custom needs: -Execute a bash command on the system. +**Custom authentication**: +```python +from openhands.agent_server import AgentServer -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory for the command (optional) - * `timeout` – Timeout in seconds (defaults to 30.0) -* Returns: - Result containing stdout, stderr, exit_code, and other - : metadata -* Return type: - [CommandResult](#class-commandresult) -* Raises: - `Exception` – If command execution fails +class CustomAgentServer(AgentServer): + async def authenticate(self, request): + # Custom auth logic + return await oauth_verify(request) +``` -#### abstractmethod file_download() +**Custom workspace configuration**: +```python +server = AgentServer( + workspace_factory=lambda user: DockerWorkspace( + image=f"custom-image-{user.tier}", + resource_limits=user.resource_limits + ) +) +``` -Download a file from the system. +**Custom middleware**: +```python +@server.middleware +async def logging_middleware(request, call_next): + # Custom logging + response = await call_next(request) + return response +``` -* Parameters: - * `source_path` – Path to the source file on the system - * `destination_path` – Path where the file should be downloaded -* Returns: - Result containing success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) -* Raises: - `Exception` – If file download fails +## Next Steps -#### abstractmethod file_upload() +### For Usage Examples -Upload a file to the system. 
+- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API +- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be uploaded -* Returns: - Result containing success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) -* Raises: - `Exception` – If file upload fails +### For Related Architecture -#### abstractmethod git_changes() +- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details +- [SDK Architecture](/sdk/arch/sdk) - Core framework +- [Architecture Overview](/sdk/arch/overview) - System design -Get the git changes for the repository at the path given. +### For Implementation Details -* Parameters: - `path` – Path to the git repository -* Returns: - List of changes -* Return type: - list[GitChange] -* Raises: - `Exception` – If path is not a git repository or getting changes failed +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples -#### abstractmethod git_diff() +### Condenser +Source: https://docs.openhands.dev/sdk/arch/condenser.md -Get the git diff for the file at the path given. +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). 
-* Parameters: - `path` – Path to the file -* Returns: - Git diff -* Return type: - GitDiff -* Raises: - `Exception` – If path is not a git repository or getting diff failed +**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) -#### model_config = (configuration object) +## Core Responsibilities -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +The Condenser system has four primary responsibilities: -#### pause() +1. **History Compression** - Reduce event lists to fit within context windows +2. **Threshold Detection** - Determine when condensation should trigger +3. **Summary Generation** - Create meaningful summaries via LLM or heuristics +4. **View Management** - Transform event history into LLM-ready views -Pause the workspace to conserve resources. +## Architecture -For local workspaces, this is a no-op. -For container-based workspaces, this pauses the container. - -* Raises: - `NotImplementedError` – If the workspace type does not support pausing. - -#### resume() - -Resume a paused workspace. - -For local workspaces, this is a no-op. -For container-based workspaces, this resumes the container. - -* Raises: - `NotImplementedError` – If the workspace type does not support resuming. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["CondenserBase
Abstract base"] + end + + subgraph Implementations["Concrete Implementations"] + NoOp["NoOpCondenser
No compression"] + LLM["LLMSummarizingCondenser
LLM-based"] + Pipeline["PipelineCondenser
Multi-stage"] + end + + subgraph Process["Condensation Process"] + View["View
Event history"] + Check["should_condense()?"] + Condense["get_condensation()"] + Result["View | Condensation"] + end + + subgraph Output["Condensation Output"] + CondEvent["Condensation Event
Summary metadata"] + NewView["Condensed View
Reduced tokens"] + end + + Base --> NoOp + Base --> LLM + Base --> Pipeline + + View --> Check + Check -->|Yes| Condense + Check -->|No| Result + Condense --> CondEvent + CondEvent --> NewView + NewView --> Result + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class LLM,Pipeline secondary + class Check,Condense tertiary +``` -### class CommandResult +### Key Components -Bases: `BaseModel` +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | +| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | +| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | +| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | +| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | +| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | Represents history for LLM | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about 
compression | -Result of executing a command in the workspace. +## Condenser Types +### NoOpCondenser -#### Properties +Pass-through condenser that performs no compression: -- `command`: str -- `exit_code`: int -- `stderr`: str -- `stdout`: str -- `timeout_occurred`: bool +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["View"] + NoOp["NoOpCondenser"] + Same["Same View"] + + View --> NoOp --> Same + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -#### Methods +### LLMSummarizingCondenser -#### model_config = (configuration object) +Uses an LLM to generate summaries of conversation history: -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + View["Long View
120+ events"] + Check["Threshold
exceeded?"] + Summarize["LLM Summarization"] + Summary["Summary Text"] + Metadata["Condensation Event"] + AddToHistory["Add to History"] + NextStep["Next Step: View.from_events()"] + NewView["Condensed View"] + + View --> Check + Check -->|Yes| Summarize + Summarize --> Summary + Summary --> Metadata + Metadata --> AddToHistory + AddToHistory --> NextStep + NextStep --> NewView + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -### class FileOperationResult +**Process:** +1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) +2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) +3. **LLM Call:** Generate summary of middle events using dedicated LLM +4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` +5. **Add to History:** Agent adds `Condensation` to event log and returns early +6. **Next Step:** `View.from_events()` filters forgotten events and inserts summary -Bases: `BaseModel` +**Configuration:** +- **`max_size`:** Event count threshold before condensation triggers (default: 120) +- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) +- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) -Result of a file upload or download operation. 
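Under these documented defaults, the head/middle/tail selection can be sketched in a few lines of plain Python. This is an illustrative sketch of the strategy, not the SDK's implementation, and the function name is hypothetical; it uses the numbers from this page: trigger when the event count exceeds `max_size`, keep the first `keep_first` events, and target `max_size // 2` events after condensation.

```python
def split_for_condensation(events, max_size=120, keep_first=4):
    """Sketch of the rolling-window split: head/tail kept, middle summarized."""
    if len(events) <= max_size:
        return events, [], []                # under threshold: nothing to condense
    target_size = max_size // 2              # events remaining after condensation
    tail_len = target_size - keep_first - 1  # one slot reserved for the summary
    head = events[:keep_first]               # e.g. system prompt + first messages
    middle = events[keep_first:-tail_len]    # replaced by an LLM-generated summary
    tail = events[-tail_len:]                # most recent context, kept verbatim
    return head, middle, tail

head, middle, tail = split_for_condensation(list(range(130)))
print(len(middle))                # 71 events get summarized
print(len(head) + 1 + len(tail))  # 60 events remain (head + summary + tail)
```

With `max_size=120` this keeps 4 head and 55 tail events and summarizes everything in between, matching the page's stated post-condensation target of `max_size // 2` events.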
+### PipelineCondenser +Chains multiple condensers in sequence: -#### Properties +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["Original View"] + C1["Condenser 1"] + C2["Condenser 2"] + C3["Condenser 3"] + Final["Final View"] + + View --> C1 --> C2 --> C3 --> Final + + style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -- `destination_path`: str -- `error`: str | None -- `file_size`: int | None -- `source_path`: str -- `success`: bool +**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) -#### Methods +## Condensation Flow -#### model_config = (configuration object) +### Trigger Mechanisms -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Condensers can be triggered in two ways: -### class LocalWorkspace +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Automatic["Automatic Trigger"] + Agent1["Agent Step"] + Build1["View.from_events()"] + Check1["condenser.condense(view)"] + Trigger1["should_condense()?"] + end + + Agent1 --> Build1 --> Check1 --> Trigger1 + + style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -Bases: [`BaseWorkspace`](#class-baseworkspace) +**Automatic Trigger:** +- **When:** Threshold exceeded (e.g., event count > `max_size`) +- **Who:** Agent calls `condenser.condense()` each step +- **Purpose:** Proactively keep context within limits -Local workspace implementation that operates on the host filesystem. -LocalWorkspace provides direct access to the local filesystem and command execution -environment. It’s suitable for development and testing scenarios where the agent -should operate directly on the host system. 
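Taken together, the two trigger paths amount to a small per-step predicate that is evaluated before the agent queries the LLM. The sketch below is illustrative only; the helper name and the flag are hypothetical stand-ins, not the SDK's `should_condense()` signature:

```python
def needs_condensation(view_events, max_size=120, pending_request=False):
    """Combine both documented triggers into one per-step check."""
    over_threshold = len(view_events) > max_size  # automatic: history too long
    return over_threshold or pending_request      # manual: CondensationRequest

print(needs_condensation(["event"] * 121))                       # True
print(needs_condensation(["event"] * 120))                       # False
print(needs_condensation(["event"] * 10, pending_request=True))  # True
```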
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Manual["Manual Trigger"] + Error["LLM Context Error"] + Request["CondensationRequest Event"] + NextStep["Next Agent Step"] + Trigger2["condense() detects request"] + end + + Error --> Request --> NextStep --> Trigger2 + + style Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` +**Manual Trigger:** +- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) +- **Who:** Agent (on LLM context window error) or application code +- **Purpose:** Force compression when context limit exceeded -#### Example +### Condensation Workflow -```pycon ->>> workspace = LocalWorkspace(working_dir="/path/to/project") ->>> with workspace: -... result = workspace.execute_command("ls -la") -... content = workspace.read_file("README.md") +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent calls condense(view)"] + + Decision{"should_condense?"} + + ReturnView["Return View
Agent proceeds"] + + Extract["Select Events to Keep/Forget"] + Generate["LLM Generates Summary"] + Create["Create Condensation Event"] + ReturnCond["Return Condensation"] + AddHistory["Agent adds to history"] + NextStep["Next Step: View.from_events()"] + FilterEvents["Filter forgotten events"] + InsertSummary["Insert summary at offset"] + NewView["New condensed view"] + + Start --> Decision + Decision -->|No| ReturnView + Decision -->|Yes| Extract + Extract --> Generate + Generate --> Create + Create --> ReturnCond + ReturnCond --> AddHistory + AddHistory --> NextStep + NextStep --> FilterEvents + FilterEvents --> InsertSummary + InsertSummary --> NewView + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -#### Methods +**Key Steps:** -#### __init__() +1. **Threshold Check:** `should_condense()` determines if condensation needed +2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) +3. **Summary Generation:** LLM creates compressed representation of forgotten events +4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary +5. **Return to Agent:** Condenser returns `Condensation` (not `View`) +6. **History Update:** Agent adds `Condensation` to event log and exits step +7. **Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary -Create a new model by parsing and validating input data from keyword arguments. +## View and Condensation -Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be -validated to form a valid model. +### View Structure -self is explicitly positional-only to allow self as a field name. 
+A `View` represents the conversation history as it will be sent to the LLM: -#### execute_command() +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Full Event List
+ Condensation events"] + FromEvents["View.from_events()"] + Filter["Filter forgotten events"] + Insert["Insert summary"] + View["View
LLMConvertibleEvents"] + Convert["events_to_messages()"] + LLM["LLM Input"] + + Events --> FromEvents + FromEvents --> Filter + Filter --> Insert + Insert --> View + View --> Convert + Convert --> LLM + + style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -Execute a bash command locally. +**View Components:** +- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) +- **`unhandled_condensation_request`:** Flag for pending manual condensation +- **`condensations`:** List of all Condensation events processed +- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics -Uses the shared shell execution utility to run commands with proper -timeout handling, output streaming, and error management. +### Condensation Event -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory (optional) - * `timeout` – Timeout in seconds -* Returns: - Result with stdout, stderr, exit_code, command, and - : timeout_occurred -* Return type: - [CommandResult](#class-commandresult) +When condensation occurs, a `Condensation` event is created: -#### file_download() +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Old["Middle Events
~60 events"] + Summary["Summary Text
LLM-generated"] + Event["Condensation Event
forgotten_event_ids"] + Applied["View.from_events()"] + New["New View
~60 events + summary"] + + Old -.->|Summarized| Summary + Summary --> Event + Event --> Applied + Applied --> New + + style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -Download (copy) a file locally. +**Condensation Fields:** +- **`forgotten_event_ids`:** List of event IDs to filter out +- **`summary`:** Compressed text representation of forgotten events +- **`summary_offset`:** Index where summary event should be inserted +- Inherits from `Event`: `id`, `timestamp`, `source` -For local systems, file download is implemented as a file copy operation -using shutil.copy2 to preserve metadata. +## Rolling Window Pattern -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be copied -* Returns: - Result with success status and file information -* Return type: - [FileOperationResult](#class-fileoperationresult) +`RollingCondenser` implements a common pattern for threshold-based condensation: -#### file_upload() +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + View["Current View
120+ events"] + Check["Count Events"] + + Compare{"Count >
max_size?"} + + Keep["Keep All Events"] + + Split["Split Events"] + Head["Head
First 4 events"] + Middle["Middle
~56 events"] + Tail["Tail
~56 events"] + Summarize["LLM Summarizes Middle"] + Result["Head + Summary + Tail
~60 events total"] + + View --> Check + Check --> Compare + + Compare -->|Under| Keep + Compare -->|Over| Split + + Split --> Head + Split --> Middle + Split --> Tail + + Middle --> Summarize + Head --> Result + Summarize --> Result + Tail --> Result + + style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -Upload (copy) a file locally. +**Rolling Window Strategy:** +1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts +2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context +3. **Summarize Middle:** Compress events between head and tail into summary +4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) -For local systems, file upload is implemented as a file copy operation -using shutil.copy2 to preserve metadata. +## Component Relationships -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be copied -* Returns: - Result with success status and file information -* Return type: - [FileOperationResult](#class-fileoperationresult) +### How Condenser Integrates -#### git_changes() +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Condenser["Condenser"] + State["Conversation State"] + Events["Event Log"] + + Agent -->|"View.from_events()"| State + State -->|View| Agent + Agent -->|"condense(view)"| Condenser + Condenser -->|"View | Condensation"| Agent + Agent -->|Adds Condensation| Events + + style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -Get the git changes for the repository at the path given. 
+**Relationship Characteristics:** +- **Agent → State**: Calls `View.from_events()` to get current view +- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered +- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) +- **Agent → Events**: Adds `Condensation` event to log when returned -* Parameters: - `path` – Path to the git repository -* Returns: - List of changes -* Return type: - list[GitChange] -* Raises: - `Exception` – If path is not a git repository or getting changes failed +## See Also -#### git_diff() +- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning +- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management +- **[Events](/sdk/arch/events)** - Condensation event type and append-only log +- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers -Get the git diff for the file at the path given. +### Conversation +Source: https://docs.openhands.dev/sdk/arch/conversation.md -* Parameters: - `path` – Path to the file -* Returns: - Git diff -* Return type: - GitDiff -* Raises: - `Exception` – If path is not a git repository or getting diff failed +The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. -#### model_config = (configuration object) +**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +## Core Responsibilities -#### pause() +The Conversation system has four primary responsibilities: -Pause the workspace (no-op for local workspaces). +1. 
**Agent Lifecycle Management** - Initialize, run, pause, and terminate agents +2. **State Orchestration** - Maintain conversation history, events, and execution status +3. **Workspace Coordination** - Bridge agent operations with execution environments +4. **Runtime Services** - Provide persistence, monitoring, security, and visualization -Local workspaces have nothing to pause since they operate directly -on the host filesystem. +## Architecture -#### resume() +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart LR + User["User Code"] + + subgraph Factory[" "] + Entry["Conversation()"] + end -Resume the workspace (no-op for local workspaces). + subgraph Implementations[" "] + Local["LocalConversation
Direct execution"] + Remote["RemoteConversation
Via agent-server API"] + end + + subgraph Core[" "] + State["ConversationState
• agent
workspace • stats • ..."] + EventLog["ConversationState.events
Event storage"] + end + + User --> Entry + Entry -.->|LocalWorkspace| Local + Entry -.->|RemoteWorkspace| Remote + + Local --> State + Remote --> State + + State --> EventLog + + classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px + + class Entry factory + class Local,Remote impl + class State,EventLog core + class Persist,Stuck,Viz,Secrets service +``` -Local workspaces have nothing to resume since they operate directly -on the host filesystem. +### Key Components -### class RemoteWorkspace +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | +| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | +| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | +| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | +| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries | -Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) +## Factory Pattern -Remote workspace implementation that connects to an OpenHands agent 
server. +The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: -RemoteWorkspace provides access to a sandboxed environment running on a remote -OpenHands agent server. This is the recommended approach for production deployments -as it provides better isolation and security. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Input["Conversation(agent, workspace)"] + Check{Workspace Type?} + Local["LocalConversation
Agent runs in-process"] + Remote["RemoteConversation
Agent runs via API"] + + Input --> Check + Check -->|str or LocalWorkspace| Local + Check -->|RemoteWorkspace| Remote + + style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -#### Example +**Dispatch Logic:** +- **Local:** String paths or `LocalWorkspace` → in-process execution +- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket -```pycon ->>> workspace = RemoteWorkspace( -... host="https://agent-server.example.com", -... working_dir="/workspace" -... ) ->>> with workspace: -... result = workspace.execute_command("ls -la") -... content = workspace.read_file("README.md") +This abstraction enables switching deployment modes without code changes—just swap the workspace type. + +## State Management + +State updates follow a **two-path pattern** depending on the type of change: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["State Update Request"] + Lock["Acquire FIFO Lock"] + Decision{New Event?} + + StateOnly["Update State Fields
stats, status, metadata"] + EventPath["Append to Event Log
messages, actions, observations"] + + Callback["Trigger Callbacks"] + Release["Release Lock"] + + Start --> Lock + Lock --> Decision + Decision -->|No| StateOnly + Decision -->|Yes| EventPath + StateOnly --> Callback + EventPath --> Callback + Callback --> Release + + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px ``` +**Two Update Patterns:** -#### Properties +1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) +2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur -- `alive`: bool - Check if the remote workspace is alive by querying the health endpoint. - * Returns: - True if the health endpoint returns a successful response, False otherwise. -- `client`: Client +**Thread Safety:** +- FIFO Lock ensures ordered, atomic updates +- Callbacks fire after successful commit +- Read operations never block writes -#### Methods +## Execution Models -#### execute_command() +The conversation system supports two execution models with identical APIs: -Execute a bash command on the remote system. +### Local vs Remote Execution -This method starts a bash command via the remote agent server API, -then polls for the output until the command completes. 
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Local["LocalConversation"] + L1["User sends message"] + L2["Agent executes in-process"] + L3["Direct tool calls"] + L4["Events via callbacks"] + L1 --> L2 --> L3 --> L4 + end + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory (optional) - * `timeout` – Timeout in seconds -* Returns: - Result with stdout, stderr, exit_code, and other metadata -* Return type: - [CommandResult](#class-commandresult) +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Remote["RemoteConversation"] + R1["User sends message"] + R2["HTTP → Agent Server"] + R3["Isolated container execution"] + R4["WebSocket event stream"] + R1 --> R2 --> R3 --> R4 + end + style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -#### file_download() +| Aspect | LocalConversation | RemoteConversation | +|--------|-------------------|-------------------| +| **Execution** | In-process | Remote container/server | +| **Communication** | Direct function calls | HTTP + WebSocket | +| **State Sync** | Immediate | Network serialized | +| **Use Case** | Development, CLI tools | Production, web apps | +| **Isolation** | Process-level | Container-level | -Download a file from the remote system. +**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. -Requests the file from the remote system via HTTP API and saves it locally. 
+## Auxiliary Services -* Parameters: - * `source_path` – Path to the source file on remote system - * `destination_path` – Path where the file should be saved locally -* Returns: - Result with success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) +The conversation system provides pluggable services that operate independently on the event stream: -#### file_upload() +| Service | Purpose | Architecture Pattern | +|---------|---------|---------------------| +| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | +| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | +| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | +| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -Upload a file to the remote system. +**Design Principle:** Services read from the event log but never mutate state directly. This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point -Reads the local file and sends it to the remote system via HTTP API. 
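A stripped-down version of the dispatch rule makes this insight concrete. The classes below are empty stand-ins for the SDK's workspace types, and the returned strings stand in for the two implementations; this illustrates the documented rule, not the factory's source:

```python
class LocalWorkspace:   # stand-in for the SDK's local workspace type
    pass

class RemoteWorkspace:  # stand-in for the SDK's remote workspace type
    pass

def conversation_impl_for(workspace):
    """Mirror the documented dispatch: str/LocalWorkspace run in-process,
    RemoteWorkspace delegates to the agent-server."""
    if isinstance(workspace, (str, LocalWorkspace)):
        return "LocalConversation"
    if isinstance(workspace, RemoteWorkspace):
        return "RemoteConversation"
    raise TypeError(f"unsupported workspace type: {type(workspace).__name__}")

print(conversation_impl_for("/workspace/project"))  # LocalConversation
print(conversation_impl_for(RemoteWorkspace()))     # RemoteConversation
```

Switching deployment modes is then only a matter of which workspace value the caller passes in; the rest of the calling code stays the same.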
## Component Relationships

### How Conversation Interacts

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    Conv["Conversation"]
    Agent["Agent"]
    WS["Workspace"]
    Tools["Tools"]
    LLM["LLM"]

    Conv -->|Delegates to| Agent
    Conv -->|Configures| WS
    Agent -.->|Updates| Conv
    Agent -->|Uses| Tools
    Agent -->|Queries| LLM

    style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Relationship Characteristics:**
- **Conversation → Agent**: One-way orchestration, agent reports back via state updates
- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation
- **Agent → Conversation**: Indirect via state events

## See Also

- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design
- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design
- **[Event System](/sdk/arch/events)** - Event types and flow
- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples

* Parameters:
  * `source_path` – Path to the local source file
  * `destination_path` – Path where the file should be uploaded on remote system
* Returns:
  Result with success status and metadata
* Return type:
  [FileOperationResult](#class-fileoperationresult)

#### git_changes()

Get the git changes for the repository at the path given.

* Parameters:
  `path` – Path to the git repository
* Returns:
  List of changes
* Return type:
  list[GitChange]
* Raises:
  `Exception` – If path is not a git repository or getting changes failed

#### git_diff()

Get the git diff for the file at the path given.
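The Conversation-to-Agent orchestration described under Component Relationships (the conversation owns the event log and delegates single steps to a stateless agent) can be sketched as follows; `Conversation` and `EchoAgent` here are toy stand-ins, not SDK classes:

```python
class Conversation:
    """Toy orchestration: owns the event log, delegates steps to the agent."""

    def __init__(self, agent):
        self.agent = agent
        self.events = []

    def send_message(self, text):
        self.events.append({"kind": "MessageEvent", "source": "user", "text": text})

    def run(self, max_steps=10):
        for _ in range(max_steps):
            # The agent reads history and reports back via new events.
            new_events, done = self.agent.step(self.events)
            self.events.extend(new_events)
            if done:
                return


class EchoAgent:
    """Stateless: reads history each step, holds nothing between steps."""

    def step(self, events):
        last = events[-1]["text"]
        reply = {"kind": "MessageEvent", "source": "agent", "text": f"echo: {last}"}
        return [reply], True


conv = Conversation(EchoAgent())
conv.send_message("hello")
conv.run()
```

Note that the agent never touches the conversation directly; everything it produces flows back through the event list, which is what makes pause/resume and persistence straightforward.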
* Parameters:
  `path` – Path to the file
* Returns:
  Git diff
* Return type:
  GitDiff
* Raises:
  `Exception` – If path is not a git repository or getting diff failed

#### model_config = (configuration object)

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### model_post_init()

Override this method to perform additional initialization after __init__ and model_construct.
This is useful if you want to do some validation that requires the entire model to be initialized.

### Design Principles
Source: https://docs.openhands.dev/sdk/arch/design.md

The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents.

[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1.

## Optional Isolation over Mandatory Sandboxing

**V0 Challenge:**
Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other.
Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible.
**V1 Principle:**
**Sandboxing should be opt-in, not universal.**
V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model.
When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity.

#### reset_client()

Reset the HTTP client to force re-initialization.

This is useful when connection parameters (host, api_key) have changed
and the client needs to be recreated with new values.

### class Workspace

## Stateless by Default, One Source of Truth for State

**V0 Challenge:**
V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful.

**V1 Principle:**
**Keep everything stateless, with exactly one mutable state.**
All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction.
The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems.

## Clear Boundaries between Agent and Applications

**V0 Challenge:**
The same codebase powered the CLI, web interface, and integrations (e.g., GitHub, GitLab). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
Heavy research dependencies and benchmark integrations further bloated production builds.
**V1 Principle:**
**Maintain strict separation of concerns.**
V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server).
Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.

Bases: `object`

Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace.

Usage:

* `Workspace(working_dir=...)` -> `LocalWorkspace`
* `Workspace(working_dir=..., host="http://...")` -> `RemoteWorkspace`

## Composable Components for Extensibility

**V0 Challenge:**
Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for different entrypoints. This rigidity limited experimentation and discouraged contributions.

**V1 Principle:**
**Everything should be composable and safe to extend.**
Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing.
Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation.

### Events
Source: https://docs.openhands.dev/sdk/arch/events.md

# Agent
Source: https://docs.openhands.dev/sdk/arch/agent

The **Agent** component implements the core reasoning-action loop that drives autonomous task execution.
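That reasoning-action loop can be illustrated with a small, self-contained sketch in which a scripted fake LLM and a toy tool table stand in for the real components:

```python
def agent_step(events, llm, tools):
    """One stateless step: read history, query the LLM, act, observe."""
    decision = llm(events)
    if decision["type"] == "message":
        events.append({"kind": "MessageEvent", "text": decision["text"]})
        return events, True  # the agent finished with a plain message
    events.append({"kind": "ActionEvent", "tool": decision["tool"]})
    result = tools[decision["tool"]](**decision["args"])
    events.append({"kind": "ObservationEvent", "result": result})
    return events, False  # the observation feeds the next step


# Scripted stand-in LLM: call a tool once, then answer.
script = iter([
    {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}},
    {"type": "message", "text": "The sum is 5."},
])


def fake_llm(events):
    return next(script)


tools = {"add": lambda a, b: a + b}
events = [{"kind": "MessageEvent", "text": "What is 2 + 3?"}]
events, done = agent_step(events, fake_llm, tools)
events, done = agent_step(events, fake_llm, tools)
```

Each call to `agent_step` is atomic and carries no state of its own, which is what lets the real loop be paused, resumed, or replayed from the event history alone.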
+The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. -**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent) +**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) ## Core Responsibilities -The Agent system has four primary responsibilities: +The Event System has four primary responsibilities: -1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history -2. **Tool Orchestration** - Select and execute tools, handle results and errors -3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser) -4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security) +1. **Type Safety** - Enforce event schemas through Pydantic models +2. **LLM Integration** - Convert events to/from LLM message formats +3. **Append-Only Log** - Maintain immutable event history +4. **Service Integration** - Enable observers to react to event streams ## Architecture ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% flowchart TB - subgraph Input[" "] - Events["Event History"] - Context["Agent Context
Skills + Prompts"] - end + Base["Event
Base class"] + LLMBase["LLMConvertibleEvent
Abstract base"] - subgraph Core["Agent Core"] - Condense["Condenser
History compression"] - Reason["LLM Query
Generate actions"] - Security["Security Analyzer
Risk assessment"] + subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] + Message["MessageEvent
User/assistant text"] + Action["ActionEvent
Tool calls"] + System["SystemPromptEvent
Initial system prompt"] + CondSummary["CondensationSummaryEvent
Condenser summary"] + + ObsBase["ObservationBaseEvent
Base for tool responses"] + Observation["ObservationEvent
Tool results"] + UserReject["UserRejectObservation
User rejected action"] + AgentError["AgentErrorEvent
Agent error"] end - subgraph Execution[" "] - Tools["Tool Executor
Action → Observation"] - Results["Observation Events"] + subgraph Internals["Internal Events
NOT visible to the LLM"] + ConvState["ConversationStateUpdateEvent
State updates"] + CondReq["CondensationRequest
Request compression"] + Cond["Condensation
Compression result"] + Pause["PauseEvent
User pause"] end - Events --> Condense - Context -.->|Skills| Reason - Condense --> Reason - Reason --> Security - Security --> Tools - Tools --> Results - Results -.->|Feedback| Events + Base --> LLMBase + Base --> Internals + LLMBase --> LLMTypes + ObsBase --> Observation + ObsBase --> UserReject + ObsBase --> AgentError classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - class Reason primary - class Condense,Security secondary - class Tools tertiary + class Base,LLMBase,Message,Action,SystemPromptEvent primary + class ObsBase,Observation,UserReject,AgentError secondary + class ConvState,CondReq,Cond,Pause tertiary ``` ### Key Components | Component | Purpose | Design | |-----------|---------|--------| -| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | -| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | -| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | -| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | -| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | - -## Reasoning-Action Loop - -The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: +| 
**[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | +| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | +| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | +| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | +| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses | +| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | +| **[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | +| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | +| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | +| 
**[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | +| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | +| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | +| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% -flowchart TB - Start["step() called"] - Pending{"Pending
actions?"} - ExecutePending["Execute pending actions"] - - HasCondenser{"Has
condenser?"} - Condense["Call condenser.condense()"] - CondenseResult{"Result
type?"} - EmitCondensation["Emit Condensation event"] - UseView["Use View events"] - UseRaw["Use raw events"] - - Query["Query LLM with messages"] - ContextExceeded{"Context
window
exceeded?"} - EmitRequest["Emit CondensationRequest"] - - Parse{"Response
type?"} - CreateActions["Create ActionEvents"] - CreateMessage["Create MessageEvent"] - - Confirmation{"Need
confirmation?"} - SetWaiting["Set WAITING_FOR_CONFIRMATION"] - - Execute["Execute actions"] - Observe["Create ObservationEvents"] - - Return["Return"] - - Start --> Pending - Pending -->|Yes| ExecutePending --> Return - Pending -->|No| HasCondenser - - HasCondenser -->|Yes| Condense - HasCondenser -->|No| UseRaw - Condense --> CondenseResult - CondenseResult -->|Condensation| EmitCondensation --> Return - CondenseResult -->|View| UseView --> Query - UseRaw --> Query - - Query --> ContextExceeded - ContextExceeded -->|Yes| EmitRequest --> Return - ContextExceeded -->|No| Parse - - Parse -->|Tool calls| CreateActions - Parse -->|Message| CreateMessage --> Return - - CreateActions --> Confirmation - Confirmation -->|Yes| SetWaiting --> Return - Confirmation -->|No| Execute - - Execute --> Observe - Observe --> Return - - style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +## Event Types -**Step Execution Flow:** +### LLM-Convertible Events -1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return -2. **Condensation:** If condenser exists: - - Call `condenser.condense()` with current event view - - If returns `View`: use condensed events for LLM query (continue in same step) - - If returns `Condensation`: emit event and return (will be processed next step) -3. **LLM Query:** Query LLM with messages from event history - - If context window exceeded: emit `CondensationRequest` and return -4. **Response Parsing:** Parse LLM response into events - - Tool calls → create `ActionEvent`(s) - - Text message → create `MessageEvent` and return -5. **Confirmation Check:** If actions need user approval: - - Set conversation status to `WAITING_FOR_CONFIRMATION` and return -6. 
**Action Execution:** Execute tools and create `ObservationEvent`(s) +Events that participate in agent reasoning and can be converted to LLM messages: -**Key Characteristics:** -- **Stateless:** Agent holds no mutable state between steps -- **Event-Driven:** Reads from event history, writes new events -- **Interruptible:** Each step is atomic and can be paused/resumed -## Agent Context +| Event Type | Source | Content | LLM Role | +|------------|--------|---------|----------| +| **MessageEvent (user)** | user | Text, images | `user` | +| **MessageEvent (agent)** | agent | Text reasoning, skills | `assistant` | +| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | +| **ObservationEvent** | environment | Tool execution result | `tool` | +| **UserRejectObservation** | environment | Rejection reason | `tool` | +| **AgentErrorEvent** | agent | Error details | `tool` | +| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | +| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | -The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: +The event system bridges agent events to LLM messages: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR - Context["AgentContext"] - - subgraph Skills["Skills"] - Repo["repo
Always active"] - Knowledge["knowledge
Trigger-based"] - end - SystemAug["System prompt prefix/suffix
Per-conversation"] - System["Prompt template
Per-conversation"] - - subgraph Application["Applied to LLM"] - SysPrompt["System Prompt"] - UserMsg["User Messages"] - end + Events["Event List"] + Filter["Filter LLMConvertibleEvent"] + Group["Group ActionEvents
by llm_response_id"] + Convert["Convert to Messages"] + LLM["LLM Input"] - Context --> Skills - Context --> SystemAug - Repo --> SysPrompt - Knowledge -.->|When triggered| UserMsg - System --> SysPrompt - SystemAug --> SysPrompt + Events --> Filter + Filter --> Group + Group --> Convert + Convert --> LLM - style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -| Skill Type | Activation | Use Case | -|------------|------------|----------| -| **repo** | Always included | Project-specific context, conventions | -| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | +**Special Handling - Parallel Function Calling:** -Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. +When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling): +1. Group all ActionEvents by `llm_response_id` +2. Combine into single Message with multiple `tool_calls` +3. Only first event's `thought`, `reasoning_content`, and `thinking_blocks` are included +4. All subsequent events in the batch have empty thought fields +**Example:** +``` +ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1) +ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2) +→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2]) +``` -## Tool Execution -Tools follow a **strict action-observation pattern**: +### Internal Events -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - LLM["LLM generates tool_call"] - Convert["Convert to ActionEvent"] - - Decision{"Confirmation
mode?"} - Defer["Store as pending"] - - Execute["Execute tool"] - Success{"Success?"} - - Obs["ObservationEvent
with result"] - Error["ObservationEvent
with error"] - - LLM --> Convert - Convert --> Decision - - Decision -->|Yes| Defer - Decision -->|No| Execute - - Execute --> Success - Success -->|Yes| Obs - Success -->|No| Error - - style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +Events for metadata, control flow, and user actions (not sent to LLM): -**Execution Modes:** +| Event Type | Source | Purpose | Key Fields | +|------------|--------|---------|------------| +| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | +| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | +| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | +| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | -| Mode | Behavior | Use Case | -|------|----------|----------| -| **Direct** | Execute immediately | Development, trusted environments | -| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools -**Security Integration:** +## Component Relationships -Before execution, the security analyzer evaluates each action: -- **Low Risk:** Execute immediately -- **Medium Risk:** Log warning, execute with monitoring -- **High Risk:** Block execution, request user confirmation +### How Events Integrate -## Component Relationships +## `source` vs LLM `role` + +Events often carry **two different concepts** that are easy to confuse: + +- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution. +- **LLM `role`** (e.g. 
`Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting. + +These fields are **intentionally independent**. + +Common examples include: + +- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`. +- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction. + +**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role. -### How Agent Interacts ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR + Events["Event System"] Agent["Agent"] - Conv["Conversation"] - LLM["LLM"] + Conversation["Conversation"] Tools["Tools"] - Context["AgentContext"] + Services["Auxiliary Services"] - Conv -->|.step calls| Agent - Agent -->|Reads events| Conv - Agent -->|Query| LLM - Agent -->|Execute| Tools - Context -.->|Skills and Context| Agent - Agent -.->|New events| Conv + Agent -->|Reads| Events + Agent -->|Writes| Events + Conversation -->|Manages| Events + Tools -->|Creates| Events + Events -.->|Stream| Services - style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` **Relationship Characteristics:** -- **Conversation → Agent**: Orchestrates step execution, provides event history -- **Agent → LLM**: Queries for next actions, receives tool calls or messages -- **Agent → 
Tools**: Executes actions, receives observations -- **AgentContext → Agent**: Injects skills and prompts into LLM queries +- **Agent → Events**: Reads history for context, writes actions/messages +- **Conversation → Events**: Owns and persists event log +- **Tools → Events**: Create ObservationEvents after execution +- **Services → Events**: Read-only observers for monitoring, visualization + +## Error Events: Agent vs Conversation + +Two distinct error events exist in the SDK, with different purpose and visibility: + +- AgentErrorEvent + - Type: ObservationBaseEvent (LLM-convertible) + - Scope: Error for a specific tool call (has tool_name and tool_call_id) + - Source: "agent" + - LLM visibility: Sent as a tool message so the model can react/recover + - Effect: Conversation continues; not a terminal state + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py +- ConversationErrorEvent + - Type: Event (not LLM-convertible) + - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) + - Source: typically "environment" + - LLM visibility: Not sent to the model + - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py ## See Also -- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle -- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns -- **[Events](/sdk/arch/events)** - Event types and structures -- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns -- **[LLM](/sdk/arch/llm)** - Language model abstraction +- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events +- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management +- **[Tool 
System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation +- **[Condenser](/sdk/arch/condenser)** - Event history compression +### LLM +Source: https://docs.openhands.dev/sdk/arch/llm.md -# Agent Server Package -Source: https://docs.openhands.dev/sdk/arch/agent-server +The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. -The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. +**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) -**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) +## Core Responsibilities -## Purpose +The LLM system has five primary responsibilities: -The Agent Server enables: -- **Remote execution**: Clients interact with agents via HTTP API -- **Multi-user isolation**: Each user gets isolated workspace -- **Container orchestration**: Manages Docker containers for workspaces -- **Centralized management**: Monitor and control all agents -- **Scalability**: Horizontal scaling with multiple servers +1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers +2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) +3. **Configuration Management** - Load from environment, JSON, or programmatic configuration +4. **Telemetry & Cost** - Track usage, latency, and costs across providers +5. 
**Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries -## Architecture Overview +## Architecture ```mermaid -graph TB - Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% +flowchart TB + subgraph Configuration["Configuration Sources"] + Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] + JSON["JSON Files
config/llm.json"] + Code["Programmatic
LLM(...)"] + end - API --> Auth[Authentication] - API --> Router[API Router] + subgraph Core["Core LLM"] + Model["LLM Model
Pydantic configuration"] + Pipeline["Request Pipeline
Retry, timeout, telemetry"] + end - Router --> WS[Workspace Manager] - Router --> Conv[Conversation Handler] + subgraph Backend["LiteLLM Backend"] + Providers["100+ Providers
OpenAI, Anthropic, etc."] + end - WS --> Docker[Docker Manager] - Docker --> C1[Container 1
User A] - Docker --> C2[Container 2
User B] - Docker --> C3[Container 3
User C] + subgraph Output["Telemetry"] + Usage["Token Usage"] + Cost["Cost Tracking"] + Latency["Latency Metrics"] + end - Conv --> Agent[Software Agent SDK] - Agent --> C1 - Agent --> C2 - Agent --> C3 + Env --> Model + JSON --> Model + Code --> Model - style Client fill:#e1f5fe - style API fill:#fff3e0 - style WS fill:#e8f5e8 - style Docker fill:#f3e5f5 - style Agent fill:#fce4ec + Model --> Pipeline + Pipeline --> Providers + + Pipeline --> Usage + Pipeline --> Cost + Pipeline --> Latency + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Model primary + class Pipeline secondary + class LiteLLM tertiary ``` ### Key Components -**1. FastAPI Server** -- HTTP REST API endpoints -- Authentication and authorization -- Request validation -- WebSocket support for streaming - -**2. Workspace Manager** -- Creates and manages Docker containers -- Isolates workspaces per user -- Handles container lifecycle -- Manages resource limits - -**3. Conversation Handler** -- Routes requests to appropriate workspace -- Manages conversation state -- Handles concurrent requests -- Supports streaming responses - -**4. Docker Manager** -- Interfaces with Docker daemon -- Builds and pulls images -- Creates and destroys containers -- Monitors container health - -## Design Decisions - -### Why HTTP API? - -Alternative approaches considered: -- **gRPC**: More efficient but harder for web clients -- **WebSockets only**: Good for streaming but not RESTful -- **HTTP + WebSockets**: Best of both worlds - -**Decision**: HTTP REST for operations, WebSockets for streaming -- ✅ Works from any client (web, mobile, CLI) -- ✅ Easy to debug (curl, Postman) -- ✅ Standard authentication (API keys, OAuth) -- ✅ Streaming where needed - -### Why Container Per User? 
- -Alternative approaches: -- **Shared container**: Multiple users in one container -- **Container per session**: New container each conversation -- **Container per user**: One container per user (chosen) +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings | +| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming | +| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking | +| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers | +| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` | +| **Telemetry** | Usage tracking | Token counts, costs, latency | -**Decision**: Container per user -- ✅ Strong isolation between users -- ✅ Persistent workspace across sessions -- ✅ Better resource management -- ⚠️ More containers, but worth it for isolation +## Configuration -### Why FastAPI? +See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for complete list of supported fields. 
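The `load_from_env()` loader listed in the table above follows the `LLM_*` naming convention described under Environment Variable Configuration. As a hedged, stdlib-only sketch of that mapping (the helper name and the casting rules shown are illustrative, not the SDK's actual implementation, which also handles bool, JSON, and SecretStr values):

```python
# Illustrative sketch only: how an LLM_* environment convention can map
# onto lowercased configuration field names with best-effort numeric casting.
def fields_from_env(environ: dict[str, str], prefix: str = "LLM_") -> dict:
    fields: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated environment variables
        name = key[len(prefix):].lower()  # LLM_NUM_RETRIES -> num_retries
        for cast in (int, float):  # try int first, then float
            try:
                fields[name] = cast(value)
                break
            except ValueError:
                pass
        else:
            fields[name] = value  # fall back to the raw string
    return fields

env = {"LLM_MODEL": "anthropic/claude-sonnet-4.1", "LLM_NUM_RETRIES": "5"}
print(fields_from_env(env))  # {'model': 'anthropic/claude-sonnet-4.1', 'num_retries': 5}
```

The real loader returns a fully validated `LLM` Pydantic model rather than a plain dict, but the prefix-stripping and type-casting idea is the same.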
-Alternative frameworks: -- **Flask**: Simpler but less type-safe -- **Django**: Too heavyweight -- **FastAPI**: Modern, fast, type-safe (chosen) +### Programmatic Configuration -**Decision**: FastAPI -- ✅ Automatic API documentation (OpenAPI) -- ✅ Type validation with Pydantic -- ✅ Async support for performance -- ✅ WebSocket support built-in +Create LLM instances directly in code: -## API Design +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Code["Python Code"] + LLM["LLM(model=...)"] + Agent["Agent"] + + Code --> LLM + LLM --> Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -### Key Endpoints +**Example:** +```python +from pydantic import SecretStr +from openhands.sdk import LLM -**Workspace Management** -``` -POST /workspaces Create new workspace -GET /workspaces/{id} Get workspace info -DELETE /workspaces/{id} Delete workspace -POST /workspaces/{id}/execute Execute command +llm = LLM( + model="anthropic/claude-sonnet-4.1", + api_key=SecretStr("sk-ant-123"), + temperature=0.1, + timeout=120, +) ``` -**Conversation Management** -``` -POST /conversations Create conversation -GET /conversations/{id} Get conversation -POST /conversations/{id}/messages Send message -GET /conversations/{id}/stream Stream responses (WebSocket) -``` +### Environment Variable Configuration -**Health & Monitoring** -``` -GET /health Server health check -GET /metrics Prometheus metrics -``` +Load from environment using naming convention: -### Authentication +**Environment Variable Pattern:** +- **Prefix:** All variables start with `LLM_` +- **Mapping:** `LLM_FIELD` → `field` (lowercased) +- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr -**API Key Authentication** +**Common Variables:** ```bash -curl -H "Authorization: Bearer YOUR_API_KEY" \ - https://agent-server.example.com/conversations +export LLM_MODEL="anthropic/claude-sonnet-4.1" +export LLM_API_KEY="sk-ant-123" +export LLM_USAGE_ID="primary" 
+export LLM_TIMEOUT="120" +export LLM_NUM_RETRIES="5" ``` -**Per-user workspace isolation** -- API key → user ID mapping -- Each user gets separate workspace -- Users can't access each other's workspaces +### JSON Configuration -### Streaming Responses +Serialize and load from JSON files: -**WebSocket for real-time updates** +**Example:** ```python -async with websocket_connect(url) as ws: - # Send message - await ws.send_json({"message": "Hello"}) - - # Receive events - async for event in ws: - if event["type"] == "message": - print(event["content"]) +# Save +llm.model_dump_json(exclude_none=True, indent=2) + +# Load +llm = LLM.load_from_json("config/llm.json") ``` -**Why streaming?** -- Real-time feedback to users -- Show agent thinking process -- Better UX for long-running tasks +**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). +If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. -## Deployment Models -### 1. Local Development +## Request Pipeline -Run server locally for testing: -```bash -# Start server -openhands-agent-server --port 8000 +### Completion Flow -# Or with Docker -docker run -p 8000:8000 \ - -v /var/run/docker.sock:/var/run/docker.sock \ - ghcr.io/all-hands-ai/agent-server:latest +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% +flowchart TB + Request["completion() or responses() call"] + Validate["Validate Config"] + + Attempt["LiteLLM Request"] + Success{"Success?"} + + Retry{"Retries
remaining?"} + Wait["Exponential Backoff"] + + Telemetry["Record Telemetry"] + Response["Return Response"] + Error["Raise Error"] + + Request --> Validate + Validate --> Attempt + Attempt --> Success + + Success -->|Yes| Telemetry + Success -->|No| Retry + + Retry -->|Yes| Wait + Retry -->|No| Error + + Wait --> Attempt + Telemetry --> Response + + style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Use case**: Development and testing - -### 2. Single-Server Deployment +**Pipeline Stages:** -Deploy on one server (VPS, EC2, etc.): -```bash -# Install -pip install openhands-agent-server +1. **Validation:** Check required fields (model, messages) +2. **Request:** Call LiteLLM with provider-specific formatting +3. **Retry Logic:** Exponential backoff on failures (configurable) +4. **Telemetry:** Record tokens, cost, latency +5. **Response:** Return completion or raise error -# Run with systemd/supervisor -openhands-agent-server \ - --host 0.0.0.0 \ - --port 8000 \ - --workers 4 -``` +### Responses API Support -**Use case**: Small deployments, prototypes, MVPs +In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. -### 3. Multi-Server Deployment +#### Architecture -Scale horizontally with load balancer: -``` - Load Balancer - | - +-------------+-------------+ - | | | - Server 1 Server 2 Server 3 - (Agents) (Agents) (Agents) - | | | - +-------------+-------------+ - | - Shared State Store - (Database, Redis, etc.) 
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Check{"Model supports
Responses API?"} + + subgraph Standard["Standard Path"] + ChatFormat["Format as
Chat Messages"] + ChatCall["litellm.completion()"] + end + + subgraph ResponsesPath["Responses Path"] + RespFormat["Format as
instructions + input[]"] + RespCall["litellm.responses()"] + end + + ChatResponse["ModelResponse"] + RespResponse["ResponsesAPIResponse"] + + Parse["Parse to Message"] + Return["LLMResponse"] + + Check -->|No| ChatFormat + Check -->|Yes| RespFormat + + ChatFormat --> ChatCall + RespFormat --> RespCall + + ChatCall --> ChatResponse + RespCall --> RespResponse + + ChatResponse --> Parse + RespResponse --> Parse + + Parse --> Return + + style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -**Use case**: Production SaaS, high traffic, need redundancy - -### 4. Kubernetes Deployment +#### Supported Models -Container orchestration with Kubernetes: -```yaml -apiVersion: apps/v1 -kind: Deployment -metadata: - name: agent-server -spec: - replicas: 3 - template: - spec: - containers: - - name: agent-server - image: ghcr.io/all-hands-ai/agent-server:latest - ports: - - containerPort: 8000 -``` +Models that automatically use the Responses API path: -**Use case**: Enterprise deployments, auto-scaling, high availability +| Pattern | Examples | Documentation | +|---------|----------|---------------| +| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | -## Resource Management +**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). 
-### Container Limits -Set per-workspace resource limits: -```python -# In server configuration -WORKSPACE_CONFIG = { - "resource_limits": { - "memory": "2g", # 2GB RAM - "cpus": "2", # 2 CPU cores - "disk": "10g" # 10GB disk - }, - "timeout": 300, # 5 min timeout -} -``` +## Provider Integration -**Why limit resources?** -- Prevent one user from consuming all resources -- Fair usage across users -- Protect server from runaway processes -- Cost control +### LiteLLM Abstraction -### Cleanup & Garbage Collection +Software Agent SDK uses LiteLLM for provider abstraction: -**Container lifecycle**: -- Containers created on first use -- Kept alive between requests (warm) -- Cleaned up after inactivity timeout -- Force cleanup on server shutdown - -**Storage management**: -- Old workspaces deleted automatically -- Disk usage monitored -- Alerts when approaching limits - -## Security Considerations - -### Multi-Tenant Isolation - -**Container isolation**: -- Each user gets separate container -- Containers can't communicate -- Network isolation (optional) -- File system isolation - -**API isolation**: -- API keys mapped to users -- Users can only access their workspaces -- Server validates all permissions - -### Input Validation - -**Server validates**: -- API request schemas -- Command injection attempts -- Path traversal attempts -- File size limits - -**Defense in depth**: -- API validation -- Container validation -- Docker security features -- OS-level security - -### Network Security - -**Best practices**: -- HTTPS only (TLS certificates) -- Firewall rules (only port 443/8000) -- Rate limiting -- DDoS protection - -**Container networking**: -```python -# Disable network for workspace -WORKSPACE_CONFIG = { - "network_mode": "none" # No network access -} - -# Or allow specific hosts -WORKSPACE_CONFIG = { - "allowed_hosts": ["api.example.com"] -} -``` - -## Monitoring & Observability - -### Health Checks - -```bash -# Simple health check -curl 
https://agent-server.example.com/health - -# Response -{ - "status": "healthy", - "docker": "connected", - "workspaces": 15, - "uptime": 86400 -} +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + SDK["Software Agent SDK"] + LiteLLM["LiteLLM"] + + subgraph Providers["100+ Providers"] + OpenAI["OpenAI"] + Anthropic["Anthropic"] + Google["Google"] + Azure["Azure"] + Others["..."] + end + + SDK --> LiteLLM + LiteLLM --> OpenAI + LiteLLM --> Anthropic + LiteLLM --> Google + LiteLLM --> Azure + LiteLLM --> Others + + style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -### Metrics - -**Prometheus metrics**: -- Request count and latency -- Active workspaces -- Container resource usage -- Error rates - -**Logging**: -- Structured JSON logs -- Per-request tracing -- Workspace events -- Error tracking - -### Alerting - -**Alert on**: -- Server down -- High error rate -- Resource exhaustion -- Container failures - -## Client SDK - -Python SDK for interacting with Agent Server: - -```python -from openhands.client import AgentServerClient - -client = AgentServerClient( - url="https://agent-server.example.com", - api_key="your-api-key" -) - -# Create conversation -conversation = client.create_conversation() - -# Send message -response = client.send_message( - conversation_id=conversation.id, - message="Hello, agent!" -) - -# Stream responses -for event in client.stream_conversation(conversation.id): - if event.type == "message": - print(event.content) -``` +**Benefits:** +- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. 
+- **Unified API:** Same interface regardless of provider +- **Format Translation:** Provider-specific request/response formatting +- **Error Handling:** Normalized error codes and messages -**Client handles**: -- Authentication -- Request/response serialization -- Error handling -- Streaming -- Retries +### LLM Providers -## Cost Considerations +Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. +The pages linked below live under the OpenHands app section but apply +verbatim to SDK applications because both layers wrap the same +`openhands.sdk.llm.LLM` interface. -### Server Costs +| Provider / scenario | Documentation | +| --- | --- | +| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | +| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | +| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | +| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | +| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | +| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | +| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | +| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | +| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | +| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | -**Compute**: CPU and memory for containers -- Each active workspace = 1 container -- Typically 1-2 GB RAM per workspace -- 0.5-1 CPU core per workspace +When you follow any of those guides while building with the SDK, create an +`LLM` object using the documented parameters (for example, API keys, base URLs, +or custom headers) and pass it into your 
agent or registry. The OpenHands UI +surfacing is simply a convenience layer on top of the same configuration model. -**Storage**: Workspace files and conversation state -- ~1-10 GB per workspace (depends on usage) -- Conversation history in database -**Network**: API requests and responses -- Minimal (mostly text) -- Streaming adds bandwidth +## Telemetry and Cost Tracking -### Cost Optimization +### Telemetry Collection -**1. Idle timeout**: Shutdown containers after inactivity -```python -WORKSPACE_CONFIG = { - "idle_timeout": 3600 # 1 hour -} -``` +LLM requests automatically collect metrics: -**2. Resource limits**: Don't over-provision -```python -WORKSPACE_CONFIG = { - "resource_limits": { - "memory": "1g", # Smaller limit - "cpus": "0.5" # Fractional CPU - } -} +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Request["LLM Request"] + + subgraph Metrics + Tokens["Token Counts
Input/Output"] + Cost["Cost
USD"] + Latency["Latency
ms"] + end + + Events["Event Log"] + + Request --> Tokens + Request --> Cost + Request --> Latency + + Tokens --> Events + Cost --> Events + Latency --> Events + + style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**3. Shared resources**: Use single server for multiple low-traffic apps - -**4. Auto-scaling**: Scale servers based on demand - -## When to Use Agent Server - -### Use Agent Server When: - -✅ **Multi-user system**: Web app with many users -✅ **Remote clients**: Mobile app, web frontend -✅ **Centralized management**: Need to monitor all agents -✅ **Workspace isolation**: Users shouldn't interfere -✅ **SaaS product**: Building agent-as-a-service -✅ **Scaling**: Need to handle concurrent users - -**Examples**: -- Chatbot platforms -- Code assistant web apps -- Agent marketplaces -- Enterprise agent deployments - -### Use Standalone SDK When: - -✅ **Single-user**: Personal tool or script -✅ **Local execution**: Running on your machine -✅ **Full control**: Need programmatic access -✅ **Simpler deployment**: No server management -✅ **Lower latency**: No network overhead - -**Examples**: -- CLI tools -- Automation scripts -- Local development -- Desktop applications - -### Hybrid Approach - -Use SDK locally but RemoteAPIWorkspace for execution: -- Agent logic in your Python code -- Execution happens on remote server -- Best of both worlds - -## Building Custom Agent Server - -The server is extensible for custom needs: +**Tracked Metrics:** +- **Token Usage:** Input tokens, output tokens, total +- **Cost:** Per-request cost using configured rates +- **Latency:** Request duration in milliseconds +- **Errors:** Failure types and retry counts -**Custom authentication**: -```python -from openhands.agent_server import AgentServer +### Cost Configuration -class CustomAgentServer(AgentServer): - async def authenticate(self, request): - # Custom auth logic - return await oauth_verify(request) -``` 
+Configure per-token costs for custom models: -**Custom workspace configuration**: ```python -server = AgentServer( - workspace_factory=lambda user: DockerWorkspace( - image=f"custom-image-{user.tier}", - resource_limits=user.resource_limits - ) +llm = LLM( + model="custom/my-model", + input_cost_per_token=0.00001, # $0.01 per 1K tokens + output_cost_per_token=0.00003, # $0.03 per 1K tokens ) ``` -**Custom middleware**: -```python -@server.middleware -async def logging_middleware(request, call_next): - # Custom logging - response = await call_next(request) - return response -``` - -## Next Steps +**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) -### For Usage Examples +**Custom Costs:** Override for: +- Internal models +- Custom pricing agreements +- Cost estimation for budgeting -- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally -- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup -- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API -- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options +## Component Relationships -### For Related Architecture +### How LLM Integrates -- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details -- [SDK Architecture](/sdk/arch/sdk) - Core framework -- [Architecture Overview](/sdk/arch/overview) - System design +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + LLM["LLM"] + Agent["Agent"] + Conversation["Conversation"] + Events["Events"] + Security["Security Analyzer"] + Condenser["Context Condenser"] + + Agent -->|Uses| LLM + LLM -->|Records| Events + Security -.->|Optional| LLM + Condenser -.->|Optional| LLM + Conversation -->|Provides context| Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent 
fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px
+```

### For Implementation Details

**Relationship Characteristics:**
+- **Agent → LLM**: Agent uses LLM for reasoning and tool calls
+- **LLM → Events**: LLM requests/responses recorded as events
+- **Security → LLM**: Optional security analyzer can use separate LLM
+- **Condenser → LLM**: Optional context condenser can use separate LLM
+- **Configuration**: LLM configured independently, passed to agent
+- **Telemetry**: LLM metrics flow through event system to UI/logging

-- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source
-- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples

+## See Also
+- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions
+- **[Events](/sdk/arch/events)** - LLM request/response event types
+- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis
+- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration


# Condenser
Source: https://docs.openhands.dev/sdk/arch/condenser

# MCP Integration
Source: https://docs.openhands.dev/sdk/arch/mcp

The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents).

The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. 
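A central design problem for this bridge is invoking the async MCP protocol from synchronous tool code. A minimal sketch of the sync-over-async pattern, using a background event loop thread (class and function names here are illustrative, not the SDK's actual API):

```python
import asyncio
import threading

# Minimal sketch of the sync-over-async bridge pattern: a background
# event loop thread lets synchronous agent code block on async calls.
class AsyncExecutor:
    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        thread.start()

    def run(self, coro):
        # Submit the coroutine to the background loop and wait for its result.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout=10)

async def fake_mcp_call(tool_name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an MCP round trip
    return f"{tool_name}: ok"

executor = AsyncExecutor()
print(executor.run(fake_mcp_call("fetch_url")))  # fetch_url: ok
```

This is why tool executors can stay synchronous even though every underlying MCP round trip is asynchronous: the blocking happens in `run()`, while the protocol work proceeds on the background loop.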
-**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) +**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) ## Core Responsibilities -The Condenser system has four primary responsibilities: +The MCP Integration system has four primary responsibilities: -1. **History Compression** - Reduce event lists to fit within context windows -2. **Threshold Detection** - Determine when condensation should trigger -3. **Summary Generation** - Create meaningful summaries via LLM or heuristics -4. **View Management** - Transform event history into LLM-ready views +1. **MCP Client Management** - Connect to and communicate with MCP servers +2. **Tool Discovery** - Enumerate available tools from MCP servers +3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions +4. **Execution Bridge** - Execute MCP tool calls from agent actions ## Architecture ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% flowchart TB - subgraph Interface["Abstract Interface"] - Base["CondenserBase
Abstract base"] + subgraph Client["MCP Client"] + Sync["MCPClient
Sync/Async bridge"] + Async["AsyncMCPClient
FastMCP base"] end - subgraph Implementations["Concrete Implementations"] - NoOp["NoOpCondenser
No compression"] - LLM["LLMSummarizingCondenser
LLM-based"] - Pipeline["PipelineCondenser
Multi-stage"] + subgraph Bridge["Tool Bridge"] + Def["MCPToolDefinition
Schema conversion"] + Exec["MCPToolExecutor
Execution handler"] end - subgraph Process["Condensation Process"] - View["View
Event history"] - Check["should_condense()?"] - Condense["get_condensation()"] - Result["View | Condensation"] + subgraph Integration["Agent Integration"] + Action["MCPToolAction
Dynamic model"] + Obs["MCPToolObservation
Result wrapper"] end - subgraph Output["Condensation Output"] - CondEvent["Condensation Event
Summary metadata"] - NewView["Condensed View
Reduced tokens"] + subgraph External["External"] + Server["MCP Server
stdio/HTTP"] + Tools["External Tools"] end - Base --> NoOp - Base --> LLM - Base --> Pipeline + Sync --> Async + Async --> Server - View --> Check - Check -->|Yes| Condense - Check -->|No| Result - Condense --> CondEvent - CondEvent --> NewView - NewView --> Result + Server --> Def + Def --> Exec + + Exec --> Action + Action --> Server + Server --> Obs + + Server -.->|Spawns| Tools classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - class Base primary - class LLM,Pipeline secondary - class Check,Condense tertiary + class Sync,Async primary + class Def,Exec secondary + class Action,Obs tertiary ``` ### Key Components | Component | Purpose | Design | |-----------|---------|--------| -| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | -| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | -| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | -| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | -| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | -| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | 
Represents history for LLM | -| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | -## Condenser Types +## MCP Client -### NoOpCondenser +### Sync/Async Bridge -Pass-through condenser that performs no compression: +The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - View["View"] - NoOp["NoOpCondenser"] - Same["Same View"] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Sync["Sync Code
Agent execution"] + Bridge["call_async_from_sync()"] + Executor["AsyncExecutor
Background loop"] + Async["Async MCP Call"] + Server["MCP Server"] + Result["Result"] - View --> NoOp --> Same + Sync --> Bridge + Bridge --> Executor + Executor --> Async + Async --> Server + Server --> Result + Result --> Sync - style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### LLMSummarizingCondenser +**Bridge Pattern:** +- **Problem:** MCP protocol is async, but agent tools run synchronously +- **Solution:** Background event loop that executes async code from sync contexts +- **Benefit:** Agents use MCP tools without async/await in tool definitions -Uses an LLM to generate summaries of conversation history: +**Client Features:** +- **Lifecycle Management:** `__enter__`/`__exit__` for context manager +- **Timeout Support:** Configurable timeouts for MCP operations +- **Error Handling:** Wraps MCP errors in observations +- **Connection Pooling:** Reuses connections across tool calls + +### MCP Server Configuration + +MCP servers are configured using the FastMCP format: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } +} +``` + +**Configuration Fields:** +- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) +- **args:** Arguments to pass to command +- **env:** Environment variables (optional) + +## Tool Discovery and Conversion + +### Discovery Flow ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - View["Long View
120+ events"] - Check["Threshold
exceeded?"] - Summarize["LLM Summarization"] - Summary["Summary Text"] - Metadata["Condensation Event"] - AddToHistory["Add to History"] - NextStep["Next Step: View.from_events()"] - NewView["Condensed View"] +flowchart TB + Config["MCP Config"] + Spawn["Spawn Server"] + List["List Tools"] - View --> Check - Check -->|Yes| Summarize - Summarize --> Summary - Summary --> Metadata - Metadata --> AddToHistory - AddToHistory --> NextStep - NextStep --> NewView + subgraph Convert["Convert Each Tool"] + Schema["MCP Schema"] + Action["Generate Action Model"] + Def["Create ToolDefinition"] + end - style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px + Register["Register in ToolRegistry"] + + Config --> Spawn + Spawn --> List + List --> Schema + + Schema --> Action + Action --> Def + Def --> Register + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Process:** -1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) -2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) -3. **LLM Call:** Generate summary of middle events using dedicated LLM -4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` -5. **Add to History:** Agent adds `Condensation` to event log and returns early -6. **Next Step:** `View.from_events()` filters forgotten events and inserts summary +**Discovery Steps:** -**Configuration:** -- **`max_size`:** Event count threshold before condensation triggers (default: 120) -- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) -- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) +1. **Spawn Server:** Launch MCP server via stdio +2. 
**List Tools:** Call `tools/list` MCP endpoint +3. **Parse Schemas:** Extract tool names, descriptions, parameters +4. **Generate Models:** Dynamically create Pydantic models for actions +5. **Create Definitions:** Wrap in `ToolDefinition` objects +6. **Register:** Add to agent's tool registry -### PipelineCondenser +### Schema Conversion -Chains multiple condensers in sequence: +MCP tool schemas are converted to SDK tool definitions: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR - View["Original View"] - C1["Condenser 1"] - C2["Condenser 2"] - C3["Condenser 3"] - Final["Final View"] + MCP["MCP Tool Schema
JSON Schema"] + Parse["Parse Parameters"] + Model["Dynamic Pydantic Model
MCPToolAction"] + Def["ToolDefinition
SDK format"] - View --> C1 --> C2 --> C3 --> Final + MCP --> Parse + Parse --> Model + Model --> Def - style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) +**Conversion Rules:** -## Condensation Flow +| MCP Schema | SDK Action Model | +|------------|------------------| +| **name** | Class name (camelCase) | +| **description** | Docstring | +| **inputSchema** | Pydantic fields | +| **required** | Field(required=True) | +| **type** | Python type hints | -### Trigger Mechanisms +**Example:** -Condensers can be triggered in two ways: +```python +# MCP Schema +{ + "name": "fetch_url", + "description": "Fetch content from URL", + "inputSchema": { + "type": "object", + "properties": { + "url": {"type": "string"}, + "timeout": {"type": "number"} + }, + "required": ["url"] + } +} -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Automatic["Automatic Trigger"] - Agent1["Agent Step"] - Build1["View.from_events()"] - Check1["condenser.condense(view)"] - Trigger1["should_condense()?"] - end - - Agent1 --> Build1 --> Check1 --> Trigger1 - - style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +# Generated Action Model +class FetchUrl(MCPToolAction): + """Fetch content from URL""" + url: str + timeout: float | None = None ``` -**Automatic Trigger:** -- **When:** Threshold exceeded (e.g., event count > `max_size`) -- **Who:** Agent calls `condenser.condense()` each step -- **Purpose:** Proactively keep context within limits - - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Manual["Manual Trigger"] - 
Error["LLM Context Error"] - Request["CondensationRequest Event"] - NextStep["Next Agent Step"] - Trigger2["condense() detects request"] - end - - Error --> Request --> NextStep --> Trigger2 - - style Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` -**Manual Trigger:** -- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) -- **Who:** Agent (on LLM context window error) or application code -- **Purpose:** Force compression when context limit exceeded +## Tool Execution -### Condensation Workflow +### Execution Flow ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% flowchart TB - Start["Agent calls condense(view)"] - - Decision{"should_condense?"} + Agent["Agent generates action"] + Action["MCPToolAction"] + Executor["MCPToolExecutor"] - ReturnView["Return View
Agent proceeds"] + Convert["Convert to MCP format"] + Call["MCP call_tool"] + Server["MCP Server"] - Extract["Select Events to Keep/Forget"] - Generate["LLM Generates Summary"] - Create["Create Condensation Event"] - ReturnCond["Return Condensation"] - AddHistory["Agent adds to history"] - NextStep["Next Step: View.from_events()"] - FilterEvents["Filter forgotten events"] - InsertSummary["Insert summary at offset"] - NewView["New condensed view"] + Result["MCP Result"] + Obs["MCPToolObservation"] + Return["Return to Agent"] - Start --> Decision - Decision -->|No| ReturnView - Decision -->|Yes| Extract - Extract --> Generate - Generate --> Create - Create --> ReturnCond - ReturnCond --> AddHistory - AddHistory --> NextStep - NextStep --> FilterEvents - FilterEvents --> InsertSummary - InsertSummary --> NewView + Agent --> Action + Action --> Executor + Executor --> Convert + Convert --> Call + Call --> Server + Server --> Result + Result --> Obs + Obs --> Return - style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Key Steps:** - -1. **Threshold Check:** `should_condense()` determines if condensation needed -2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) -3. **Summary Generation:** LLM creates compressed representation of forgotten events -4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary -5. **Return to Agent:** Condenser returns `Condensation` (not `View`) -6. **History Update:** Agent adds `Condensation` to event log and exits step -7. 
**Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary +**Execution Steps:** -## View and Condensation +1. **Action Creation:** LLM generates tool call, parsed into `MCPToolAction` +2. **Executor Lookup:** Find `MCPToolExecutor` for tool name +3. **Format Conversion:** Convert action fields to MCP arguments +4. **MCP Call:** Execute `call_tool` via MCP client +5. **Result Parsing:** Parse MCP result (text, images, resources) +6. **Observation Creation:** Wrap in `MCPToolObservation` +7. **Error Handling:** Catch exceptions, return error observations -### View Structure +### MCPToolExecutor -A `View` represents the conversation history as it will be sent to the LLM: +Executors bridge SDK actions to MCP calls: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR - Events["Full Event List
+ Condensation events"] - FromEvents["View.from_events()"] - Filter["Filter forgotten events"] - Insert["Insert summary"] - View["View
LLMConvertibleEvents"] - Convert["events_to_messages()"] - LLM["LLM Input"] + Executor["MCPToolExecutor"] + Client["MCP Client"] + Name["tool_name"] - Events --> FromEvents - FromEvents --> Filter - Filter --> Insert - Insert --> View - View --> Convert - Convert --> LLM + Executor -->|Uses| Client + Executor -->|Knows| Name - style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -**View Components:** -- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) -- **`unhandled_condensation_request`:** Flag for pending manual condensation -- **`condensations`:** List of all Condensation events processed -- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics +**Executor Responsibilities:** +- **Client Management:** Hold reference to MCP client +- **Tool Identification:** Know which MCP tool to call +- **Argument Conversion:** Transform action fields to MCP format +- **Result Handling:** Parse MCP responses +- **Error Recovery:** Handle connection errors, timeouts, server failures -### Condensation Event +## MCP Tool Lifecycle -When condensation occurs, a `Condensation` event is created: +### From Configuration to Execution ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Old["Middle Events
~60 events"] - Summary["Summary Text<br/>LLM-generated"] - Event["Condensation Event<br/>forgotten_event_ids"] - Applied["View.from_events()"] - New["New View
~60 events + summary"] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Load["Load MCP Config"] + Start["Start Conversation"] + Spawn["Spawn MCP Servers"] + Discover["Discover Tools"] + Register["Register Tools"] - Old -.->|Summarized| Summary - Summary --> Event - Event --> Applied - Applied --> New + Ready["Agent Ready"] - style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + Step["Agent Step"] + LLM["LLM Tool Call"] + Execute["Execute MCP Tool"] + Result["Return Observation"] + + End["End Conversation"] + Cleanup["Close MCP Clients"] + + Load --> Start + Start --> Spawn + Spawn --> Discover + Discover --> Register + Register --> Ready + + Ready --> Step + Step --> LLM + LLM --> Execute + Execute --> Result + Result --> Step + + Step --> End + End --> Cleanup + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Condensation Fields:** -- **`forgotten_event_ids`:** List of event IDs to filter out -- **`summary`:** Compressed text representation of forgotten events -- **`summary_offset`:** Index where summary event should be inserted -- Inherits from `Event`: `id`, `timestamp`, `source` +**Lifecycle Phases:** -## Rolling Window Pattern +| Phase | Operations | Components | +|-------|-----------|------------| +| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | +| **Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | +| **Execution** | Handle tool calls | Agent, MCPToolAction | +| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | -`RollingCondenser` implements a common pattern for threshold-based condensation: +## MCP Annotations + +MCP tools can include metadata hints for agents: ```mermaid -%%{init: {"theme": "default", 
"flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - View["Current View
120+ events"] - Check["Count Events"] - - Compare{"Count ><br/>max_size?"} - - Keep["Keep All Events"] - - Split["Split Events"] - Head["Head<br/>First 4 events"] - Middle["Middle<br/>~56 events"] - Tail["Tail<br/>~56 events"] - Summarize["LLM Summarizes Middle"] - Result["Head + Summary + Tail
~60 events total"] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Tool["MCP Tool"] - View --> Check - Check --> Compare + subgraph Annotations + ReadOnly["readOnlyHint"] + Destructive["destructiveHint"] + Progress["progressEnabled"] + end - Compare -->|Under| Keep - Compare -->|Over| Split + Security["Security Analysis"] - Split --> Head - Split --> Middle - Split --> Tail + Tool --> ReadOnly + Tool --> Destructive + Tool --> Progress - Middle --> Summarize - Head --> Result - Summarize --> Result - Tail --> Result + ReadOnly --> Security + Destructive --> Security - style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Rolling Window Strategy:** -1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts -2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context -3. **Summarize Middle:** Compress events between head and tail into summary -4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) +**Annotation Types:** + +| Annotation | Meaning | Use Case | +|------------|---------|----------| +| **readOnlyHint** | Tool doesn't modify state | Lower security risk | +| **destructiveHint** | Tool modifies/deletes data | Require confirmation | +| **progressEnabled** | Tool reports progress | Show progress UI | + +These annotations feed into the security analyzer for risk assessment. 
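As one illustration of how such hints might feed a confirmation policy, here is a minimal pure-Python sketch. The class, function, and level names below are hypothetical, not the SDK's actual security-analyzer API:

```python
from dataclasses import dataclass


# Hypothetical names for illustration only -- not the SDK's real API.
@dataclass
class ToolAnnotations:
    read_only_hint: bool = False
    destructive_hint: bool = False


def assess_risk(ann: ToolAnnotations) -> str:
    """Map MCP annotation hints to a coarse risk level."""
    if ann.destructive_hint:
        return "HIGH"    # modifies or deletes data: ask for confirmation
    if ann.read_only_hint:
        return "LOW"     # no state changes: safe to auto-approve
    return "MEDIUM"      # unknown side effects: default caution


print(assess_risk(ToolAnnotations(read_only_hint=True)))  # LOW
```

A real analyzer would combine these hints with other signals (tool source, arguments, user policy) rather than relying on annotations alone.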
## Component Relationships -### How Condenser Integrates +### How MCP Integrates ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR + MCP["MCP System"] + Skills["Skills"] + Tools["Tool Registry"] Agent["Agent"] - Condenser["Condenser"] - State["Conversation State"] - Events["Event Log"] + Security["Security"] - Agent -->|"View.from_events()"| State - State -->|View| Agent - Agent -->|"condense(view)"| Condenser - Condenser -->|"View | Condensation"| Agent - Agent -->|Adds Condensation| Events + Skills -->|Configures| MCP + MCP -->|Registers| Tools + Agent -->|Uses| Tools + MCP -->|Provides hints| Security - style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px + style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` **Relationship Characteristics:** -- **Agent → State**: Calls `View.from_events()` to get current view -- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered -- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) -- **Agent → Events**: Adds `Condensation` event to log when returned +- **Skills → MCP**: Repository skills can embed MCP configurations +- **MCP → Tools**: MCP tools registered alongside native tools +- **Agent → Tools**: Agents use MCP tools like any other tool +- **MCP → Security**: Annotations inform security risk assessment +- **Transparent Integration**: Agent doesn't distinguish MCP from native tools -## See Also +## Design Rationale -- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning -- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management -- **[Events](/sdk/arch/events)** - Condensation event type and append-only log -- 
**[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers +**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. +**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. -# Conversation -Source: https://docs.openhands.dev/sdk/arch/conversation +**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. -The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. +**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. -**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) +**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. -## Core Responsibilities +**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping. -The Conversation system has four primary responsibilities: +## See Also -1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents -2. **State Orchestration** - Maintain conversation history, events, and execution status -3. 
**Workspace Coordination** - Bridge agent operations with execution environments -4. **Runtime Services** - Provide persistence, monitoring, security, and visualization +- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment +- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library -## Architecture +### Overview +Source: https://docs.openhands.dev/sdk/arch/overview.md + +The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment. + +Check [this document](/sdk/arch/design) for the core design principles that guided its architecture. + +## Relationship with OpenHands Applications + +The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns. +- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies. +- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs. +- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. 
```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% -flowchart LR - User["User Code"] - - subgraph Factory[" "] - Entry["Conversation()"] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% +graph TB + subgraph Interfaces["OpenHands Interfaces"] + UI[OpenHands GUI
React frontend] + CLI[OpenHands CLI
Command-line interface] + Custom[Your Custom Client
Automations & workflows] end - subgraph Implementations[" "] - Local["LocalConversation
Direct execution"] - Remote["RemoteConversation
Via agent-server API"] - end + SDK[Software Agent SDK
openhands.sdk + tools + workspace] - subgraph Core[" "] - State["ConversationState
• agent<br/>workspace • stats • ..."] - EventLog["ConversationState.events
Event storage"] + subgraph External["External Services"] + LLM[LLM Providers
OpenAI, Anthropic, etc.] + Runtime[Runtime Services
Docker, Remote API, etc.] end + + UI --> SDK + CLI --> SDK + Custom --> SDK - User --> Entry - Entry -.->|LocalWorkspace| Local - Entry -.->|RemoteWorkspace| Remote - - Local --> State - Remote --> State - - State --> EventLog + SDK --> LLM + SDK --> Runtime - classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px - classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px + classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px - class Entry factory - class Local,Remote impl - class State,EventLog core - class Persist,Stuck,Viz,Secrets service + class UI,CLI,Custom interface + class SDK sdk + class LLM,Runtime external ``` -### Key Components -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | -| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | -| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | -| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | -| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event 
storage | Immutable append-only store with efficient queries | +## Four-Package Architecture -## Factory Pattern +The agent-sdk is organized into four distinct Python packages: -The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: +| Package | What It Does | When You Need It | +|---------|-------------|------------------| +| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | +| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | +| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | +| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Input["Conversation(agent, workspace)"] - Check{Workspace Type?} - Local["LocalConversation
Agent runs in-process"] - Remote["RemoteConversation
Agent runs via API"] - - Input --> Check - Check -->|str or LocalWorkspace| Local - Check -->|RemoteWorkspace| Remote - - style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### Two Deployment Modes -**Dispatch Logic:** -- **Local:** String paths or `LocalWorkspace` → in-process execution -- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket +The SDK supports two deployment architectures depending on your needs: -This abstraction enables switching deployment modes without code changes—just swap the workspace type. +#### Mode 1: Local Development -## State Management +**Installation:** Just install `openhands-sdk` + `openhands-tools` -State updates follow a **two-path pattern** depending on the type of change: +```bash +pip install openhands-sdk openhands-tools +``` + +**Architecture:** ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Start["State Update Request"] - Lock["Acquire FIFO Lock"] - Decision{New Event?} - - StateOnly["Update State Fields
stats, status, metadata"] - EventPath["Append to Event Log
messages, actions, observations"] - - Callback["Trigger Callbacks"] - Release["Release Lock"] +flowchart LR + SDK["openhands.sdk
Agent · LLM · Conversation<br/>+ LocalWorkspace"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools - Start --> Lock - Lock --> Decision - Decision -->|No| StateOnly - Decision -->|Yes| EventPath - StateOnly --> Callback - EventPath --> Callback - Callback --> Release + SDK -->|uses| Tools - style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px - style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 ``` -**Two Update Patterns:** - -1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) -2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur - -**Thread Safety:** -- FIFO Lock ensures ordered, atomic updates -- Callbacks fire after successful commit -- Read operations never block writes - -## Execution Models +- `LocalWorkspace` included in SDK (no extra install) +- Everything runs in one process +- Perfect for prototyping and simple use cases +- Quick setup, no Docker required -The conversation system supports two execution models with identical APIs: +#### Mode 2: Production / Sandboxed -### Local vs Remote Execution +**Installation:** Install all 4 packages -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Local["LocalConversation"] - L1["User sends message"] - L2["Agent executes in-process"] - L3["Direct tool calls"] - L4["Events via callbacks"] - L1 --> L2 --> L3 --> L4 - end - style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +```bash +pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server ``` +**Architecture:** + ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Remote["RemoteConversation"] - 
R1["User sends message"] - R2["HTTP → Agent Server"] - R3["Isolated container execution"] - R4["WebSocket event stream"] - R1 --> R2 --> R3 --> R4 +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% +flowchart LR + + WSBase["openhands.sdk
Base Classes:<br/>Workspace · Local · Remote"]:::sdk + + subgraph WS[" "] + direction LR + Docker["openhands.workspace DockerWorkspace<br/>extends RemoteWorkspace"]:::ws + Remote["openhands.workspace RemoteAPIWorkspace<br/>extends RemoteWorkspace"]:::ws + end + + Server["openhands.agent_server<br/>FastAPI + WebSocket"]:::server + Agent["openhands.sdk<br/>Agent · LLM · Conversation"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · …"]:::tools + + WSBase -.->|extended by| Docker + WSBase -.->|extended by| Remote + Docker -->|spawns container with| Server + Remote -->|connects via HTTP to| Server + Server -->|runs| Agent + Agent -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 + classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 + classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + + style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none ``` -| Aspect | LocalConversation | RemoteConversation | -|--------|-------------------|-------------------| -| **Execution** | In-process | Remote container/server | -| **Communication** | Direct function calls | HTTP + WebSocket | -| **State Sync** | Immediate | Network serialized | -| **Use Case** | Development, CLI tools | Production, web apps | -| **Isolation** | Process-level | Container-level | - -**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. 
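The workspace-based dispatch behind that insight can be sketched in a few lines of plain Python. This is a toy model of the pattern, not the SDK's actual `Conversation()` factory; only the class names mirror the docs:

```python
class LocalWorkspace:
    def __init__(self, working_dir: str) -> None:
        self.working_dir = working_dir


class RemoteWorkspace:
    def __init__(self, host: str) -> None:
        self.host = host


class LocalConversation:
    """Toy stand-in: the agent would run in-process."""

    def __init__(self, workspace: LocalWorkspace) -> None:
        self.workspace = workspace


class RemoteConversation:
    """Toy stand-in: the agent would run via the agent-server API."""

    def __init__(self, workspace: RemoteWorkspace) -> None:
        self.workspace = workspace


def make_conversation(workspace):
    # str paths and LocalWorkspace -> in-process execution
    if isinstance(workspace, str):
        workspace = LocalWorkspace(working_dir=workspace)
    if isinstance(workspace, LocalWorkspace):
        return LocalConversation(workspace)
    # RemoteWorkspace -> agent-server execution, same API surface
    return RemoteConversation(workspace)


print(type(make_conversation("/tmp/project")).__name__)  # LocalConversation
```

Because both implementations expose the same interface, swapping deployment modes is a one-line change at construction time.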
- -## Auxiliary Services - -The conversation system provides pluggable services that operate independently on the event stream: +- `RemoteWorkspace` auto-spawns agent-server in containers +- Sandboxed execution for security +- Multi-user deployments +- Distributed systems (e.g., Kubernetes) support -| Service | Purpose | Architecture Pattern | -|---------|---------|---------------------| -| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | -| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | -| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | -| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | -| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | + +**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). + -**Design Principle:** Services read from the event log but never mutate state directly. This enables: -- Services can be enabled/disabled independently -- Easy to add new services without changing core orchestration -- Event stream acts as the integration point +### SDK Package (`openhands.sdk`) -## Component Relationships +**Purpose:** Core components and base classes for OpenHands agent. 
-### How Conversation Interacts +**Key Components:** +- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop +- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle +- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry +- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration +- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) +- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) +- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation +- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management +- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Conv["Conversation"] - Agent["Agent"] - WS["Workspace"] - Tools["Tools"] - LLM["LLM"] - - Conv -->|Delegates to| Agent - Conv -->|Configures| WS - Agent -.->|Updates| Conv - Agent -->|Uses| Tools - Agent -->|Queries| LLM - - style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +**Design:** Stateless, immutable components with type-safe Pydantic models. -**Relationship Characteristics:** -- **Conversation → Agent**: One-way orchestration, agent reports back via state updates -- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation -- **Agent → Conversation**: Indirect via state events +**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. 
-## See Also +**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) -- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design -- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design -- **[Event System](/sdk/arch/events)** - Event types and flow -- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples +### Tools Package (`openhands.tools`) -# Design Principles -Source: https://docs.openhands.dev/sdk/arch/design + +**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. + -The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. +**Purpose:** Pre-built tools following consistent patterns. -[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1. +**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. -## Optional Isolation over Mandatory Sandboxing + +For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. + - -**V0 Challenge:** -Every tool call in V0 executed in a sandboxed Docker container by default. 
While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other. -Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible. - -**V1 Principle:** -**Sandboxing should be opt-in, not universal.** -V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model. -When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity. +### Workspace Package (`openhands.workspace`) -## Stateless by Default, One Source of Truth for State +**Purpose:** Workspace implementations extending SDK base classes. - -**V0 Challenge:** -V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful. - +**Key Components:** Docker Workspace, Remote API Workspace, and more. -**V1 Principle:** -**Keep everything stateless, with exactly one mutable state.** -All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction. -The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems. +**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. -## Clear Boundaries between Agent and Applications +**Use Cases:** Sandboxed execution, multi-user deployments, production environments. 
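The extension pattern described above — a concrete workspace subclassing the SDK's remote base and layering container lifecycle on top — can be sketched as follows. The class and method names here are illustrative assumptions, not the real SDK interface:

```python
class RemoteWorkspace:
    """Toy base class: holds the connection details for an agent-server."""

    def __init__(self, host: str) -> None:
        self.host = host


class DockerWorkspace(RemoteWorkspace):
    """Adds container lifecycle on top of the remote connection."""

    def __init__(self, image: str) -> None:
        self.image = image
        self.container_id = None
        # A real implementation would spawn a container running the
        # agent-server and point the base class at its address.
        super().__init__(host="127.0.0.1")

    def start(self) -> str:
        self.container_id = f"container-for-{self.image}"
        return self.container_id


ws = DockerWorkspace(image="agent-server:latest")
print(ws.start())  # container-for-agent-server:latest
```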
-
-**V0 Challenge:**
-The same codebase powered the CLI, web interface, and integrations (e.g., Github, Gitlab, etc). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
-Heavy research dependencies and benchmark integrations further bloated production builds.
-

+
+For a full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace).
+

-**V1 Principle:**
-**Maintain strict separation of concerns.**
-V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server).
-Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.

### Agent Server Package (`openhands.agent_server`)

+**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution.

## Composable Components for Extensibility

+**Features:**
+- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode
+- Service management with isolated per-user sessions
+- API key authentication and health checking

-
-**V0 Challenge:**
-Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for different entrypoints. This rigidity limited experimentation and discouraged contributions.
-

+**Deployment:** Runs inside containers (via `DockerWorkspace`) or as a standalone process (connected via `RemoteWorkspace`).
-**V1 Principle:** -**Everything should be composable and safe to extend.** -Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing. -Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. +**Use Cases:** Multi-user web apps, SaaS products, distributed systems. + +For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). + -# Events -Source: https://docs.openhands.dev/sdk/arch/events +## How Components Work Together -The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. +### Basic Execution Flow (Local) -**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) +When you send a message to an agent, here's what happens: -## Core Responsibilities +```mermaid +sequenceDiagram + participant You + participant Conversation + participant Agent + participant LLM + participant Tool + + You->>Conversation: "Create hello.txt" + Conversation->>Agent: Process message + Agent->>LLM: What should I do? + LLM-->>Agent: Use BashTool("touch hello.txt") + Agent->>Tool: Execute action + Note over Tool: Runs in same environment
as Agent (local/container/remote) + Tool-->>Agent: Observation + Agent->>LLM: Got result, continue? + LLM-->>Agent: Done + Agent-->>Conversation: Update state + Conversation-->>You: "File created!" +``` -The Event System has four primary responsibilities: +**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. -1. **Type Safety** - Enforce event schemas through Pydantic models -2. **LLM Integration** - Convert events to/from LLM message formats -3. **Append-Only Log** - Maintain immutable event history -4. **Service Integration** - Enable observers to react to event streams +### Deployment Flexibility -## Architecture +The same agent code runs in different environments by swapping workspace configuration: ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% -flowchart TB - Base["Event
Base class"] - LLMBase["LLMConvertibleEvent
Abstract base"] - - subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] - Message["MessageEvent
User/assistant text"] - Action["ActionEvent
Tool calls"] - System["SystemPromptEvent
Initial system prompt"] - CondSummary["CondensationSummaryEvent
Condenser summary"] - - ObsBase["ObservationBaseEvent
Base for tool responses"] - Observation["ObservationEvent
Tool results"] - UserReject["UserRejectObservation
User rejected action"] - AgentError["AgentErrorEvent
Agent error"] +graph TB + subgraph "Your Code (Unchanged)" + Code["Agent + Tools + LLM"] end - subgraph Internals["Internal Events
NOT visible to the LLM"] - ConvState["ConversationStateUpdateEvent
State updates"] - CondReq["CondensationRequest
Request compression"] - Cond["Condensation
Compression result"] - Pause["PauseEvent
User pause"] + subgraph "Deployment Options" + Local["Local
Direct execution"] + Docker["Docker
Containerized"] + Remote["Remote
Multi-user server"] end - Base --> LLMBase - Base --> Internals - LLMBase --> LLMTypes - ObsBase --> Observation - ObsBase --> UserReject - ObsBase --> AgentError - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + Code -->|LocalWorkspace| Local + Code -->|DockerWorkspace| Docker + Code -->|RemoteAPIWorkspace| Remote - class Base,LLMBase,Message,Action,SystemPromptEvent primary - class ObsBase,Observation,UserReject,AgentError secondary - class ConvState,CondReq,Cond,Pause tertiary + style Code fill:#e1f5fe + style Local fill:#e8f5e8 + style Docker fill:#e8f5e8 + style Remote fill:#e8f5e8 ``` -### Key Components +## Next Steps -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | -| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | -| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | -| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | -| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses | -| 
**[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | -| **[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | -| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | -| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | -| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | -| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | -| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | -| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | -| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | +### Get Started +- [Getting Started](/sdk/getting-started) – Build your first agent +- [Hello World](/sdk/guides/hello-world) – Minimal 
example -## Event Types +### Explore Components -### LLM-Convertible Events +**SDK Package:** +- [Agent](/sdk/arch/agent) – Core reasoning-action loop +- [Conversation](/sdk/arch/conversation) – State management and lifecycle +- [LLM](/sdk/arch/llm) – Language model integration +- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern +- [Events](/sdk/arch/events) – Typed event framework +- [Workspace](/sdk/arch/workspace) – Base workspace architecture -Events that participate in agent reasoning and can be converted to LLM messages: +**Tools Package:** +- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details +**Workspace Package:** +- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details -| Event Type | Source | Content | LLM Role | -|------------|--------|---------|----------| -| **MessageEvent (user)** | user | Text, images | `user` | -| **MessageEvent (agent)** | agent | Text reasoning, skills | `assistant` | -| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | -| **ObservationEvent** | environment | Tool execution result | `tool` | -| **UserRejectObservation** | environment | Rejection reason | `tool` | -| **AgentErrorEvent** | agent | Error details | `tool` | -| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | -| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | +**Agent Server:** +- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details -The event system bridges agent events to LLM messages: +### Deploy +- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – 
Container setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service +- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Events["Event List"] - Filter["Filter LLMConvertibleEvent"] - Group["Group ActionEvents
by llm_response_id"] - Convert["Convert to Messages"] - LLM["LLM Input"] - - Events --> Filter - Filter --> Group - Group --> Convert - Convert --> LLM - - style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px - style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### Source Code +- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework +- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools +- [`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples -**Special Handling - Parallel Function Calling:** +### SDK Package +Source: https://docs.openhands.dev/sdk/arch/sdk.md -When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling): -1. Group all ActionEvents by `llm_response_id` -2. Combine into single Message with multiple `tool_calls` -3. Only first event's `thought`, `reasoning_content`, and `thinking_blocks` are included -4. All subsequent events in the batch have empty thought fields +The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. 
-**Example:** -``` -ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1) -ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2) -→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2]) -``` +**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) +## Purpose -### Internal Events +The SDK package handles: +- **Agent reasoning loop**: How agents process messages and make decisions +- **State management**: Conversation lifecycle and persistence +- **LLM integration**: Provider-agnostic language model access +- **Tool system**: Typed actions and observations +- **Workspace abstraction**: Where code executes +- **Extensibility**: Skills, condensers, MCP, security -Events for metadata, control flow, and user actions (not sent to LLM): +## Core Components -| Event Type | Source | Purpose | Key Fields | -|------------|--------|---------|------------| -| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | -| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | -| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | +```mermaid +graph TB + Conv[Conversation
Lifecycle Manager] --> Agent[Agent
Reasoning Loop] + + Agent --> LLM[LLM
Language Model] + Agent --> Tools[Tool System
Capabilities] + Agent --> Micro[Skills
Behavior Modules] + Agent --> Cond[Condenser
Memory Manager] + + Tools --> Workspace[Workspace
Execution] + + Conv --> Events[Events
Communication] + Tools --> MCP[MCP
External Tools] + Workspace --> Security[Security
Validation] + + style Conv fill:#e1f5fe + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fff3e0 + style Workspace fill:#fce4ec +``` -**Source Types:** -- **user**: Event originated from user input -- **agent**: Event generated by agent logic -- **environment**: Event from system/framework/tools +### 1. Conversation - State & Lifecycle -## Component Relationships +**What it does**: Manages the entire conversation lifecycle and state. -### How Events Integrate +**Key responsibilities**: +- Maintains conversation state (immutable) +- Handles message flow between user and agent +- Manages turn-taking and async execution +- Persists and restores conversation state +- Emits events for monitoring -## `source` vs LLM `role` +**Design decisions**: +- **Immutable state**: Each operation returns a new Conversation instance +- **Serializable**: Can be saved to disk or database and restored +- **Async-first**: Built for streaming and concurrent execution -Events often carry **two different concepts** that are easy to confuse: +**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. -- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution. -- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting. +**Example use cases**: +- Saving conversation to database after each turn +- Implementing undo/redo functionality +- Building multi-session chatbots +- Time-travel debugging -These fields are **intentionally independent**. 
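The distinction can be illustrated with a minimal sketch, using a simplified stand-in rather than the SDK's actual event classes: a tool result is attributed to `source="environment"` yet rendered to the model with `role="tool"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationEvent:
    """Simplified stand-in for an observation event (illustrative only)."""

    source: str        # attribution: who produced the event
    tool_call_id: str
    content: str

    def to_llm_message(self) -> dict:
        # LLM formatting is independent of attribution: tool results are
        # presented to the model as role="tool" regardless of source.
        return {
            "role": "tool",
            "tool_call_id": self.tool_call_id,
            "content": self.content,
        }


obs = ObservationEvent(source="environment", tool_call_id="call_1",
                       content="file created")
print(obs.source)                    # environment
print(obs.to_llm_message()["role"])  # tool
```

Code that needs to know where an event came from should read `source`; code that formats the LLM request should read the role, and neither should be inferred from the other.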
+**Learn more**: +- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) +- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) +- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) -Common examples include: +--- -- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`. -- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction. +### 2. Agent - The Reasoning Loop -**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role. +**What it does**: The core reasoning engine that processes messages and decides what to do. +**Key responsibilities**: +- Receives messages and current state +- Consults LLM to reason about next action +- Validates and executes tool calls +- Processes observations and loops until completion +- Integrates with skills for specialized behavior -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Events["Event System"] - Agent["Agent"] - Conversation["Conversation"] - Tools["Tools"] - Services["Auxiliary Services"] - - Agent -->|Reads| Events - Agent -->|Writes| Events - Conversation -->|Manages| Events - Tools -->|Creates| Events - Events -.->|Stream| Services - - style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +**Design decisions**: +- **Stateless**: Agent doesn't hold state, operates on Conversation +- **Extensible**: Behavior can be modified via skills +- **Provider-agnostic**: Works with any LLM through 
unified interface -**Relationship Characteristics:** -- **Agent → Events**: Reads history for context, writes actions/messages -- **Conversation → Events**: Owns and persists event log -- **Tools → Events**: Create ObservationEvents after execution -- **Services → Events**: Read-only observers for monitoring, visualization +**The reasoning loop**: +1. Receive message from Conversation +2. Add message to context +3. Consult LLM with full conversation history +4. If LLM returns tool call → validate and execute tool +5. If tool returns observation → add to context, go to step 3 +6. If LLM returns response → done, return to user -## Error Events: Agent vs Conversation +**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. -Two distinct error events exist in the SDK, with different purpose and visibility: +**Example use cases**: +- Planning agents that break tasks into steps +- Code review agents with specific checks +- Agents with domain-specific reasoning patterns -- AgentErrorEvent - - Type: ObservationBaseEvent (LLM-convertible) - - Scope: Error for a specific tool call (has tool_name and tool_call_id) - - Source: "agent" - - LLM visibility: Sent as a tool message so the model can react/recover - - Effect: Conversation continues; not a terminal state - - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py +**Learn more**: +- Guide: [Custom Agents](/sdk/guides/agent-custom) +- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) +- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) -- ConversationErrorEvent - - Type: Event (not LLM-convertible) - - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) - - Source: typically "environment" - - LLM visibility: Not sent to the model - - Effect: Run loop 
transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications - - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py +--- -## See Also +### 3. LLM - Language Model Integration -- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events -- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management -- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation -- **[Condenser](/sdk/arch/condenser)** - Event history compression +**What it does**: Provides a provider-agnostic interface to language models. +**Key responsibilities**: +- Abstracts different LLM providers (OpenAI, Anthropic, etc.) +- Handles message formatting and conversion +- Manages streaming responses +- Supports tool calling and reasoning modes +- Handles retries and error recovery -# LLM -Source: https://docs.openhands.dev/sdk/arch/llm +**Design decisions**: +- **Provider-agnostic**: Same API works with any provider +- **Streaming-first**: Built for real-time responses +- **Type-safe**: Pydantic models for all messages +- **Extensible**: Easy to add new providers -The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. +**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: +- Cost optimization (switch to cheaper models) +- Testing with different models +- Avoiding vendor lock-in +- Supporting customer choice -**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) +**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. 
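The reasoning loop described in the Agent section (consult the LLM, execute tool calls, feed observations back, return when the model finishes) can be sketched with a scripted stand-in for the model and a single fake tool. This is illustrative only, not the SDK's actual agent implementation:

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


def scripted_llm(context: list) -> object:
    """Stand-in for a real LLM call: request one tool, then finish."""
    if not any(entry.startswith("observation:") for entry in context):
        return ToolCall(name="bash", argument="touch hello.txt")
    return FinalAnswer(text="Done: hello.txt created")


TOOLS = {"bash": lambda arg: f"ran `{arg}`, exit code 0"}


def reasoning_loop(message: str) -> str:
    context = [f"user: {message}"]         # 1-2. receive message, add to context
    while True:
        decision = scripted_llm(context)   # 3. consult LLM with full history
        if isinstance(decision, ToolCall):  # 4. validate and execute tool call
            observation = TOOLS[decision.name](decision.argument)
            context.append(f"observation: {observation}")  # 5. loop with result
        else:
            return decision.text           # 6. final response, done


print(reasoning_loop("Create hello.txt"))  # Done: hello.txt created
```

The real agent adds validation, error events, and condensation around this loop, but the control flow is the same: tool calls keep the loop running, and a plain response ends the turn.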
-## Core Responsibilities +**Example use cases**: +- Routing requests to different models based on complexity +- Implementing custom caching strategies +- Adding observability hooks -The LLM system has five primary responsibilities: +**Learn more**: +- Guide: [LLM Registry](/sdk/guides/llm-registry) +- Guide: [LLM Routing](/sdk/guides/llm-routing) +- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) +- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) -1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers -2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) -3. **Configuration Management** - Load from environment, JSON, or programmatic configuration -4. **Telemetry & Cost** - Track usage, latency, and costs across providers -5. **Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries +--- -## Architecture +### 4. Tool System - Typed Capabilities -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% -flowchart TB - subgraph Configuration["Configuration Sources"] - Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] - JSON["JSON Files
config/llm.json"] - Code["Programmatic
LLM(...)"] - end - - subgraph Core["Core LLM"] - Model["LLM Model
Pydantic configuration"] - Pipeline["Request Pipeline
Retry, timeout, telemetry"] - end - - subgraph Backend["LiteLLM Backend"] - Providers["100+ Providers
OpenAI, Anthropic, etc."] - end - - subgraph Output["Telemetry"] - Usage["Token Usage"] - Cost["Cost Tracking"] - Latency["Latency Metrics"] - end - - Env --> Model - JSON --> Model - Code --> Model - - Model --> Pipeline - Pipeline --> Providers - - Pipeline --> Usage - Pipeline --> Cost - Pipeline --> Latency - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Model primary - class Pipeline secondary - class LiteLLM tertiary -``` +**What it does**: Defines what agents can do through a typed action/observation pattern. -### Key Components +**Key responsibilities**: +- Defines tool schemas (inputs and outputs) +- Validates actions before execution +- Executes tools and returns typed observations +- Generates JSON schemas for LLM tool calling +- Registers tools with the agent -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings | -| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming | -| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking | -| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers | -| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` | -| **Telemetry** | Usage tracking | Token counts, costs, latency | +**Design decisions**: +- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs +- **Schema generation**: Pydantic models auto-generate JSON schemas 
+- **Executor pattern**: Separation of tool definition and execution +- **Composable**: Tools can call other tools -## Configuration +**The three components**: +1. **Action**: Input schema (what the tool accepts) +2. **Observation**: Output schema (what the tool returns) +3. **ToolExecutor**: Logic that transforms Action → Observation -See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for complete list of supported fields. +**Why this pattern?** +- Type safety catches errors early +- LLMs get accurate schemas for tool calling +- Tools are testable in isolation +- Easy to compose tools -### Programmatic Configuration +**When to customize**: When you need domain-specific capabilities not covered by built-in tools. -Create LLM instances directly in code: +**Example use cases**: +- Database query tools +- API integration tools +- Custom file format parsers +- Domain-specific calculators -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Code["Python Code"] - LLM["LLM(model=...)"] - Agent["Agent"] - - Code --> LLM - LLM --> Agent - - style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` +**Learn more**: +- Guide: [Custom Tools](/sdk/guides/custom-tools) +- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) -**Example:** -```python -from pydantic import SecretStr -from openhands.sdk import LLM +--- -llm = LLM( - model="anthropic/claude-sonnet-4.1", - api_key=SecretStr("sk-ant-123"), - temperature=0.1, - timeout=120, -) -``` +### 5. Workspace - Execution Abstraction -### Environment Variable Configuration +**What it does**: Abstracts *where* code executes (local, Docker, remote). 
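As a minimal illustration of the Action/Observation/Executor pattern, here is a hypothetical grep-style tool built from plain dataclasses. The names are invented for this sketch and are not the SDK's built-in tool classes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrepAction:
    """Action: the typed input schema the tool accepts."""

    pattern: str
    text: str


@dataclass(frozen=True)
class GrepObservation:
    """Observation: the typed result returned to the agent."""

    matches: list


class GrepExecutor:
    """Executor: the logic that transforms an Action into an Observation."""

    def __call__(self, action: GrepAction) -> GrepObservation:
        lines = [line for line in action.text.splitlines()
                 if action.pattern in line]
        return GrepObservation(matches=lines)


executor = GrepExecutor()
obs = executor(GrepAction(pattern="TODO", text="TODO: fix\nok\nTODO: test"))
print(obs.matches)  # ['TODO: fix', 'TODO: test']
```

Separating the three pieces keeps the schema (what the LLM sees), the result type (what the agent consumes), and the execution logic independently testable.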
-Load from environment using naming convention: +**Key responsibilities**: +- Provides unified interface for code execution +- Handles file operations across environments +- Manages working directories +- Supports different isolation levels -**Environment Variable Pattern:** -- **Prefix:** All variables start with `LLM_` -- **Mapping:** `LLM_FIELD` → `field` (lowercased) -- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr +**Design decisions**: +- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package +- **Environment-agnostic**: Code works the same locally or remotely +- **Lazy initialization**: Workspace setup happens on first use -**Common Variables:** -```bash -export LLM_MODEL="anthropic/claude-sonnet-4.1" -export LLM_API_KEY="sk-ant-123" -export LLM_USAGE_ID="primary" -export LLM_TIMEOUT="120" -export LLM_NUM_RETRIES="5" -``` +**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. -### JSON Configuration +**When to use directly**: Rarely - usually configured when creating an agent. Use advanced workspaces for production. -Serialize and load from JSON files: +**Learn more**: +- Architecture: [Workspace Architecture](/sdk/arch/workspace) +- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) +- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) -**Example:** -```python -# Save -llm.model_dump_json(exclude_none=True, indent=2) +--- -# Load -llm = LLM.load_from_json("config/llm.json") -``` +### 6. Events - Component Communication -**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). -If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. +**What it does**: Enables observability and debugging through event emissions. 
+**Key responsibilities**: +- Defines event types (messages, actions, observations, errors) +- Emitted by Conversation, Agent, Tools +- Enables logging, debugging, and monitoring +- Supports custom event handlers -## Request Pipeline +**Design decisions**: +- **Immutable**: Events are snapshots, not mutable objects +- **Serializable**: Can be logged, stored, replayed +- **Type-safe**: Pydantic models for all events -### Completion Flow +**Why events?** They provide a timeline of what happened during agent execution. Essential for: +- Debugging agent behavior +- Understanding decision-making +- Building observability dashboards +- Implementing custom logging -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% -flowchart TB - Request["completion() or responses() call"] - Validate["Validate Config"] - - Attempt["LiteLLM Request"] - Success{"Success?"} - - Retry{"Retries
remaining?"} - Wait["Exponential Backoff"] - - Telemetry["Record Telemetry"] - Response["Return Response"] - Error["Raise Error"] - - Request --> Validate - Validate --> Attempt - Attempt --> Success - - Success -->|Yes| Telemetry - Success -->|No| Retry - - Retry -->|Yes| Wait - Retry -->|No| Error - - Wait --> Attempt - Telemetry --> Response - - style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. -**Pipeline Stages:** +**Learn more**: +- Guide: [Metrics and Observability](/sdk/guides/metrics) +- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) -1. **Validation:** Check required fields (model, messages) -2. **Request:** Call LiteLLM with provider-specific formatting -3. **Retry Logic:** Exponential backoff on failures (configurable) -4. **Telemetry:** Record tokens, cost, latency -5. **Response:** Return completion or raise error +--- -### Responses API Support +### 7. Condenser - Memory Management -In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. +**What it does**: Compresses conversation history when it gets too long. 
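One common strategy, keeping the most recent turns and replacing older ones with a summary placeholder, can be sketched over a plain list of messages. This is illustrative only, not the SDK's condenser API:

```python
def condense(history: list, max_messages: int = 6) -> list:
    """Illustrative condenser: when history exceeds the budget, swap the
    oldest messages for a one-line summary and keep recent turns verbatim."""
    if len(history) <= max_messages:
        return history
    keep = max_messages - 1                 # reserve one slot for the summary
    forgotten = history[:-keep]
    summary = f"[summary of {len(forgotten)} earlier messages]"
    return [summary] + history[-keep:]


history = [f"msg-{i}" for i in range(10)]
print(condense(history, max_messages=4))
# ['[summary of 7 earlier messages]', 'msg-7', 'msg-8', 'msg-9']
```

A real condenser would produce the summary with an LLM and record which event IDs were forgotten, but the shape is the same: bounded context with a stand-in for the dropped prefix.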
-#### Architecture +**Key responsibilities**: +- Monitors conversation length +- Summarizes older messages +- Preserves important context +- Keeps conversation within token limits -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Check{"Model supports
Responses API?"} - - subgraph Standard["Standard Path"] - ChatFormat["Format as
Chat Messages"] - ChatCall["litellm.completion()"] - end - - subgraph ResponsesPath["Responses Path"] - RespFormat["Format as
instructions + input[]"] - RespCall["litellm.responses()"] - end - - ChatResponse["ModelResponse"] - RespResponse["ResponsesAPIResponse"] - - Parse["Parse to Message"] - Return["LLMResponse"] - - Check -->|No| ChatFormat - Check -->|Yes| RespFormat - - ChatFormat --> ChatCall - RespFormat --> RespCall - - ChatCall --> ChatResponse - RespCall --> RespResponse - - ChatResponse --> Parse - RespResponse --> Parse - - Parse --> Return - - style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +**Design decisions**: +- **Pluggable**: Different condensing strategies +- **Automatic**: Triggered when context gets large +- **Preserves semantics**: Important information retained -#### Supported Models +**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. -Models that automatically use the Responses API path: +**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. -| Pattern | Examples | Documentation | -|---------|----------|---------------| -| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | +**Example strategies**: +- Summarize old messages +- Keep only last N turns +- Preserve task-related messages -**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). +**Learn more**: +- Guide: [Context Condenser](/sdk/guides/context-condenser) +- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) +--- -## Provider Integration +### 8. 
MCP - Model Context Protocol -### LiteLLM Abstraction +**What it does**: Integrates external tool servers via Model Context Protocol. -Software Agent SDK uses LiteLLM for provider abstraction: +**Key responsibilities**: +- Connects to MCP-compatible tool servers +- Translates MCP tools to SDK tool format +- Manages server lifecycle +- Handles server communication -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - SDK["Software Agent SDK"] - LiteLLM["LiteLLM"] - - subgraph Providers["100+ Providers"] - OpenAI["OpenAI"] - Anthropic["Anthropic"] - Google["Google"] - Azure["Azure"] - Others["..."] - end - - SDK --> LiteLLM - LiteLLM --> OpenAI - LiteLLM --> Anthropic - LiteLLM --> Google - LiteLLM --> Azure - LiteLLM --> Others - - style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +**Design decisions**: +- **Standard protocol**: Uses MCP specification +- **Transparent integration**: MCP tools look like regular tools to agents +- **Process management**: Handles server startup/shutdown -**Benefits:** -- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. -- **Unified API:** Same interface regardless of provider -- **Format Translation:** Provider-specific request/response formatting -- **Error Handling:** Normalized error codes and messages +**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. -### LLM Providers +**When to use**: When you need tools that: +- Already have MCP servers (fetch, filesystem, etc.) +- Are too complex to rewrite as SDK tools +- Need to run in separate processes +- Are provided by third parties -Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. 
-The pages linked below live under the OpenHands app section but apply -verbatim to SDK applications because both layers wrap the same -`openhands.sdk.llm.LLM` interface. +**Learn more**: +- Guide: [MCP Integration](/sdk/guides/mcp) +- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) +- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) -| Provider / scenario | Documentation | -| --- | --- | -| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | -| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | -| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | -| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | -| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | -| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | -| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | -| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | -| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | -| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | +--- -When you follow any of those guides while building with the SDK, create an -`LLM` object using the documented parameters (for example, API keys, base URLs, -or custom headers) and pass it into your agent or registry. The OpenHands UI -surfacing is simply a convenience layer on top of the same configuration model. +### 9. Skills (formerly Microagents) - Behavior Modules +**What it does**: Specialized modules that modify agent behavior for specific tasks. 
-## Telemetry and Cost Tracking +**Key responsibilities**: +- Provide domain-specific instructions +- Modify system prompts +- Guide agent decision-making +- Compose to create specialized agents -### Telemetry Collection +**Design decisions**: +- **Composable**: Multiple skills can work together +- **Declarative**: Defined as configuration, not code +- **Reusable**: Share skills across agents -LLM requests automatically collect metrics: +**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. + +**Example skills**: +- GitHub operations (issue creation, PRs) +- Code review guidelines +- Documentation style enforcement +- Project-specific conventions + +**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. + +**Learn more**: +- Guide: [Agent Skills & Context](/sdk/guides/skill) +- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +--- + +### 10. Security - Validation & Sandboxing + +**What it does**: Validates inputs and enforces security constraints. + +**Key responsibilities**: +- Input validation +- Command sanitization +- Path traversal prevention +- Resource limits + +**Design decisions**: +- **Defense in depth**: Multiple validation layers +- **Fail-safe**: Rejects suspicious inputs by default +- **Configurable**: Adjust security levels as needed + +**Why needed?** Agents execute arbitrary code and file operations. Security prevents: +- Malicious prompts escaping sandboxes +- Path traversal attacks +- Resource exhaustion +- Unintended system access + +**When to customize**: When you need domain-specific validation rules or want to adjust security policies. 
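The path-traversal layer described above can be sketched in a few lines. This is an illustrative guard under an assumed `/workspace` root, not the SDK's actual implementation: it resolves every candidate path and rejects anything that escapes the workspace.

```python
from pathlib import Path

def is_within_workspace(candidate: str, workspace: str = "/workspace") -> bool:
    """Illustrative path-traversal guard: allow a file operation only if the
    candidate path still resolves inside the workspace root."""
    root = Path(workspace).resolve()
    resolved = (root / candidate).resolve()  # collapses any ../ segments
    return resolved == root or root in resolved.parents
```

A check like this allows `notes.txt` but rejects `../../etc/passwd`; per the defense-in-depth design above, the SDK layers several such validations rather than relying on any single one.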
+ +**Learn more**: +- Guide: [Security and Secrets](/sdk/guides/security) +- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) + +--- + +## How Components Work Together + +### Example: User asks agent to create a file -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Request["LLM Request"] - - subgraph Metrics - Tokens["Token Counts
Input/Output"] - Cost["Cost
USD"] - Latency["Latency
ms"] - end - - Events["Event Log"] - - Request --> Tokens - Request --> Cost - Request --> Latency - - Tokens --> Events - Cost --> Events - Latency --> Events - - style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +1. User → Conversation: "Create a file called hello.txt with 'Hello World'" -**Tracked Metrics:** -- **Token Usage:** Input tokens, output tokens, total -- **Cost:** Per-request cost using configured rates -- **Latency:** Request duration in milliseconds -- **Errors:** Failure types and retry counts +2. Conversation → Agent: New message event -### Cost Configuration +3. Agent → LLM: Full conversation history + available tools -Configure per-token costs for custom models: +4. LLM → Agent: Tool call for FileEditorTool.create() -```python -llm = LLM( - model="custom/my-model", - input_cost_per_token=0.00001, # $0.01 per 1K tokens - output_cost_per_token=0.00003, # $0.03 per 1K tokens -) +5. Agent → Tool System: Validate FileEditorAction + +6. Tool System → Tool Executor: Execute action + +7. Tool Executor → Workspace: Create file (local/docker/remote) + +8. Workspace → Tool Executor: Success + +9. Tool Executor → Tool System: FileEditorObservation (success=true) + +10. Tool System → Agent: Observation + +11. Agent → LLM: Updated history with observation + +12. LLM → Agent: "File created successfully" + +13. Agent → Conversation: Done, final response + +14. 
Conversation → User: "File created successfully" ``` -**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) +Throughout this flow: +- **Events** are emitted for observability +- **Condenser** may trigger if history gets long +- **Skills** influence LLM's decision-making +- **Security** validates file paths and operations +- **MCP** could provide additional tools if configured -**Custom Costs:** Override for: -- Internal models -- Custom pricing agreements -- Cost estimation for budgeting +## Design Patterns -## Component Relationships +### Immutability -### How LLM Integrates +All core objects are immutable. Operations return new instances: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - LLM["LLM"] - Agent["Agent"] - Conversation["Conversation"] - Events["Events"] - Security["Security Analyzer"] - Condenser["Context Condenser"] - - Agent -->|Uses| LLM - LLM -->|Records| Events - Security -.->|Optional| LLM - Condenser -.->|Optional| LLM - Conversation -->|Provides context| Agent - - style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +```python +conversation = Conversation(...) 
+new_conversation = conversation.add_message(message) +# conversation is unchanged, new_conversation has the message ``` -**Relationship Characteristics:** -- **Agent → LLM**: Agent uses LLM for reasoning and tool calls -- **LLM → Events**: LLM requests/responses recorded as events -- **Security → LLM**: Optional security analyzer can use separate LLM -- **Condenser → LLM**: Optional context condenser can use separate LLM -- **Configuration**: LLM configured independently, passed to agent -- **Telemetry**: LLM metrics flow through event system to UI/logging +**Why?** Makes debugging easier, enables time-travel, ensures serializability. -## See Also +### Composition Over Inheritance -- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions -- **[Events](/sdk/arch/events)** - LLM request/response event types -- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis -- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration +Agents are composed from: +- LLM provider +- Tool list +- Skill list +- Condenser strategy +- Security policy + +You don't subclass Agent - you configure it. +**Why?** More flexible, easier to test, enables runtime configuration. -# MCP Integration -Source: https://docs.openhands.dev/sdk/arch/mcp +### Type Safety -The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. +Everything uses Pydantic models: +- Messages, actions, observations are typed +- Validation happens automatically +- Schemas generate from types -**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) +**Why?** Catches errors early, provides IDE support, self-documenting. 
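As a small sketch of what typed models buy you (the action class below is hypothetical, not the SDK's real one, and assumes Pydantic v2), validation happens at the boundary and a tool-calling schema is generated from the same definition:

```python
from pydantic import BaseModel, ValidationError

class FileWriteAction(BaseModel):
    """Hypothetical action model following the same pattern as SDK actions."""
    path: str
    content: str

# Well-formed arguments validate; a JSON schema comes straight from the type.
action = FileWriteAction(path="hello.txt", content="Hello World")
schema = FileWriteAction.model_json_schema()

# Malformed arguments fail fast with a structured error, instead of
# surfacing later as a broken tool execution.
try:
    FileWriteAction(path="hello.txt")  # 'content' is missing
    raise AssertionError("validation should have failed")
except ValidationError:
    pass
```

Here `schema["required"]` lists both fields, which is exactly the contract an LLM tool-calling API needs to enforce.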
+
+## Next Steps
+
+### For Usage Examples
+
+- [Getting Started](/sdk/getting-started) - Build your first agent
+- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities
+- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers
+- [Conversation Management](/sdk/guides/convo-persistence) - State handling
+
+### For Related Architecture
+
+- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations
+- [Workspace Architecture](/sdk/arch/workspace) - Execution environments
+- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution
+
+### For Implementation Details
+
+- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code
+- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code
+- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code
+- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples
+
+### Security
+Source: https://docs.openhands.dev/sdk/arch/security.md
+
+The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.
+
+**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

## Core Responsibilities

The Security system has four primary responsibilities:

1. 
**Risk Assessment** - Capture and validate LLM-provided risk levels for actions +2. **Confirmation Policy** - Determine when user approval is required based on risk +3. **Action Validation** - Enforce security policies before execution +4. **Audit Trail** - Record security decisions in event history ## Architecture ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% flowchart TB - subgraph Client["MCP Client"] - Sync["MCPClient
Sync/Async bridge"] - Async["AsyncMCPClient
FastMCP base"] + subgraph Interface["Abstract Interface"] + Base["SecurityAnalyzerBase
Abstract analyzer"] end - subgraph Bridge["Tool Bridge"] - Def["MCPToolDefinition
Schema conversion"] - Exec["MCPToolExecutor
Execution handler"] + subgraph Implementations["Concrete Analyzers"] + LLM["LLMSecurityAnalyzer
Inline risk prediction"] + NoOp["NoOpSecurityAnalyzer
No analysis"] end - subgraph Integration["Agent Integration"] - Action["MCPToolAction
Dynamic model"] - Obs["MCPToolObservation
Result wrapper"] + subgraph Risk["Risk Levels"] + Low["LOW
Safe operations"] + Medium["MEDIUM
Moderate risk"] + High["HIGH
Dangerous ops"] + Unknown["UNKNOWN
Unanalyzed"] end - subgraph External["External"] - Server["MCP Server
stdio/HTTP"] - Tools["External Tools"] + subgraph Policy["Confirmation Policy"] + Check["should_require_confirmation()"] + Mode["Confirmation Mode"] + Decision["Require / Allow"] end - Sync --> Async - Async --> Server + Base --> LLM + Base --> NoOp - Server --> Def - Def --> Exec + Implementations --> Low + Implementations --> Medium + Implementations --> High + Implementations --> Unknown - Exec --> Action - Action --> Server - Server --> Obs + Low --> Check + Medium --> Check + High --> Check + Unknown --> Check - Server -.->|Spawns| Tools + Check --> Mode + Mode --> Decision classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - class Sync,Async primary - class Def,Exec secondary - class Action,Obs tertiary + class Base primary + class LLM secondary + class High danger + class Check tertiary ``` ### Key Components | Component | Purpose | Design | |-----------|---------|--------| -| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | -| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | -| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | -| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | -| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | 
Wraps MCP tool results | - -## MCP Client +| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | +| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | +| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | +| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | +| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | -### Sync/Async Bridge +## Risk Levels -The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: +Security analyzers return one of four risk levels: ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart TB - Sync["Sync Code
Agent execution"] - Bridge["call_async_from_sync()"] - Executor["AsyncExecutor
Background loop"] - Async["Async MCP Call"] - Server["MCP Server"] - Result["Result"] + Action["ActionEvent"] + Analyze["Security Analyzer"] - Sync --> Bridge - Bridge --> Executor - Executor --> Async - Async --> Server - Server --> Result - Result --> Sync + subgraph Levels["Risk Levels"] + Low["LOW
Read-only, safe"] + Medium["MEDIUM
Modify files"] + High["HIGH
Delete, execute"] + Unknown["UNKNOWN
Not analyzed"] + end - style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px + Action --> Analyze + Analyze --> Low + Analyze --> Medium + Analyze --> High + Analyze --> Unknown + + style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px + style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px + style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px ``` -**Bridge Pattern:** -- **Problem:** MCP protocol is async, but agent tools run synchronously -- **Solution:** Background event loop that executes async code from sync contexts -- **Benefit:** Agents use MCP tools without async/await in tool definitions - -**Client Features:** -- **Lifecycle Management:** `__enter__`/`__exit__` for context manager -- **Timeout Support:** Configurable timeouts for MCP operations -- **Error Handling:** Wraps MCP errors in observations -- **Connection Pooling:** Reuses connections across tool calls - -### MCP Server Configuration - -MCP servers are configured using the FastMCP format: +### Risk Level Definitions -```python -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "filesystem": { - "command": "npx", - "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] - } - } -} -``` +| Level | Characteristics | Examples | +|-------|----------------|----------| +| **LOW** | Read-only, no state changes | File reading, directory listing, search | +| **MEDIUM** | Modifies user data | File editing, creating files, API calls | +| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | +| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | -**Configuration Fields:** -- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) -- **args:** Arguments to pass to command -- 
**env:** Environment variables (optional) +## Security Analyzers -## Tool Discovery and Conversion +### LLMSecurityAnalyzer -### Discovery Flow +Leverages the LLM's inline risk assessment during action generation: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% flowchart TB - Config["MCP Config"] - Spawn["Spawn Server"] - List["List Tools"] - - subgraph Convert["Convert Each Tool"] - Schema["MCP Schema"] - Action["Generate Action Model"] - Def["Create ToolDefinition"] - end - - Register["Register in ToolRegistry"] - - Config --> Spawn - Spawn --> List - List --> Schema + Schema["Tool Schema
+ security_risk param"] + LLM["LLM generates action
with security_risk"] + ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] + Extract["Extract security_risk
from arguments"] + ActionEvent["ActionEvent
with security_risk set"] + Analyzer["LLMSecurityAnalyzer
returns security_risk"] - Schema --> Action - Action --> Def - Def --> Register + Schema --> LLM + LLM --> ToolCall + ToolCall --> Extract + Extract --> ActionEvent + ActionEvent --> Analyzer - style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Discovery Steps:** +**Analysis Process:** -1. **Spawn Server:** Launch MCP server via stdio -2. **List Tools:** Call `tools/list` MCP endpoint -3. **Parse Schemas:** Extract tool names, descriptions, parameters -4. **Generate Models:** Dynamically create Pydantic models for actions -5. **Create Definitions:** Wrap in `ToolDefinition` objects -6. **Register:** Add to agent's tool registry +1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema +2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments +3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments +4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` +5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level +6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step -### Schema Conversion +**Example Tool Call:** +```json +{ + "name": "execute_bash", + "arguments": { + "command": "rm -rf /tmp/cache", + "security_risk": "HIGH" + } +} +``` -MCP tool schemas are converted to SDK tool definitions: +The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. 
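One plausible shape of that extraction step (variable names here are illustrative, not the SDK's internals) is simply popping the annotation out of the tool-call arguments, so the executor only receives the parameters the tool declared:

```python
# A tool call as the LLM produced it, mirroring the JSON example above.
tool_call = {
    "name": "execute_bash",
    "arguments": {"command": "rm -rf /tmp/cache", "security_risk": "HIGH"},
}

# Separate the risk annotation from the real tool arguments before execution.
arguments = dict(tool_call["arguments"])
risk = arguments.pop("security_risk", "UNKNOWN")  # defaults to UNKNOWN if absent
```

After extraction, `risk` carries the LLM's inline assessment (`"HIGH"` here) and `arguments` is back to the schema the tool originally declared.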
+ +**Configuration:** +- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent +- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools +- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation + +### NoOpSecurityAnalyzer + +Passthrough analyzer that skips analysis: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR - MCP["MCP Tool Schema
JSON Schema"] - Parse["Parse Parameters"] - Model["Dynamic Pydantic Model
MCPToolAction"] - Def["ToolDefinition
SDK format"] + Action["ActionEvent"] + NoOp["NoOpSecurityAnalyzer"] + Unknown["SecurityRisk.UNKNOWN"] - MCP --> Parse - Parse --> Model - Model --> Def + Action --> NoOp --> Unknown - style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px ``` -**Conversion Rules:** +**Use Case:** Development, trusted environments, or when confirmation mode handles all actions -| MCP Schema | SDK Action Model | -|------------|------------------| -| **name** | Class name (camelCase) | -| **description** | Docstring | -| **inputSchema** | Pydantic fields | -| **required** | Field(required=True) | -| **type** | Python type hints | +## Confirmation Policy -**Example:** +The confirmation policy determines when user approval is required. There are three policy implementations: -```python -# MCP Schema -{ - "name": "fetch_url", - "description": "Fetch content from URL", - "inputSchema": { - "type": "object", - "properties": { - "url": {"type": "string"}, - "timeout": {"type": "number"} - }, - "required": ["url"] - } -} +**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) -# Generated Action Model -class FetchUrl(MCPToolAction): - """Fetch content from URL""" - url: str - timeout: float | None = None -``` +### Policy Types -## Tool Execution +| Policy | Behavior | Use Case | +|--------|----------|----------| +| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | +| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | 
+| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | -### Execution Flow +### ConfirmRisky (Default Policy) + +The most flexible policy with configurable thresholds: ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% flowchart TB - Agent["Agent generates action"] - Action["MCPToolAction"] - Executor["MCPToolExecutor"] + Risk["SecurityRisk"] + CheckUnknown{"Risk ==
UNKNOWN?"} + UseConfirmUnknown{"confirm_unknown
setting?"} + CheckThreshold{"risk.is_riskier
(threshold)?"}
-    Convert["Convert to MCP format"]
-    Call["MCP call_tool"]
-    Server["MCP Server"]
+    Confirm["Require Confirmation"]
+    Allow["Allow Execution"]
-    Result["MCP Result"]
-    Obs["MCPToolObservation"]
-    Return["Return to Agent"]
+    Risk --> CheckUnknown
+    CheckUnknown -->|Yes| UseConfirmUnknown
+    CheckUnknown -->|No| CheckThreshold
-    Agent --> Action
-    Action --> Executor
-    Executor --> Convert
-    Convert --> Call
-    Call --> Server
-    Server --> Result
-    Result --> Obs
-    Obs --> Return
+    UseConfirmUnknown -->|True| Confirm
+    UseConfirmUnknown -->|False| Allow
-    style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
-    style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
-    style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px
+    CheckThreshold -->|Yes| Confirm
+    CheckThreshold -->|No| Allow
+
+    style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px
+    style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px
```
-**Execution Steps:**
+**Configuration:**
+- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required
+  - Cannot be set to `UNKNOWN`
+  - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold`
+- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation
-1. **Action Creation:** LLM generates tool call, parsed into `MCPToolAction`
-2. **Executor Lookup:** Find `MCPToolExecutor` for tool name
-3. **Format Conversion:** Convert action fields to MCP arguments
-4. **MCP Call:** Execute `call_tool` via MCP client
-5. **Result Parsing:** Parse MCP result (text, images, resources)
-6. **Observation Creation:** Wrap in `MCPToolObservation`
-7. **Error Handling:** Catch exceptions, return error observations
+### Confirmation Rules by Policy
-### MCPToolExecutor
+#### ConfirmRisky with threshold=HIGH (Default)
-Executors bridge SDK actions to MCP calls:
+| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` |
+|------------|----------------------------------|-------------------------|
+| **LOW** | ✅ Allow | ✅ Allow |
+| **MEDIUM** | ✅ Allow | ✅ Allow |
+| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow |
+
+#### ConfirmRisky with threshold=MEDIUM
+
+| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` |
+|------------|------------------------|-------------------------|
+| **LOW** | ✅ Allow | ✅ Allow |
+| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow |
+
+#### ConfirmRisky with threshold=LOW
+
+| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` |
+|------------|------------------------|-------------------------|
+| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation |
+| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow |
+
+**Key Rules:**
+- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True`
+- **UNKNOWN handling** is configurable via `confirm_unknown` flag
+- **Threshold cannot be UNKNOWN** - validated at policy creation time
+
+
## Component Relationships

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
-    Executor["MCPToolExecutor"]
-    Client["MCP Client"]
-    Name["tool_name"]
+    Security["Security Analyzer"]
+    Agent["Agent"]
+    Conversation["Conversation"]
+    Tools["Tools"]
+    MCP["MCP Tools"]
-    Executor -->|Uses| Client
-    Executor -->|Knows| Name
+    Agent -->|Validates<br>actions| Security
+    Security -->|Checks| Tools
+    Security -->|Uses hints| MCP
+    Conversation -->|Pauses for confirmation| Agent
-    style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
-    style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px
```
-**Executor Responsibilities:**
-- **Client Management:** Hold reference to MCP client
-- **Tool Identification:** Know which MCP tool to call
-- **Argument Conversion:** Transform action fields to MCP format
-- **Result Handling:** Parse MCP responses
-- **Error Recovery:** Handle connection errors, timeouts, server failures
+**Relationship Characteristics:**
+- **Agent → Security**: Validates actions before execution
+- **Security → Tools**: Examines tool characteristics (annotations)
+- **Security → MCP**: Uses MCP hints for risk assessment
+- **Conversation → Agent**: Pauses for user confirmation when required
+- **Optional Component**: Security analyzer can be disabled for trusted environments
-## MCP Tool Lifecycle
+## See Also
-### From Configuration to Execution
+- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers
+- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints
+- **[Security Guide](/sdk/guides/security)** - Configuring security policies
+
+### Skill
+Source: https://docs.openhands.dev/sdk/arch/skill.md
+
+The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt.
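As a rough sketch of what trigger-based activation means in practice, consider the following. This is an illustrative stand-in only — the class and field names are simplified and are not the SDK's actual `Skill`/`Trigger` API: a skill with no trigger is always injected, while a keyword-triggered skill is injected only when one of its keywords appears in the user message.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of trigger-based skill activation -- simplified
# stand-ins, not the SDK's actual Skill/Trigger classes.

@dataclass
class SketchSkill:
    name: str
    content: str
    keywords: Optional[list[str]] = None  # None => always active (repository skill)

    def is_active(self, user_message: str) -> bool:
        if self.keywords is None:
            return True  # repository skills are always in context
        text = user_message.lower()
        return any(kw.lower() in text for kw in self.keywords)

def active_skill_content(skills: list[SketchSkill], user_message: str) -> list[str]:
    """Collect the prompt content of every skill that should activate."""
    return [s.content for s in skills if s.is_active(user_message)]

skills = [
    SketchSkill("repo_guidelines", "Follow PEP 8.", keywords=None),
    SketchSkill("kubernetes", "Prefer kubectl apply.", keywords=["kubernetes", "k8s"]),
]
print(active_skill_content(skills, "Debug my k8s deployment"))
# -> ['Follow PEP 8.', 'Prefer kubectl apply.']
```

The real SDK performs this kind of selection when building the agent's context, so only relevant specialized knowledge consumes prompt tokens.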
+ +**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +## Core Responsibilities + +The Skill system has four primary responsibilities: + +1. **Context Injection** - Add specialized prompts to agent context based on triggers +2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) +3. **MCP Integration** - Load MCP tools associated with repository skills +4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats + +## Architecture ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% flowchart TB - Load["Load MCP Config"] - Start["Start Conversation"] - Spawn["Spawn MCP Servers"] - Discover["Discover Tools"] - Register["Register Tools"] + subgraph Types["Skill Types"] + Repo["Repository Skill
trigger: None"]
+        Knowledge["Knowledge Skill
trigger: KeywordTrigger"]
+        Task["Task Skill
trigger: TaskTrigger"] + end - Ready["Agent Ready"] + subgraph Triggers["Trigger Evaluation"] + Always["Always Active
Repository guidelines"]
+        Keyword["Keyword Match
String matching on user messages"]
+        TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] + end - Step["Agent Step"] - LLM["LLM Tool Call"] - Execute["Execute MCP Tool"] - Result["Return Observation"] + subgraph Content["Skill Content"] + Markdown["Markdown with Frontmatter"] + MCPTools["MCP Tools Config
Repo skills only"]
+        Inputs["Input Metadata
Task skills only"] + end - End["End Conversation"] - Cleanup["Close MCP Clients"] + subgraph Integration["Agent Integration"] + Context["Agent Context"] + Prompt["System Prompt"] + end - Load --> Start - Start --> Spawn - Spawn --> Discover - Discover --> Register - Register --> Ready + Repo --> Always + Knowledge --> Keyword + Task --> TaskMatch - Ready --> Step - Step --> LLM - LLM --> Execute - Execute --> Result - Result --> Step + Always --> Markdown + Keyword --> Markdown + TaskMatch --> Markdown - Step --> End - End --> Cleanup + Repo -.->|Optional| MCPTools + Task -.->|Requires| Inputs - style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px + Markdown --> Context + MCPTools --> Context + Context --> Prompt + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Repo,Knowledge,Task primary + class Always,Keyword,TaskMatch secondary + class Context tertiary ``` -**Lifecycle Phases:** +### Key Components -| Phase | Operations | Components | -|-------|-----------|------------| -| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | -| **Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | -| **Execution** | Handle tool calls | Agent, MCPToolAction | -| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | +| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | 
Keyword-based activation | String matching on user messages |
+| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs |
+| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills |
+| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema |
-## MCP Annotations
+## Skill Types
-MCP tools can include metadata hints for agents:
+### Repository Skills
+
+Always-active, repository-specific guidelines.
+
+**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root.

```mermaid
-%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
flowchart LR
-    Tool["MCP Tool"]
-
-    subgraph Annotations
-        ReadOnly["readOnlyHint"]
-        Destructive["destructiveHint"]
-        Progress["progressEnabled"]
-    end
-
-    Security["Security Analysis"]
-
-    Tool --> ReadOnly
-    Tool --> Destructive
-    Tool --> Progress
+    File["AGENTS.md"]
+    Parse["Parse Frontmatter"]
+    Skill["Skill(trigger=None)"]
+    Context["Always in Context"]
-    ReadOnly --> Security
-    Destructive --> Security
+    File --> Parse
+    Parse --> Skill
+    Skill --> Context
-    style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
-    style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px
+    style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px
```
-**Annotation Types:**
+**Characteristics:**
+- **Trigger:** `None` (always active)
+- **Purpose:** Project conventions, coding standards, architecture rules
+- **MCP Tools:** Can include MCP tool configuration
+- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported)
-| Annotation | Meaning | Use Case |
-|------------|---------|----------|
-| **readOnlyHint** | Tool doesn't modify state | Lower security risk |
-| **destructiveHint** | Tool modifies/deletes data | Require confirmation |
-| **progressEnabled** | Tool reports progress | Show progress UI |
+**Example Files (permanent context):**
+- `AGENTS.md` - General agent instructions
+- `GEMINI.md` - Gemini-specific instructions
+- `CLAUDE.md` - Claude-specific instructions
-These annotations feed into the security analyzer for risk assessment.
+**Other supported formats:**
+- `.cursorrules` - Cursor IDE guidelines
+- `agents.md` / `agent.md` - General agent instructions
-## Component Relationships
+### Knowledge Skills
-### How MCP Integrates
+Keyword-triggered skills for specialized domains:

```mermaid
-%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
-flowchart LR
-    MCP["MCP System"]
-    Skills["Skills"]
-    Tools["Tool Registry"]
-    Agent["Agent"]
-    Security["Security"]
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
+flowchart TB
+    User["User Message"]
+    Check["Check Keywords"]
+    Match{"Match?"}
+    Activate["Activate Skill"]
+    Skip["Skip Skill"]
+    Context["Add to Context"]
-    Skills -->|Configures| MCP
-    MCP -->|Registers| Tools
-    Agent -->|Uses| Tools
-    MCP -->|Provides hints| Security
+    User --> Check
+    Check --> Match
+    Match -->|Yes| Activate
+    Match -->|No| Skip
+    Activate --> Context
-    style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
-    style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
-    style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px
+    style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```
-**Relationship Characteristics:**
-- **Skills → MCP**: Repository skills can embed MCP configurations
-- **MCP → Tools**: MCP tools registered alongside native tools
-- **Agent → Tools**: Agents use MCP tools like any other tool
-- **MCP → Security**: Annotations inform security risk assessment
-- **Transparent Integration**: Agent doesn't distinguish MCP from native tools
+**Characteristics:**
+- **Trigger:** `KeywordTrigger` with regex patterns
+- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning")
+- **Activation:** Keywords detected in user messages
+- **Location:** System or user-defined knowledge base
-## Design Rationale
+**Trigger Example:**
+```yaml
+---
+name: kubernetes
+trigger:
+  type: keyword
+  keywords: ["kubernetes", "k8s", "kubectl"]
+---
+```
-**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users.
+### Task Skills
-**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes.
+Keyword-triggered skills with structured inputs for guided workflows:
-**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source.

```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
+flowchart TB
+    User["User Message"]
+    Match{"Keyword<br>Match?"}
+    Inputs["Collect User Inputs"]
+    Template["Apply Template"]
+    Context["Add to Context"]
+    Skip["Skip Skill"]
+
+    User --> Match
+    Match -->|Yes| Inputs
+    Match -->|No| Skip
+    Inputs --> Template
+    Template --> Context
+
+    style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```
-**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves.
+**Characteristics:**
+- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs)
+- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger)
+- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation)
+- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria)
+- **Location:** System-defined or custom task templates
-**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics.
+**Trigger Example:**
+```yaml
+---
+name: bug_fix
+triggers: ["/bug_fix", "fix bug", "bug report"]
+inputs:
+  - name: bug_description
+    description: "Describe the bug"
+    required: true
+---
+```
-**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping.
+**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills.
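That relationship — identical matching, plus structured inputs for task skills — can be sketched as follows. This is a hedged illustration: the dictionary shape, field names, and helper functions are stand-ins and not the SDK's actual task-skill API.

```python
import string

# Hedged sketch of a task skill: the same substring-style keyword matching
# as a knowledge skill, plus structured user inputs rendered into a
# prompt template. Field names and helpers here are illustrative, not SDK API.

TASK_SKILL = {
    "name": "bug_fix",
    "triggers": ["/bug_fix", "fix bug", "bug report"],
    "template": "Fix the following bug:\n$bug_description",
    "inputs": [{"name": "bug_description", "required": True}],
}

def matches(skill: dict, message: str) -> bool:
    """Identical matching logic for knowledge and task skills."""
    msg = message.lower()
    return any(t.lower() in msg for t in skill["triggers"])

def render_task_skill(skill: dict, user_inputs: dict) -> str:
    """Validate required inputs, then substitute them into the template."""
    for spec in skill["inputs"]:
        if spec["required"] and spec["name"] not in user_inputs:
            raise ValueError(f"missing required input: {spec['name']}")
    return string.Template(skill["template"]).substitute(user_inputs)

assert matches(TASK_SKILL, "Please fix bug #42")
print(render_task_skill(TASK_SKILL, {"bug_description": "login page returns 500"}))
```

The only behavioral difference from a knowledge skill is the input-validation and templating step after the trigger fires.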
-## See Also
+## Trigger Evaluation
-- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework
-- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills
-- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment
-- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications
-- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library
+Skills are evaluated at different points in the agent lifecycle:

```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
+flowchart TB
+    Start["Agent Step Start"]
+
+    Repo["Check Repository Skills<br>trigger: None"]
+    AddRepo["Always Add to Context"]
+
+    Message["Check User Message"]
+    Keyword["Match Keyword Triggers"]
+    AddKeyword["Add Matched Skills"]
+
+    TaskType["Check Task Type"]
+    TaskMatch["Match Task Triggers"]
+    AddTask["Add Task Skill"]
+
+    Build["Build Agent Context"]
+
+    Start --> Repo
+    Repo --> AddRepo
+
+    Start --> Message
+    Message --> Keyword
+    Keyword --> AddKeyword
+
+    Start --> TaskType
+    TaskType --> TaskMatch
+    TaskMatch --> AddTask
+
+    AddRepo --> Build
+    AddKeyword --> Build
+    AddTask --> Build
+
+    style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+    style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

-# Overview
-Source: https://docs.openhands.dev/sdk/arch/overview
+**Evaluation Rules:**
-The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment.
+| Trigger Type | Evaluation Point | Activation Condition |
+|--------------|------------------|----------------------|
+| **None** | Every step | Always active |
+| **KeywordTrigger** | On user message | Keyword/string match in message |
+| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) |
-Check [this document](/sdk/arch/design) for the core design principles that guided its architecture.
+**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters.
-## Relationship with OpenHands Applications
+## MCP Tool Integration
-The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs.
This architecture ensures consistency and enables flexible integration patterns.
- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies.
- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs.
- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK.

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%%
graph TB
    subgraph Interfaces["OpenHands Interfaces"]
        UI[OpenHands GUI
React frontend]
        CLI[OpenHands CLI
Command-line interface]
        Custom[Your Custom Client
Automations & workflows]
    end

    SDK[Software Agent SDK
openhands.sdk + tools + workspace]

    subgraph External["External Services"]
        LLM[LLM Providers
OpenAI, Anthropic, etc.]
        Runtime[Runtime Services
Docker, Remote API, etc.] - end - - UI --> SDK - CLI --> SDK - Custom --> SDK - - SDK --> LLM - SDK --> Runtime +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skill["Repository Skill"] + MCPConfig["mcp_tools Config"] + Client["MCP Client"] + Tools["Tool Registry"] - classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + Skill -->|Contains| MCPConfig + MCPConfig -->|Spawns| Client + Client -->|Registers| Tools - class UI,CLI,Custom interface - class SDK sdk - class LLM,Runtime external + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` +**MCP Configuration Format:** -## Four-Package Architecture +Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): -The agent-sdk is organized into four distinct Python packages: +```yaml +--- +name: repo_skill +mcp_tools: + mcpServers: + filesystem: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] +--- +``` -| Package | What It Does | When You Need It | -|---------|-------------|------------------| -| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | -| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | -| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | -| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | +**Workflow:** +1. **Load Skill:** Parse markdown file with frontmatter +2. **Extract MCP Config:** Read `mcp_tools` field +3. **Spawn MCP Servers:** Create MCP clients for each server +4. 
**Register Tools:** Add MCP tools to agent's tool registry +5. **Inject Context:** Add skill content to agent prompt -### Two Deployment Modes +## Skill File Format -The SDK supports two deployment architectures depending on your needs: +Skills are defined in markdown files with YAML frontmatter: -#### Mode 1: Local Development +```markdown +--- +name: skill_name +trigger: + type: keyword + keywords: ["pattern1", "pattern2"] +--- -**Installation:** Just install `openhands-sdk` + `openhands-tools` +# Skill Content -```bash -pip install openhands-sdk openhands-tools +This is the instruction text that will be added to the agent's context. ``` -**Architecture:** +**Frontmatter Fields:** + +| Field | Required | Description | +|-------|----------|-------------| +| **name** | Yes | Unique skill identifier | +| **trigger** | Yes* | Activation trigger (`null` for always active) | +| **mcp_tools** | No | MCP server configuration (repo skills only) | +| **inputs** | No | User input metadata (task skills only) | + +*Repository skills use `trigger: null` (or omit trigger field) + +## Component Relationships + +### How Skills Integrate ```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% flowchart LR - SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk - Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools - - SDK -->|uses| Tools - - classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 - classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 -``` - -- `LocalWorkspace` included in SDK (no extra install) -- Everything runs in one process -- Perfect for prototyping and simple use cases -- Quick setup, no Docker required - -#### Mode 2: Production / Sandboxed - -**Installation:** Install all 4 packages - -```bash -pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server -``` - -**Architecture:** - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% -flowchart LR - - WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk - - subgraph WS[" "] - direction LR - Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws - Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws - end - - Server["openhands.agent_server
FastAPI + WebSocket"]:::server - Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk - Tools["openhands.tools
BashTool · FileEditor · …"]:::tools - - WSBase -.->|extended by| Docker - WSBase -.->|extended by| Remote - Docker -->|spawns container with| Server - Remote -->|connects via HTTP to| Server - Server -->|runs| Agent - Agent -->|uses| Tools + Skills["Skill System"] + Context["Agent Context"] + Agent["Agent"] + MCP["MCP Client"] - classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 - classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 - classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 - classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + Skills -->|Injects content| Context + Skills -.->|Spawns tools| MCP + Context -->|System prompt| Agent + MCP -->|Tool| Agent - style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none + style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -- `RemoteWorkspace` auto-spawns agent-server in containers -- Sandboxed execution for security -- Multi-user deployments -- Distributed systems (e.g., Kubernetes) support - - -**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). - - -### SDK Package (`openhands.sdk`) +**Relationship Characteristics:** +- **Skills → Agent Context**: Active skills contribute their content to system prompt +- **Skills → MCP**: Repository skills can spawn MCP servers and register tools +- **Context → Agent**: Combined skill content becomes part of agent's instructions +- **Skills Lifecycle**: Loaded at conversation start, evaluated each step -**Purpose:** Core components and base classes for OpenHands agent. 
+## See Also -**Key Components:** -- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop -- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle -- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry -- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration -- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) -- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) -- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation -- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management -- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution +- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context +- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management +- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications -**Design:** Stateless, immutable components with type-safe Pydantic models. +### Tool System & MCP +Source: https://docs.openhands.dev/sdk/arch/tool-system.md -**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. +The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. 
-**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) +**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) -### Tools Package (`openhands.tools`) +## Core Responsibilities +The Tool System has four primary responsibilities: - -**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. - +1. **Type Safety** - Enforce action/observation schemas via Pydantic models +2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas +3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs +4. **Tool Registry** - Discover and resolve tools by name or pattern -**Purpose:** Pre-built tools following consistent patterns. +## Tool System -**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. +### Architecture Overview - -For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. - +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Definition["Tool Definition"] + Action["Action
Input schema"]
        Observation["Observation
Output schema"]
        Executor["Executor
Business logic"]
    end

    subgraph Framework["Tool Framework"]
        Base["ToolBase
Abstract base"]
        Impl["Tool Implementation
Concrete tool"]
        Registry["Tool Registry
Spec → Tool"]
    end

    Agent["Agent"]
    LLM["LLM"]
    ToolSpec["Tool Spec
name + params"] -### Workspace Package (`openhands.workspace`) + Base -.->|Extends| Impl + + ToolSpec -->|resolve_tool| Registry + Registry -->|Create instances| Impl + Impl -->|Available in| Agent + Impl -->|Generate schema| LLM + LLM -->|Generate tool call| Agent + Agent -->|Parse & validate| Action + Agent -->|Execute via Tool.\_\_call\_\_| Executor + Executor -->|Return| Observation + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Action,Observation,Executor secondary + class Registry tertiary +``` -**Purpose:** Workspace implementations extending SDK base classes. +### Key Components -**Key Components:** Docker Workspace, Remote API Workspace, and more. +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | +| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | +| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | +| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | +| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | +| 
**[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | +| **[`Tool` (spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | +| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | -**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. +### Action-Observation Pattern -**Use Cases:** Sandboxed execution, multi-user deployments, production environments. +The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. - -For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). - +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Agent["Agent Layer"] + ToolCall["MessageToolCall
from LLM"]
        ParseJSON["Parse JSON
arguments"]
        CreateAction["tool.action_from_arguments()
Pydantic validation"]
        WrapAction["ActionEvent
wraps Action"]
        WrapObs["ObservationEvent
wraps Observation"]
        Error["AgentErrorEvent"]
    end

    subgraph ToolSystem["Tool System"]
        ActionType["Action
Pydantic model"]
        ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"]
        Execute["ToolExecutor
business logic"]
        ObsType["Observation
Pydantic model"] + end + + ToolCall --> ParseJSON + ParseJSON -->|Valid JSON| CreateAction + ParseJSON -->|Invalid JSON| Error + CreateAction -->|Valid| ActionType + CreateAction -->|Invalid| Error + ActionType --> WrapAction + ActionType --> ToolCall2 + ToolCall2 --> Execute + Execute --> ObsType + ObsType --> WrapObs + + style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px + style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px +``` -### Agent Server Package (`openhands.agent_server`) +**Tool System Boundary:** +- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance +- **Output**: `Observation` instance with structured result +- **No knowledge of**: Events, LLM messages, conversation state -**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. +### Tool Definition -**Features:** -- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode -- Service management with isolated per-user sessions -- API key authentication and health checking +Tools are defined using two patterns depending on complexity: -**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). +#### Pattern 1: Direct Instantiation (Simple Tools) -**Use Cases:** Multi-user web apps, SaaS products, distributed systems. +For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): - -For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). - +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"]
    Obs["Define Observation
with to_llm_content"]
    Exec["Define Executor
stateless logic"]
    Tool["ToolDefinition(...,
executor=Executor())"] + + Action --> Tool + Obs --> Tool + Exec --> Tool + + style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -## How Components Work Together +**Components:** +1. **Action** - Pydantic model with `visualize` property for display +2. **Observation** - Pydantic model with `to_llm_content` property for LLM +3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` +4. **ToolDefinition** - Direct instantiation with executor instance -### Basic Execution Flow (Local) +#### Pattern 2: Subclass with Factory (Stateful Tools) -When you send a message to an agent, here's what happens: +For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): ```mermaid -sequenceDiagram - participant You - participant Conversation - participant Agent - participant LLM - participant Tool +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
with \_\_init\_\_ and state"] + Subclass["class MyTool(ToolDefinition)
with create() method"] + Instance["Return [MyTool(...,
executor=instance)]"] - You->>Conversation: "Create hello.txt" - Conversation->>Agent: Process message - Agent->>LLM: What should I do? - LLM-->>Agent: Use BashTool("touch hello.txt") - Agent->>Tool: Execute action - Note over Tool: Runs in same environment
as Agent (local/container/remote) - Tool-->>Agent: Observation - Agent->>LLM: Got result, continue? - LLM-->>Agent: Done - Agent-->>Conversation: Update state - Conversation-->>You: "File created!" + Action --> Subclass + Obs --> Subclass + Exec --> Subclass + Subclass --> Instance + + style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. - -### Deployment Flexibility - -The same agent code runs in different environments by swapping workspace configuration: +**Components:** +1. **Action/Observation** - Same as Pattern 1 +2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup +3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method +4. **Factory Method** - Returns sequence of configured tool instances ```mermaid -graph TB - subgraph "Your Code (Unchanged)" - Code["Agent + Tools + LLM"] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Pattern1["Pattern 1: Direct Instantiation"] + P1A["Define Action/Observation
with visualize/to_llm_content"] + P1E["Define ToolExecutor
with \_\_call\_\_()"] + P1T["ToolDefinition(...,
executor=Executor())"] end - subgraph "Deployment Options" - Local["Local
Direct execution"] - Docker["Docker
Containerized"] - Remote["Remote
Multi-user server"] + subgraph Pattern2["Pattern 2: Subclass with Factory"] + P2A["Define Action/Observation
with visualize/to_llm_content"] + P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] + P2C["class MyTool(ToolDefinition)
@classmethod create()"] + P2I["Return [MyTool(...,
executor=instance)]"] end - Code -->|LocalWorkspace| Local - Code -->|DockerWorkspace| Docker - Code -->|RemoteAPIWorkspace| Remote + P1A --> P1E + P1E --> P1T - style Code fill:#e1f5fe - style Local fill:#e8f5e8 - style Docker fill:#e8f5e8 - style Remote fill:#e8f5e8 + P2A --> P2E + P2E --> P2C + P2C --> P2I + + style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -## Next Steps - -### Get Started -- [Getting Started](/sdk/getting-started) – Build your first agent -- [Hello World](/sdk/guides/hello-world) – Minimal example - -### Explore Components - -**SDK Package:** -- [Agent](/sdk/arch/agent) – Core reasoning-action loop -- [Conversation](/sdk/arch/conversation) – State management and lifecycle -- [LLM](/sdk/arch/llm) – Language model integration -- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern -- [Events](/sdk/arch/events) – Typed event framework -- [Workspace](/sdk/arch/workspace) – Base workspace architecture - -**Tools Package:** -- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details - -**Workspace Package:** -- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details - -**Agent Server:** -- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details +**Key Design Elements:** -### Deploy -- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely -- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup -- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service -- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server +| Component | Purpose | Requirements | +|-----------|---------|--------------| +| 
**Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text | +| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list | +| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method | +| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) | -### Source Code -- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework -- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools -- [`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces -- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server -- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples +**When to Use Each Pattern:** +| Pattern | Use Case | Examples | +|---------|----------|----------| +| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities | +| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` | -# SDK Package -Source: https://docs.openhands.dev/sdk/arch/sdk +### Tool Annotations -The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. 
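A rough sketch of Pattern 1 using simplified stand-ins (the real SDK base classes are `Action`, `Observation`, `ToolExecutor`, and `ToolDefinition`; their exact signatures differ from these toy dataclasses):

```python
from dataclasses import dataclass


# Simplified stand-ins for the SDK's Action/Observation/ToolExecutor types.
@dataclass
class ThinkAction:
    thought: str


@dataclass
class ThinkObservation:
    content: str

    @property
    def to_llm_content(self) -> list[dict]:
        # Real Observations expose structured content for the LLM.
        return [{"type": "text", "text": self.content}]


class ThinkExecutor:
    """Stateless executor: __call__(action) -> observation."""

    def __call__(self, action: ThinkAction) -> ThinkObservation:
        return ThinkObservation(content=f"Recorded: {action.thought}")


# Pattern 1: instantiate the executor directly and attach it to the tool.
executor = ThinkExecutor()
observation = executor(ThinkAction(thought="plan the next step"))
print(observation.to_llm_content[0]["text"])  # → Recorded: plan the next step
```

Because the executor holds no state, a single instance can be shared freely, which is exactly why direct instantiation suffices for tools like `finish` and `think`.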
+Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs: -**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) +| Field | Meaning | Examples | +|-------|---------|----------| +| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) | +| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) | +| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) | +| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) | -## Purpose +**Key Behaviors:** +- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with `readOnlyHint=False` +- Annotations help LLMs reason about tool safety and side effects -The SDK package handles: -- **Agent reasoning loop**: How agents process messages and make decisions -- **State management**: Conversation lifecycle and persistence -- **LLM integration**: Provider-agnostic language model access -- **Tool system**: Typed actions and observations -- **Workspace abstraction**: Where code executes -- **Extensibility**: Skills, condensers, MCP, security +### Tool Registry -## Core Components +The registry enables **dynamic tool discovery** and instantiation from tool specifications: ```mermaid -graph TB - Conv[Conversation
Lifecycle Manager] --> Agent[Agent
Reasoning Loop] +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + ToolSpec["Tool Spec
name + params"] - Agent --> LLM[LLM
Language Model] - Agent --> Tools[Tool System
Capabilities] - Agent --> Micro[Skills
Behavior Modules] - Agent --> Cond[Condenser
Memory Manager] + subgraph Registry["Tool Registry"] + Resolver["Resolver
name → factory"] + Factory["Factory
create(params)"] + end - Tools --> Workspace[Workspace
Execution] + Instance["Tool Instance
with executor"] + Agent["Agent"] - Conv --> Events[Events
Communication] - Tools --> MCP[MCP
External Tools] - Workspace --> Security[Security
Validation] + ToolSpec -->|"resolve_tool(spec)"| Resolver + Resolver -->|Lookup factory| Factory + Factory -->|"create(**params)"| Instance + Instance -->|Used by| Agent - style Conv fill:#e1f5fe - style Agent fill:#f3e5f5 - style LLM fill:#e8f5e8 - style Tools fill:#fff3e0 - style Workspace fill:#fce4ec + style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -### 1. Conversation - State & Lifecycle - -**What it does**: Manages the entire conversation lifecycle and state. - -**Key responsibilities**: -- Maintains conversation state (immutable) -- Handles message flow between user and agent -- Manages turn-taking and async execution -- Persists and restores conversation state -- Emits events for monitoring +**Resolution Workflow:** -**Design decisions**: -- **Immutable state**: Each operation returns a new Conversation instance -- **Serializable**: Can be saved to disk or database and restored -- **Async-first**: Built for streaming and concurrent execution +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution -**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. 
+**Registration Types:** -**Example use cases**: -- Saving conversation to database after each turn -- Implementing undo/redo functionality -- Building multi-session chatbots -- Time-travel debugging +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | -**Learn more**: -- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) -- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) -- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) +### File Organization ---- +Tools follow a consistent file structure for maintainability: -### 2. Agent - The Reasoning Loop +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` -**What it does**: The core reasoning engine that processes messages and decides what to do. 
+**File Responsibilities:** -**Key responsibilities**: -- Receives messages and current state -- Consults LLM to reason about next action -- Validates and executes tool calls -- Processes observations and loops until completion -- Integrates with skills for specialized behavior +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | -**Design decisions**: -- **Stateless**: Agent doesn't hold state, operates on Conversation -- **Extensible**: Behavior can be modified via skills -- **Provider-agnostic**: Works with any LLM through unified interface +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability -**The reasoning loop**: -1. Receive message from Conversation -2. Add message to context -3. Consult LLM with full conversation history -4. If LLM returns tool call → validate and execute tool -5. If tool returns observation → add to context, go to step 3 -6. If LLM returns response → done, return to user +**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation -**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. 
-**Example use cases**: -- Planning agents that break tasks into steps -- Code review agents with specific checks -- Agents with domain-specific reasoning patterns +## MCP Integration -**Learn more**: -- Guide: [Custom Agents](/sdk/guides/agent-custom) -- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) -- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) +The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. ---- +**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) -### 3. LLM - Language Model Integration - -**What it does**: Provides a provider-agnostic interface to language models. - -**Key responsibilities**: -- Abstracts different LLM providers (OpenAI, Anthropic, etc.) -- Handles message formatting and conversion -- Manages streaming responses -- Supports tool calling and reasoning modes -- Handles retries and error recovery - -**Design decisions**: -- **Provider-agnostic**: Same API works with any provider -- **Streaming-first**: Built for real-time responses -- **Type-safe**: Pydantic models for all messages -- **Extensible**: Easy to add new providers - -**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: -- Cost optimization (switch to cheaper models) -- Testing with different models -- Avoiding vendor lock-in -- Supporting customer choice - -**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. 
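A toy illustration of the provider-agnostic idea (the `FakeOpenAI`, `FakeAnthropic`, and `complete()` names are invented for this sketch; the SDK's real `LLM` class has a different interface):

```python
# Two fake providers sharing one interface; agent code depends only on the
# shared `complete()` method, so swapping providers needs no agent changes.
class FakeOpenAI:
    def complete(self, messages: list[str]) -> str:
        return "openai:" + messages[-1]


class FakeAnthropic:
    def complete(self, messages: list[str]) -> str:
        return "anthropic:" + messages[-1]


def agent_step(llm, prompt: str) -> str:
    # The agent never imports a provider SDK directly.
    return llm.complete([prompt])


print(agent_step(FakeOpenAI(), "hello"))     # → openai:hello
print(agent_step(FakeAnthropic(), "hello"))  # → anthropic:hello
```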
- -**Example use cases**: -- Routing requests to different models based on complexity -- Implementing custom caching strategies -- Adding observability hooks - -**Learn more**: -- Guide: [LLM Registry](/sdk/guides/llm-registry) -- Guide: [LLM Routing](/sdk/guides/llm-routing) -- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) -- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) - ---- - -### 4. Tool System - Typed Capabilities - -**What it does**: Defines what agents can do through a typed action/observation pattern. - -**Key responsibilities**: -- Defines tool schemas (inputs and outputs) -- Validates actions before execution -- Executes tools and returns typed observations -- Generates JSON schemas for LLM tool calling -- Registers tools with the agent - -**Design decisions**: -- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs -- **Schema generation**: Pydantic models auto-generate JSON schemas -- **Executor pattern**: Separation of tool definition and execution -- **Composable**: Tools can call other tools - -**The three components**: -1. **Action**: Input schema (what the tool accepts) -2. **Observation**: Output schema (what the tool returns) -3. **ToolExecutor**: Logic that transforms Action → Observation - -**Why this pattern?** -- Type safety catches errors early -- LLMs get accurate schemas for tool calling -- Tools are testable in isolation -- Easy to compose tools - -**When to customize**: When you need domain-specific capabilities not covered by built-in tools. - -**Example use cases**: -- Database query tools -- API integration tools -- Custom file format parsers -- Domain-specific calculators - -**Learn more**: -- Guide: [Custom Tools](/sdk/guides/custom-tools) -- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) - ---- - -### 5. 
Workspace - Execution Abstraction - -**What it does**: Abstracts *where* code executes (local, Docker, remote). - -**Key responsibilities**: -- Provides unified interface for code execution -- Handles file operations across environments -- Manages working directories -- Supports different isolation levels - -**Design decisions**: -- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package -- **Environment-agnostic**: Code works the same locally or remotely -- **Lazy initialization**: Workspace setup happens on first use - -**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. - -**When to use directly**: Rarely - usually configured when creating an agent. Use advanced workspaces for production. - -**Learn more**: -- Architecture: [Workspace Architecture](/sdk/arch/workspace) -- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) -- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) - ---- - -### 6. Events - Component Communication - -**What it does**: Enables observability and debugging through event emissions. - -**Key responsibilities**: -- Defines event types (messages, actions, observations, errors) -- Emitted by Conversation, Agent, Tools -- Enables logging, debugging, and monitoring -- Supports custom event handlers - -**Design decisions**: -- **Immutable**: Events are snapshots, not mutable objects -- **Serializable**: Can be logged, stored, replayed -- **Type-safe**: Pydantic models for all events - -**Why events?** They provide a timeline of what happened during agent execution. Essential for: -- Debugging agent behavior -- Understanding decision-making -- Building observability dashboards -- Implementing custom logging - -**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. 
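A minimal sketch of consuming events for logging, with an invented `ActionEvent` shape (see the SDK's `event/` module for the real typed events):

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # events are immutable snapshots
class ActionEvent:
    tool: str
    command: str


log: list[str] = []


def on_event(event: ActionEvent) -> None:
    # A custom handler could ship this to a dashboard or logger instead.
    log.append(f"{event.tool}: {event.command}")


on_event(ActionEvent(tool="execute_bash", command="ls"))
print(log)  # → ['execute_bash: ls']
```

Because events are frozen and serializable, the same handler pattern works for live monitoring and for replaying a stored timeline later.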
- -**Learn more**: -- Guide: [Metrics and Observability](/sdk/guides/metrics) -- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) - ---- - -### 7. Condenser - Memory Management - -**What it does**: Compresses conversation history when it gets too long. - -**Key responsibilities**: -- Monitors conversation length -- Summarizes older messages -- Preserves important context -- Keeps conversation within token limits - -**Design decisions**: -- **Pluggable**: Different condensing strategies -- **Automatic**: Triggered when context gets large -- **Preserves semantics**: Important information retained - -**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. - -**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. - -**Example strategies**: -- Summarize old messages -- Keep only last N turns -- Preserve task-related messages - -**Learn more**: -- Guide: [Context Condenser](/sdk/guides/context-condenser) -- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) - ---- - -### 8. MCP - Model Context Protocol - -**What it does**: Integrates external tool servers via Model Context Protocol. - -**Key responsibilities**: -- Connects to MCP-compatible tool servers -- Translates MCP tools to SDK tool format -- Manages server lifecycle -- Handles server communication - -**Design decisions**: -- **Standard protocol**: Uses MCP specification -- **Transparent integration**: MCP tools look like regular tools to agents -- **Process management**: Handles server startup/shutdown - -**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. 
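One common shape for such configuration looks like the following (the `mcpServers` layout, the `fetch` server, and the `uvx` command are illustrative; consult the MCP guide for the exact schema the SDK expects):

```python
# Hypothetical MCP configuration passed to an Agent; the "fetch" server and
# its launch command are examples, not a required setup.
mcp_config = {
    "mcpServers": {
        "fetch": {
            "command": "uvx",
            "args": ["mcp-server-fetch"],
        }
    }
}

# The agent would discover this server's tools during initialization and
# expose them alongside regular SDK tools.
assert "fetch" in mcp_config["mcpServers"]
```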
- -**When to use**: When you need tools that: -- Already have MCP servers (fetch, filesystem, etc.) -- Are too complex to rewrite as SDK tools -- Need to run in separate processes -- Are provided by third parties - -**Learn more**: -- Guide: [MCP Integration](/sdk/guides/mcp) -- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) -- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) - ---- - -### 9. Skills (formerly Microagents) - Behavior Modules - -**What it does**: Specialized modules that modify agent behavior for specific tasks. - -**Key responsibilities**: -- Provide domain-specific instructions -- Modify system prompts -- Guide agent decision-making -- Compose to create specialized agents - -**Design decisions**: -- **Composable**: Multiple skills can work together -- **Declarative**: Defined as configuration, not code -- **Reusable**: Share skills across agents - -**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. - -**Example skills**: -- GitHub operations (issue creation, PRs) -- Code review guidelines -- Documentation style enforcement -- Project-specific conventions - -**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. - -**Learn more**: -- Guide: [Agent Skills & Context](/sdk/guides/skill) -- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) - ---- - -### 10. Security - Validation & Sandboxing - -**What it does**: Validates inputs and enforces security constraints. 
- -**Key responsibilities**: -- Input validation -- Command sanitization -- Path traversal prevention -- Resource limits - -**Design decisions**: -- **Defense in depth**: Multiple validation layers -- **Fail-safe**: Rejects suspicious inputs by default -- **Configurable**: Adjust security levels as needed - -**Why needed?** Agents execute arbitrary code and file operations. Security prevents: -- Malicious prompts escaping sandboxes -- Path traversal attacks -- Resource exhaustion -- Unintended system access - -**When to customize**: When you need domain-specific validation rules or want to adjust security policies. - -**Learn more**: -- Guide: [Security and Secrets](/sdk/guides/security) -- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) - ---- - -## How Components Work Together - -### Example: User asks agent to create a file - -``` -1. User → Conversation: "Create a file called hello.txt with 'Hello World'" - -2. Conversation → Agent: New message event - -3. Agent → LLM: Full conversation history + available tools - -4. LLM → Agent: Tool call for FileEditorTool.create() - -5. Agent → Tool System: Validate FileEditorAction - -6. Tool System → Tool Executor: Execute action - -7. Tool Executor → Workspace: Create file (local/docker/remote) - -8. Workspace → Tool Executor: Success - -9. Tool Executor → Tool System: FileEditorObservation (success=true) - -10. Tool System → Agent: Observation - -11. Agent → LLM: Updated history with observation - -12. LLM → Agent: "File created successfully" - -13. Agent → Conversation: Done, final response - -14. 
Conversation → User: "File created successfully" -``` - -Throughout this flow: -- **Events** are emitted for observability -- **Condenser** may trigger if history gets long -- **Skills** influence LLM's decision-making -- **Security** validates file paths and operations -- **MCP** could provide additional tools if configured - -## Design Patterns - -### Immutability - -All core objects are immutable. Operations return new instances: - -```python -conversation = Conversation(...) -new_conversation = conversation.add_message(message) -# conversation is unchanged, new_conversation has the message -``` - -**Why?** Makes debugging easier, enables time-travel, ensures serializability. - -### Composition Over Inheritance - -Agents are composed from: -- LLM provider -- Tool list -- Skill list -- Condenser strategy -- Security policy - -You don't subclass Agent - you configure it. - -**Why?** More flexible, easier to test, enables runtime configuration. - -### Type Safety - -Everything uses Pydantic models: -- Messages, actions, observations are typed -- Validation happens automatically -- Schemas generate from types - -**Why?** Catches errors early, provides IDE support, self-documenting. 
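The schema-generation point can be demonstrated with the standard library alone. This `json_schema` helper is a toy stand-in: Pydantic performs this derivation, far more completely, for the SDK's real Action and Observation models.

```python
import typing
from dataclasses import dataclass, fields


@dataclass
class FileEditAction:
    path: str
    line: int


def json_schema(cls) -> dict:
    # Tiny illustration of schema-from-types; field types alone are enough
    # to produce a JSON-schema-like description for LLM tool calling.
    type_map = {str: "string", int: "integer"}
    hints = typing.get_type_hints(cls)
    return {
        "type": "object",
        "properties": {f.name: {"type": type_map[hints[f.name]]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }


print(json_schema(FileEditAction))
# → {'type': 'object', 'properties': {'path': {'type': 'string'},
#    'line': {'type': 'integer'}}, 'required': ['path', 'line']}
```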
## Next Steps

### For Usage Examples

- [Getting Started](/sdk/getting-started) - Build your first agent
- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities
- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers
- [Conversation Management](/sdk/guides/convo-persistence) - State handling

### For Related Architecture

- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations
- [Workspace Architecture](/sdk/arch/workspace) - Execution environments
- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution

### For Implementation Details

- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code
- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code
- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code
- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples


# Security
Source: https://docs.openhands.dev/sdk/arch/security

The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

## Core Responsibilities

The Security system has four primary responsibilities:

1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions
2. **Confirmation Policy** - Determine when user approval is required based on risk
3. **Action Validation** - Enforce security policies before execution
4. 
**Audit Trail** - Record security decisions in event history - -## Architecture - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% -flowchart TB - subgraph Interface["Abstract Interface"] - Base["SecurityAnalyzerBase
Abstract analyzer"] - end - - subgraph Implementations["Concrete Analyzers"] - LLM["LLMSecurityAnalyzer
Inline risk prediction"] - NoOp["NoOpSecurityAnalyzer
No analysis"] - end - - subgraph Risk["Risk Levels"] - Low["LOW
Safe operations"] - Medium["MEDIUM
Moderate risk"] - High["HIGH
Dangerous ops"] - Unknown["UNKNOWN
Unanalyzed"] - end - - subgraph Policy["Confirmation Policy"] - Check["should_require_confirmation()"] - Mode["Confirmation Mode"] - Decision["Require / Allow"] - end - - Base --> LLM - Base --> NoOp - - Implementations --> Low - Implementations --> Medium - Implementations --> High - Implementations --> Unknown - - Low --> Check - Medium --> Check - High --> Check - Unknown --> Check - - Check --> Mode - Mode --> Decision - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - - class Base primary - class LLM secondary - class High danger - class Check tertiary -``` - -### Key Components - -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | -| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | -| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | -| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | -| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | - -## Risk Levels - -Security analyzers return one of four risk levels: - -```mermaid -%%{init: {"theme": "default", "flowchart": 
{"nodeSpacing": 30}} }%% -flowchart TB - Action["ActionEvent"] - Analyze["Security Analyzer"] - - subgraph Levels["Risk Levels"] - Low["LOW
Read-only, safe"] - Medium["MEDIUM
Modify files"] - High["HIGH
Delete, execute"] - Unknown["UNKNOWN
Not analyzed"] - end - - Action --> Analyze - Analyze --> Low - Analyze --> Medium - Analyze --> High - Analyze --> Unknown - - style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px - style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px - style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px -``` - -### Risk Level Definitions - -| Level | Characteristics | Examples | -|-------|----------------|----------| -| **LOW** | Read-only, no state changes | File reading, directory listing, search | -| **MEDIUM** | Modifies user data | File editing, creating files, API calls | -| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | -| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | - -## Security Analyzers - -### LLMSecurityAnalyzer - -Leverages the LLM's inline risk assessment during action generation: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Schema["Tool Schema
+ security_risk param"] - LLM["LLM generates action
with security_risk"] - ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] - Extract["Extract security_risk
from arguments"] - ActionEvent["ActionEvent
with security_risk set"] - Analyzer["LLMSecurityAnalyzer
returns security_risk"] - - Schema --> LLM - LLM --> ToolCall - ToolCall --> Extract - Extract --> ActionEvent - ActionEvent --> Analyzer - - style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Analysis Process:** - -1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema -2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments -3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments -4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` -5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level -6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step - -**Example Tool Call:** -```json -{ - "name": "execute_bash", - "arguments": { - "command": "rm -rf /tmp/cache", - "security_risk": "HIGH" - } -} -``` - -The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. 
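Step 3 of the process can be pictured as a small sketch (simplified; the SDK performs this extraction internally when constructing the `ActionEvent`):

```python
import json

# A tool call as it might arrive from the LLM, with the extra `security_risk`
# field that the analyzer injected into the tool schema.
tool_call = {
    "name": "execute_bash",
    "arguments": json.dumps({"command": "rm -rf /tmp/cache", "security_risk": "HIGH"}),
}

args = json.loads(tool_call["arguments"])
risk = args.pop("security_risk", "UNKNOWN")  # remaining args go to the tool

print(risk)  # → HIGH
print(args)  # → {'command': 'rm -rf /tmp/cache'}
```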
- -**Configuration:** -- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent -- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools -- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation - -### NoOpSecurityAnalyzer - -Passthrough analyzer that skips analysis: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Action["ActionEvent"] - NoOp["NoOpSecurityAnalyzer"] - Unknown["SecurityRisk.UNKNOWN"] - - Action --> NoOp --> Unknown - - style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` - -**Use Case:** Development, trusted environments, or when confirmation mode handles all actions - -## Confirmation Policy - -The confirmation policy determines when user approval is required. There are three policy implementations: - -**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) - -### Policy Types - -| Policy | Behavior | Use Case | -|--------|----------|----------| -| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | -| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | -| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | - -### ConfirmRisky (Default Policy) - -The most flexible policy with configurable thresholds: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 
40}} }%% -flowchart TB - Risk["SecurityRisk"] - CheckUnknown{"Risk ==
UNKNOWN?"} - UseConfirmUnknown{"confirm_unknown
setting?"} - CheckThreshold{"risk.is_riskier
(threshold)?"} - - Confirm["Require Confirmation"] - Allow["Allow Execution"] - - Risk --> CheckUnknown - CheckUnknown -->|Yes| UseConfirmUnknown - CheckUnknown -->|No| CheckThreshold - - UseConfirmUnknown -->|True| Confirm - UseConfirmUnknown -->|False| Allow - - CheckThreshold -->|Yes| Confirm - CheckThreshold -->|No| Allow - - style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px -``` - -**Configuration:** -- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required - - Cannot be set to `UNKNOWN` - - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` -- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation - -### Confirmation Rules by Policy - -#### ConfirmRisky with threshold=HIGH (Default) - -| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | -|------------|----------------------------------|-------------------------| -| **LOW** | ✅ Allow | ✅ Allow | -| **MEDIUM** | ✅ Allow | ✅ Allow | -| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | - -#### ConfirmRisky with threshold=MEDIUM - -| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | -|------------|------------------------|-------------------------| -| **LOW** | ✅ Allow | ✅ Allow | -| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | -| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | - -#### ConfirmRisky with threshold=LOW - -| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | -|------------|------------------------|-------------------------| -| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | -| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | -| **HIGH** | 🔒 
Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | - -**Key Rules:** -- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` -- **UNKNOWN handling** is configurable via `confirm_unknown` flag -- **Threshold cannot be UNKNOWN** - validated at policy creation time - - -## Component Relationships - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Security["Security Analyzer"] - Agent["Agent"] - Conversation["Conversation"] - Tools["Tools"] - MCP["MCP Tools"] - - Agent -->|Validates actions| Security - Security -->|Checks| Tools - Security -->|Uses hints| MCP - Conversation -->|Pauses for confirmation| Agent - - style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Relationship Characteristics:** -- **Agent → Security**: Validates actions before execution -- **Security → Tools**: Examines tool characteristics (annotations) -- **Security → MCP**: Uses MCP hints for risk assessment -- **Conversation → Agent**: Pauses for user confirmation when required -- **Optional Component**: Security analyzer can be disabled for trusted environments - -## See Also - -- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers -- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints -- **[Security Guide](/sdk/guides/security)** - Configuring security policies - - -# Skill -Source: https://docs.openhands.dev/sdk/arch/skill - -The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. 
- -**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) - -## Core Responsibilities - -The Skill system has four primary responsibilities: - -1. **Context Injection** - Add specialized prompts to agent context based on triggers -2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) -3. **MCP Integration** - Load MCP tools associated with repository skills -4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats - -## Architecture - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% -flowchart TB - subgraph Types["Skill Types"] - Repo["Repository Skill
trigger: None"] - Knowledge["Knowledge Skill
trigger: KeywordTrigger"] - Task["Task Skill
trigger: TaskTrigger"] - end - - subgraph Triggers["Trigger Evaluation"] - Always["Always Active
Repository guidelines"] - Keyword["Keyword Match
String matching on user messages"] - TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] - end - - subgraph Content["Skill Content"] - Markdown["Markdown with Frontmatter"] - MCPTools["MCP Tools Config
Repo skills only"] - Inputs["Input Metadata
Task skills only"] - end - - subgraph Integration["Agent Integration"] - Context["Agent Context"] - Prompt["System Prompt"] - end - - Repo --> Always - Knowledge --> Keyword - Task --> TaskMatch - - Always --> Markdown - Keyword --> Markdown - TaskMatch --> Markdown - - Repo -.->|Optional| MCPTools - Task -.->|Requires| Inputs - - Markdown --> Context - MCPTools --> Context - Context --> Prompt - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Repo,Knowledge,Task primary - class Always,Keyword,TaskMatch secondary - class Context tertiary -``` - -### Key Components - -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | -| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | -| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | -| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | -| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | - -## Skill Types - -### Repository Skills - -Always-active, repository-specific guidelines. - -**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. 
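For instance, a minimal `AGENTS.md` might look like the following (the contents are purely illustrative):

```markdown
# Agent Instructions

- Use `uv run pytest` to run the test suite before finishing a task.
- Keep all new modules under `src/`, mirroring the existing layout.
- Never commit directly to `main`; open a pull request instead.
```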
- -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - File["AGENTS.md"] - Parse["Parse Frontmatter"] - Skill["Skill(trigger=None)"] - Context["Always in Context"] - - File --> Parse - Parse --> Skill - Skill --> Context - - style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Characteristics:** -- **Trigger:** `None` (always active) -- **Purpose:** Project conventions, coding standards, architecture rules -- **MCP Tools:** Can include MCP tool configuration -- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) - -**Example Files (permanent context):** -- `AGENTS.md` - General agent instructions -- `GEMINI.md` - Gemini-specific instructions -- `CLAUDE.md` - Claude-specific instructions - -**Other supported formats:** -- `.cursorrules` - Cursor IDE guidelines -- `agents.md` / `agent.md` - General agent instructions - -### Knowledge Skills - -Keyword-triggered skills for specialized domains: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - User["User Message"] - Check["Check Keywords"] - Match{"Match?"} - Activate["Activate Skill"] - Skip["Skip Skill"] - Context["Add to Context"] - - User --> Check - Check --> Match - Match -->|Yes| Activate - Match -->|No| Skip - Activate --> Context - - style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Characteristics:** -- **Trigger:** `KeywordTrigger` with regex patterns -- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") -- **Activation:** Keywords detected in user messages -- **Location:** System or user-defined knowledge base - -**Trigger Example:** -```yaml ---- -name: kubernetes -trigger: - type: keyword - keywords: ["kubernetes", "k8s", "kubectl"] ---- -``` - -### Task Skills - 
-Keyword-triggered skills with structured inputs for guided workflows: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - User["User Message"] - Match{"Keyword
Match?"} - Inputs["Collect User Inputs"] - Template["Apply Template"] - Context["Add to Context"] - Skip["Skip Skill"] - - User --> Match - Match -->|Yes| Inputs - Match -->|No| Skip - Inputs --> Template - Template --> Context - - style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Characteristics:** -- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) -- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) -- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) -- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) -- **Location:** System-defined or custom task templates - -**Trigger Example:** -```yaml ---- -name: bug_fix -triggers: ["/bug_fix", "fix bug", "bug report"] -inputs: - - name: bug_description - description: "Describe the bug" - required: true ---- -``` - -**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. - -## Trigger Evaluation - -Skills are evaluated at different points in the agent lifecycle: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Start["Agent Step Start"] - - Repo["Check Repository Skills
trigger: None"] - AddRepo["Always Add to Context"] - - Message["Check User Message"] - Keyword["Match Keyword Triggers"] - AddKeyword["Add Matched Skills"] - - TaskType["Check Task Type"] - TaskMatch["Match Task Triggers"] - AddTask["Add Task Skill"] - - Build["Build Agent Context"] - - Start --> Repo - Repo --> AddRepo - - Start --> Message - Message --> Keyword - Keyword --> AddKeyword - - Start --> TaskType - TaskType --> TaskMatch - TaskMatch --> AddTask - - AddRepo --> Build - AddKeyword --> Build - AddTask --> Build - - style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Evaluation Rules:** - -| Trigger Type | Evaluation Point | Activation Condition | -|--------------|------------------|----------------------| -| **None** | Every step | Always active | -| **KeywordTrigger** | On user message | Keyword/string match in message | -| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) | - -**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters. 
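What distinguishes a task skill in practice is the input-collection step: declared inputs are gathered from the user and rendered into the skill body before injection. A rough sketch, with all names assumed rather than taken from the SDK:

```python
from string import Template

# Input metadata as it might be declared in a task skill's frontmatter.
declared_inputs = [
    {"name": "bug_description", "description": "Describe the bug", "required": True},
]

# Skill body with a placeholder for the collected input.
body = Template("Fix the following bug:\n${bug_description}\n")

def render_task_skill(template: Template, declared: list[dict], provided: dict) -> str:
    """Validate required inputs, then substitute them into the skill body."""
    missing = [
        d["name"]
        for d in declared
        if d.get("required") and d["name"] not in provided
    ]
    if missing:
        raise ValueError(f"Missing required inputs: {missing}")
    return template.substitute(provided)

print(render_task_skill(body, declared_inputs, {"bug_description": "Login returns HTTP 500"}))
```

Knowledge skills skip this step entirely, which is why the two trigger types can share the same matching logic.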
- -## MCP Tool Integration - -Repository skills can include MCP tool configurations: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Skill["Repository Skill"] - MCPConfig["mcp_tools Config"] - Client["MCP Client"] - Tools["Tool Registry"] - - Skill -->|Contains| MCPConfig - MCPConfig -->|Spawns| Client - Client -->|Registers| Tools - - style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**MCP Configuration Format:** - -Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): - -```yaml ---- -name: repo_skill -mcp_tools: - mcpServers: - filesystem: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] ---- -``` - -**Workflow:** -1. **Load Skill:** Parse markdown file with frontmatter -2. **Extract MCP Config:** Read `mcp_tools` field -3. **Spawn MCP Servers:** Create MCP clients for each server -4. **Register Tools:** Add MCP tools to agent's tool registry -5. **Inject Context:** Add skill content to agent prompt - -## Skill File Format - -Skills are defined in markdown files with YAML frontmatter: - -```markdown ---- -name: skill_name -trigger: - type: keyword - keywords: ["pattern1", "pattern2"] ---- - -# Skill Content - -This is the instruction text that will be added to the agent's context. 
-``` - -**Frontmatter Fields:** - -| Field | Required | Description | -|-------|----------|-------------| -| **name** | Yes | Unique skill identifier | -| **trigger** | Yes* | Activation trigger (`null` for always active) | -| **mcp_tools** | No | MCP server configuration (repo skills only) | -| **inputs** | No | User input metadata (task skills only) | - -*Repository skills use `trigger: null` (or omit trigger field) - -## Component Relationships - -### How Skills Integrate - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Skills["Skill System"] - Context["Agent Context"] - Agent["Agent"] - MCP["MCP Client"] - - Skills -->|Injects content| Context - Skills -.->|Spawns tools| MCP - Context -->|System prompt| Agent - MCP -->|Tool| Agent - - style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Relationship Characteristics:** -- **Skills → Agent Context**: Active skills contribute their content to system prompt -- **Skills → MCP**: Repository skills can spawn MCP servers and register tools -- **Context → Agent**: Combined skill content becomes part of agent's instructions -- **Skills Lifecycle**: Loaded at conversation start, evaluated each step - -## See Also - -- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context -- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management -- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications - - -# Tool System & MCP -Source: https://docs.openhands.dev/sdk/arch/tool-system - -The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. 
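The Action → Observation contract can be illustrated with plain dataclasses. The SDK itself uses Pydantic models and a `ToolExecutor` class; the names below are simplified stand-ins to show the shape of the pattern:

```python
from dataclasses import dataclass, field

@dataclass
class GrepAction:          # input schema: what the LLM supplies
    pattern: str
    path: str = "."

@dataclass
class GrepObservation:     # output schema: what the tool returns
    matches: list[str] = field(default_factory=list)

def grep_executor(action: GrepAction) -> GrepObservation:
    """Business logic lives in the executor; stubbed here for illustration."""
    corpus = {"notes.txt": "TODO: refactor", "main.py": "print('hi')"}
    return GrepObservation(
        matches=[name for name, text in corpus.items() if action.pattern in text]
    )

obs = grep_executor(GrepAction(pattern="TODO"))
print(obs.matches)  # ['notes.txt']
```

Typed inputs mean malformed LLM arguments fail validation before any business logic runs, and typed outputs give the framework a uniform way to turn results back into LLM content.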
- -**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) - -## Core Responsibilities - -The Tool System has four primary responsibilities: - -1. **Type Safety** - Enforce action/observation schemas via Pydantic models -2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas -3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs -4. **Tool Registry** - Discover and resolve tools by name or pattern - -## Tool System - -### Architecture Overview - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% -flowchart TB - subgraph Definition["Tool Definition"] - Action["Action
Input schema"] - Observation["Observation
Output schema"] - Executor["Executor
Business logic"] - end - - subgraph Framework["Tool Framework"] - Base["ToolBase
Abstract base"] - Impl["Tool Implementation
Concrete tool"] - Registry["Tool Registry
Spec → Tool"] - end - - Agent["Agent"] - LLM["LLM"] - ToolSpec["Tool Spec
name + params"] - - Base -.->|Extends| Impl - - ToolSpec -->|resolve_tool| Registry - Registry -->|Create instances| Impl - Impl -->|Available in| Agent - Impl -->|Generate schema| LLM - LLM -->|Generate tool call| Agent - Agent -->|Parse & validate| Action - Agent -->|Execute via Tool.\_\_call\_\_| Executor - Executor -->|Return| Observation - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Base primary - class Action,Observation,Executor secondary - class Registry tertiary -``` - -### Key Components - -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | -| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | -| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | -| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | -| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | -| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | -| **[`Tool` 
(spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | -| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | - -### Action-Observation Pattern - -The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Agent["Agent Layer"] - ToolCall["MessageToolCall
from LLM"] - ParseJSON["Parse JSON
arguments"] - CreateAction["tool.action_from_arguments()
Pydantic validation"] - WrapAction["ActionEvent
wraps Action"] - WrapObs["ObservationEvent
wraps Observation"] - Error["AgentErrorEvent"] - end - - subgraph ToolSystem["Tool System"] - ActionType["Action
Pydantic model"] - ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] - Execute["ToolExecutor
business logic"] - ObsType["Observation
Pydantic model"] - end - - ToolCall --> ParseJSON - ParseJSON -->|Valid JSON| CreateAction - ParseJSON -->|Invalid JSON| Error - CreateAction -->|Valid| ActionType - CreateAction -->|Invalid| Error - ActionType --> WrapAction - ActionType --> ToolCall2 - ToolCall2 --> Execute - Execute --> ObsType - ObsType --> WrapObs - - style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px - style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px -``` - -**Tool System Boundary:** -- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance -- **Output**: `Observation` instance with structured result -- **No knowledge of**: Events, LLM messages, conversation state - -### Tool Definition - -Tools are defined using two patterns depending on complexity: - -#### Pattern 1: Direct Instantiation (Simple Tools) - -For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% -flowchart LR - Action["Define Action
with visualize"] - Obs["Define Observation
with to_llm_content"] - Exec["Define Executor
stateless logic"] - Tool["ToolDefinition(...,
executor=Executor())"] - - Action --> Tool - Obs --> Tool - Exec --> Tool - - style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` - -**Components:** -1. **Action** - Pydantic model with `visualize` property for display -2. **Observation** - Pydantic model with `to_llm_content` property for LLM -3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` -4. **ToolDefinition** - Direct instantiation with executor instance - -#### Pattern 2: Subclass with Factory (Stateful Tools) - -For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% -flowchart LR - Action["Define Action
with visualize"] - Obs["Define Observation
with to_llm_content"] - Exec["Define Executor
with \_\_init\_\_ and state"] - Subclass["class MyTool(ToolDefinition)
with create() method"] - Instance["Return [MyTool(...,
executor=instance)]"] - - Action --> Subclass - Obs --> Subclass - Exec --> Subclass - Subclass --> Instance - - style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Components:** -1. **Action/Observation** - Same as Pattern 1 -2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup -3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method -4. **Factory Method** - Returns sequence of configured tool instances - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - subgraph Pattern1["Pattern 1: Direct Instantiation"] - P1A["Define Action/Observation
with visualize/to_llm_content"] - P1E["Define ToolExecutor
with \_\_call\_\_()"] - P1T["ToolDefinition(...,
executor=Executor())"] - end - - subgraph Pattern2["Pattern 2: Subclass with Factory"] - P2A["Define Action/Observation
with visualize/to_llm_content"] - P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] - P2C["class MyTool(ToolDefinition)
@classmethod create()"] - P2I["Return [MyTool(...,
executor=instance)]"] - end - - P1A --> P1E - P1E --> P1T - - P2A --> P2E - P2E --> P2C - P2C --> P2I - - style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Key Design Elements:** - -| Component | Purpose | Requirements | -|-----------|---------|--------------| -| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text | -| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list | -| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method | -| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) | - -**When to Use Each Pattern:** - -| Pattern | Use Case | Examples | -|---------|----------|----------| -| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities | -| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` | - -### Tool Annotations - -Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs: - -| Field | Meaning | Examples | -|-------|---------|----------| -| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) | -| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) | -| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) | -| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) | - -**Key Behaviors:** -- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with 
`readOnlyHint=False` -- Annotations help LLMs reason about tool safety and side effects - -### Tool Registry - -The registry enables **dynamic tool discovery** and instantiation from tool specifications: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - ToolSpec["Tool Spec
name + params"] - - subgraph Registry["Tool Registry"] - Resolver["Resolver
name → factory"] - Factory["Factory
create(params)"] - end - - Instance["Tool Instance
with executor"] - Agent["Agent"] - - ToolSpec -->|"resolve_tool(spec)"| Resolver - Resolver -->|Lookup factory| Factory - Factory -->|"create(**params)"| Instance - Instance -->|Used by| Agent - - style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Resolution Workflow:** - -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) -2. **Resolver Lookup** - Registry finds the registered resolver for the tool name -3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -4. **Instance Creation** - Tool instance(s) are created with configured executors -5. **Agent Usage** - Instances are added to the agent's tools_map for execution - -**Registration Types:** - -| Type | Registration | Resolver Behavior | -|------|-------------|-------------------| -| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | -| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | -| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | - -### File Organization - -Tools follow a consistent file structure for maintainability: - -``` -openhands-tools/openhands/tools/my_tool/ -├── __init__.py # Export MyTool -├── definition.py # Action, Observation, MyTool(ToolDefinition) -├── impl.py # MyExecutor(ToolExecutor) -└── [other modules] # Tool-specific utilities -``` - -**File Responsibilities:** - -| File | Contains | Purpose | -|------|----------|---------| -| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | -| `impl.py` | ToolExecutor implementation | Business logic, state management, 
execution | -| `__init__.py` | Tool exports | Package interface | - -**Benefits:** -- **Separation of Concerns** - Public API separate from implementation -- **Avoid Circular Imports** - Import `impl` only inside `create()` method -- **Consistency** - All tools follow same structure for discoverability - -**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation - - -## MCP Integration - -The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. - -**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) - -### Architecture Overview +### Architecture Overview ```mermaid %%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% @@ -19697,9 +18668,8 @@ flowchart TB - **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools - **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library - -# Workspace -Source: https://docs.openhands.dev/sdk/arch/workspace +### Workspace +Source: https://docs.openhands.dev/sdk/arch/workspace.md The **Workspace** component abstracts execution environments for agent operations. It provides a unified interface for command execution and file operations across local processes, containers, and remote servers. @@ -19904,9 +18874,8 @@ flowchart LR - **[Agent Server](/sdk/arch/agent-server)** - Remote execution API - **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution - -# FAQ -Source: https://docs.openhands.dev/sdk/faq +### FAQ +Source: https://docs.openhands.dev/sdk/faq.md ## How do I use AWS Bedrock with the SDK? 
@@ -20180,9 +19149,8 @@ If you have additional questions: - **[Join our Slack Community](https://openhands.dev/joinslack)** - Ask questions and get help from the community - **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs, request features, or start a discussion - -# Getting Started -Source: https://docs.openhands.dev/sdk/getting-started +### Getting Started +Source: https://docs.openhands.dev/sdk/getting-started.md The OpenHands SDK is a modular framework for building AI agents that interact with code, files, and system commands. Agents can execute bash commands, edit files, browse the web, and more. @@ -20395,9 +19363,8 @@ ls examples/01_standalone_sdk/ - **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features - **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples - -# Browser Use -Source: https://docs.openhands.dev/sdk/guides/agent-browser-use +### Browser Use +Source: https://docs.openhands.dev/sdk/guides/agent-browser-use.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -20517,9 +19484,8 @@ for i, message in enumerate(llm_messages): - **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools - **[MCP Integration](/sdk/guides/mcp)** - Connect external services - -# Creating Custom Agent -Source: https://docs.openhands.dev/sdk/guides/agent-custom +### Creating Custom Agent +Source: https://docs.openhands.dev/sdk/guides/agent-custom.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -20741,9 +19707,8 @@ For a complete implementation example showing all these components working toget - **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management - **[MCP Integration](/sdk/guides/mcp)** - Add MCP - -# Sub-Agent Delegation -Source: https://docs.openhands.dev/sdk/guides/agent-delegation +### Sub-Agent 
Delegation +Source: https://docs.openhands.dev/sdk/guides/agent-delegation.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -21102,9 +20067,8 @@ print(f"EXAMPLE_COST: {cost_1 + cost_2}") - -# Interactive Terminal -Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal +### Interactive Terminal +Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -21216,9 +20180,8 @@ for i, message in enumerate(llm_messages): - **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases - -# API-based Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox +### API-based Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md > A ready-to-run example is available [here](#ready-to-run-example)! @@ -21424,9 +20387,8 @@ uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server - **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details - **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture - -# Apptainer Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox +### Apptainer Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -21692,9 +20654,8 @@ Apptainer should work without root. 
If you see permission errors: - **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing - **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution - -# OpenHands Cloud Workspace -Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace +### OpenHands Cloud Workspace +Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md > A ready-to-run example is available [here](#ready-to-run-example)! @@ -21906,9 +20867,8 @@ uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py - **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers - **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details - -# Custom Tools with Remote Agent Server -Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools +### Custom Tools with Remote Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md > A ready-to-run example is available [here](#ready-to-run-example)! 
@@ -22416,9 +21376,8 @@ uv run python custom_tool_example.py - **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server - **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers - -# Docker Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox +### Docker Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -23068,9 +22027,8 @@ with DockerWorkspace( - **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service - **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture - -# Local Agent Server -Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server +### Local Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -23433,9 +22391,8 @@ with ManagedAPIServer(port=8001) as server: - **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details - **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture - -# Overview -Source: https://docs.openhands.dev/sdk/guides/agent-server/overview +### Overview +Source: https://docs.openhands.dev/sdk/guides/agent-server/overview.md Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. 
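The overview above notes that local and remote execution share the same SDK API, differing only in the workspace argument. The stdlib-only sketch below illustrates that design idea; the class and method names here are illustrative stand-ins, not the SDK's real `Workspace` types:

```python
import subprocess


# Illustrative stand-ins for the design described above: task code is written
# once against a small workspace interface, and local vs. remote execution is
# selected purely by which workspace object is passed in. These are NOT the
# SDK's real Workspace classes.
class LocalWorkspace:
    def run(self, cmd: str) -> str:
        # Execute directly in a local subprocess.
        return subprocess.run(
            cmd, shell=True, capture_output=True, text=True
        ).stdout


class RemoteWorkspace:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, cmd: str) -> str:
        # A real implementation would forward the command to an agent
        # server over the network; omitted in this sketch.
        raise NotImplementedError


def run_task(workspace, cmd: str) -> str:
    # The "conversation" code never changes; only the workspace does.
    return workspace.run(cmd)


print(run_task(LocalWorkspace(), "echo hello"))
```

Swapping `LocalWorkspace()` for `RemoteWorkspace("host")` leaves `run_task` untouched, which is the same switching cost the overview describes.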
@@ -23597,9 +22554,8 @@ Explore different deployment options: For architectural details: - **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment - -# Stuck Detector -Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector +### Stuck Detector +Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -23728,9 +22684,8 @@ print(f"EXAMPLE_COST: {cost}") - **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control - **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK - -# Theory of Mind (TOM) Agent -Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent +### Theory of Mind (TOM) Agent +Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -24040,9 +22995,8 @@ conversation.send_message("Make it better") - **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively - **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights - -# Browser Session Recording -Source: https://docs.openhands.dev/sdk/guides/browser-session-recording +### Browser Session Recording +Source: https://docs.openhands.dev/sdk/guides/browser-session-recording.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -24258,9 +23212,8 @@ print(f"EXAMPLE_COST: {cost}") - -# Context Condenser -Source: https://docs.openhands.dev/sdk/guides/context-condenser +### Context Condenser +Source: https://docs.openhands.dev/sdk/guides/context-condenser.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; @@ -24493,9 +23446,8 @@ print(f"EXAMPLE_COST: {cost}") - **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings - -# Ask Agent Questions -Source: 
https://docs.openhands.dev/sdk/guides/convo-ask-agent
+### Ask Agent Questions
+Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

@@ -24721,9 +23673,8 @@ print(f"EXAMPLE_COST: {cost:.4f}")

- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow
- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress

-
-# Conversation with Async
-Source: https://docs.openhands.dev/sdk/guides/convo-async
+### Conversation with Async
+Source: https://docs.openhands.dev/sdk/guides/convo-async.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

@@ -24865,9 +23816,8 @@ if __name__ == "__main__":

- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state
- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents

-
-# Custom Visualizer
-Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer
+### Custom Visualizer
+Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

@@ -25026,102 +23976,643 @@ from openhands.tools.preset.default import get_default_agent

class MinimalVisualizer(ConversationVisualizerBase):
    """A minimal visualizer that prints the raw events as they occur."""

-    def on_event(self, event: Event) -> None:
-        """Handle events for minimal progress visualization."""
-        print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...")
+    def on_event(self, event: Event) -> None:
+        """Handle events for minimal progress visualization."""
+        print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...")
+
+
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+llm = LLM(
+    model=model,
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    usage_id="agent",
+)
+agent = get_default_agent(llm=llm, cli_mode=True)
+
+# ============================================================================
+# Configure Visualization
+# ============================================================================
+# Set logging level to reduce verbosity
+logging.getLogger().setLevel(logging.WARNING)
+
+# Start a conversation with custom visualizer
+cwd = os.getcwd()
+conversation = Conversation(
+    agent=agent,
+    workspace=cwd,
+    visualizer=MinimalVisualizer(),
+)
+
+# Send a message and let the agent run
+print("Sending task to agent...")
+conversation.send_message("Write 3 facts about the current project into FACTS.txt.")
+conversation.run()
+print("Task completed!")
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost:.4f}")
+```
+
+
+
+## Next Steps
+
+Now that you understand custom visualizers, explore these related topics:
+
+- **[Events](/sdk/arch/events)** - Learn more about different event types
+- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data
+- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates
+- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic
+
+### Pause and Resume
+Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+### Pausing Execution
+
+Pause the agent from another thread or after a delay using `conversation.pause()`, and
+resume the paused conversation after performing operations by calling `conversation.run()` again.
+ +```python icon="python" focus={9, 15} wrap +import time +thread = threading.Thread(target=conversation.run) +thread.start() + +print("Letting agent work for 5 seconds...") +time.sleep(5) + +print("Pausing the agent...") +conversation.pause() + +print("Waiting for 5 seconds...") +time.sleep(5) + +print("Resuming the execution...") +conversation.run() +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + + +Pause agent execution mid-task by calling `conversation.pause()`: + +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) + +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() + +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." 
+) + +print(f"Initial status: {conversation.state.execution_status}") +print() + +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() + +# Let the agent work for a few seconds +print("Letting agent work for 2 seconds...") +time.sleep(2) + +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() + +# Wait for the thread to finish (it will stop when paused) +thread.join() + +print(f"Agent status after pause: {conversation.state.execution_status}") +print() + +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." +) +print() + +# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.execution_status}") + +# Resume execution +conversation.run() + +print(f"Final status: {conversation.state.execution_status}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + +### Persistence +Source: https://docs.openhands.dev/sdk/guides/convo-persistence.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## How to use Persistence + +Save conversation state to disk and restore it later for long-running or multi-session workflows. 
+ +### Saving State + +Create a conversation with a unique ID to enable persistence: + +```python focus={3-4,10-11} icon="python" wrap +import uuid + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message("Start long task") +conversation.run() # State automatically saved +``` + +### Restoring State + +Restore a conversation using the same ID and persistence directory: + +```python focus={9-10} icon="python" +# Later, in a different session +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +conversation.send_message("Continue task") +conversation.run() # Continues from saved state +``` + +## What Gets Persisted + +The conversation state includes information that allows seamless restoration: + +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys +- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) + + + For the complete implementation details, see the [ConversationState 
class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code.
+
+
+## Persistence Directory Structure
+
+When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each
+conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/`
+(unless you specify a custom path).
+
+**Directory structure:**
+
+    workspace/conversations/
+    └── <conversation_id>/
+        ├── base_state.json
+        └── events/
+            ├── event-00000-<event_id>.json
+            ├── event-00001-<event_id>.json
+            └── ...
+
+Each conversation directory contains:
+- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata
+- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`)
+
+The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py
+import os
+import uuid
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    get_logger,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.terminal import TerminalTool
+
+
+logger = get_logger(__name__)
+
+# Configure LLM
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + } +} +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Hey what did you create? 
Return an agent finish action") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Reading serialized events + +Convert persisted events into LLM-ready messages for reuse or analysis. + + +This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) + + +```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py +"""Load persisted events and convert them into LLM-ready messages.""" + +import json +import os +import uuid +from pathlib import Path + +from pydantic import SecretStr + + +conversation_id = uuid.uuid4() +persistence_root = Path(".conversations") +log_dir = ( + persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex +) + +os.environ.setdefault("LOG_JSON", "true") +os.environ.setdefault("LOG_TO_FILE", "true") +os.environ.setdefault("LOG_DIR", str(log_dir)) +os.environ.setdefault("LOG_LEVEL", "INFO") + +from openhands.sdk import ( # noqa: E402 + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + Tool, +) +from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 +from openhands.tools.terminal import TerminalTool # noqa: E402 + +setup_logging(log_to_file=True, log_dir=str(log_dir)) +logger = get_logger(__name__) api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +if not api_key: + raise RuntimeError("LLM_API_KEY environment variable is not set.") + llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), ) -agent = get_default_agent(llm=llm, cli_mode=True) -# ============================================================================ -# Configure Visualization -# ============================================================================ -# Set logging level to reduce verbosity -logging.getLogger().setLevel(logging.WARNING) +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +###### +# Create a conversation that persists its events +###### -# Start a conversation with custom visualizer -cwd = os.getcwd() conversation = Conversation( agent=agent, - workspace=cwd, - visualizer=MinimalVisualizer(), + workspace=os.getcwd(), + persistence_dir=str(persistence_root), + conversation_id=conversation_id, ) -# Send a message and let the agent run -print("Sending task to agent...") -conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.send_message( + "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " + "Reply with a short confirmation once done." +) conversation.run() -print("Task completed!") + +conversation.send_message( + "Without using any tools, summarize in one sentence what you did." +) +conversation.run() + +assert conversation.state.persistence_dir is not None +persistence_dir = Path(conversation.state.persistence_dir) +event_dir = persistence_dir / "events" + +event_paths = sorted(event_dir.glob("event-*.json")) + +if not event_paths: + raise RuntimeError("No event files found. 
Was persistence enabled?") + +###### +# Read from serialized events +###### + + +events = [Event.model_validate_json(path.read_text()) for path in event_paths] + +convertible_events = [ + event for event in events if isinstance(event, LLMConvertibleEvent) +] +llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) + +if llm.uses_responses_api(): + logger.info("Formatting messages for the OpenAI Responses API.") + instructions, input_items = llm.format_messages_for_responses(llm_messages) + logger.info("Responses instructions:\n%s", instructions) + logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) +else: + logger.info("Formatting messages for the OpenAI Chat Completions API.") + chat_messages = llm.format_messages_for_llm(llm_messages) + logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) # Report cost cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost:.4f}") +print(f"EXAMPLE_COST: {cost}") ``` - + -## Next Steps -Now that you understand custom visualizers, explore these related topics: +## How State Persistence Works -- **[Events](/sdk/arch/events)** - Learn more about different event types -- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data -- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic +The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. 
+### Auto-Save Mechanism -# Pause and Resume -Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume +When you modify any public field on `ConversationState`, the SDK automatically: -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +1. Detects the field change via a custom `__setattr__` implementation +2. Serializes the entire base state to `base_state.json` +3. Triggers any registered state change callbacks -> A ready-to-run example is available [here](#ready-to-run-example)! +This happens transparently—you don't need to call any save methods manually. -### Pausing Execution +```python +# These changes are automatically persisted: +conversation.state.execution_status = ConversationExecutionStatus.RUNNING +conversation.state.max_iterations = 100 +``` -Pause the agent from another thread or after a delay using `conversation.pause()`, and -Resume the paused conversation after performing operations by calling `conversation.run()` again. +### Events vs Base State -```python icon="python" focus={9, 15} wrap -import time -thread = threading.Thread(target=conversation.run) -thread.start() +The persistence system separates data into two categories: -print("Letting agent work for 5 seconds...") -time.sleep(5) +| Category | Storage | Contents | +|----------|---------|----------| +| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | +| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | -print("Pausing the agent...") -conversation.pause() +Events are appended incrementally (one file per event), while base state is overwritten on each change. 
This design optimizes for: +- **Fast event appends**: No need to rewrite the entire history +- **Atomic state updates**: Base state is always consistent +- **Efficient restoration**: Events can be loaded lazily -print("Waiting for 5 seconds...") -time.sleep(5) -print("Resuming the execution...") -conversation.run() -``` -## Ready-to-run Example +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Send Message While Running +Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + -This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) -Pause agent execution mid-task by calling `conversation.pause()`: +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: + +```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating that user messages can be sent and processed while +an agent is busy. + +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. This is made possible by the agent's event-driven architecture. + +Demonstration Flow: +1. 
Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. Verify that all three lines are processed and included in the final document + +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) + +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. 
+ +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing + +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa -```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py import os import threading import time +from datetime import datetime from pydantic import SecretStr @@ -25148,6 +24639,7 @@ llm = LLM( ) # Tools +cwd = os.getcwd() tools = [ Tool( name=TerminalTool.name, @@ -25157,4696 +24649,5210 @@ tools = [ # Agent agent = Agent(llm=llm, tools=tools) -conversation = Conversation(agent, workspace=os.getcwd()) +conversation = Conversation(agent) -print("=" * 60) -print("Pause and Continue Example") -print("=" * 60) -print() -# Phase 1: Start a long-running task -print("Phase 1: Starting agent with a task...") +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Send Message While Processing Example ===") + +# Step 1: Send initial message +start_time = timestamp() conversation.send_message( - "Create a file called countdown.txt and write numbers from 100 down to 1, " - "one number per line. After you finish, summarize what you did." + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. 
" + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa ) -print(f"Initial status: {conversation.state.execution_status}") -print() - -# Start the agent in a background thread +# Step 2: Start agent processing in background thread = threading.Thread(target=conversation.run) thread.start() -# Let the agent work for a few seconds -print("Letting agent work for 2 seconds...") -time.sleep(2) - -# Phase 2: Pause the agent -print() -print("Phase 2: Pausing the agent...") -conversation.pause() - -# Wait for the thread to finish (it will stop when paused) -thread.join() +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working -print(f"Agent status after pause: {conversation.state.execution_status}") -print() +second_time = timestamp() -# Phase 3: Send a new message while paused -print("Phase 3: Sending a new message while agent is paused...") conversation.send_message( - "Actually, stop working on countdown.txt. Instead, create a file called " - "hello.txt with just the text 'Hello, World!' in it." + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
) -print() -# Phase 4: Resume the agent with .run() -print("Phase 4: Resuming agent with .run()...") -print(f"Status before resume: {conversation.state.execution_status}") +# Wait for completion +thread.join() -# Resume execution -conversation.run() +# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() -print(f"Final status: {conversation.state.execution_status}") + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") + + # Check if both messages were processed + if "Message 1" in content and "Message 2" in content: + print("\nSUCCESS: Agent processed both messages!") + print( + "This proves the agent received the second message while processing the first task." # noqa + ) + else: + print("\nWARNING: Agent may not have processed the second message") + + # Clean up + os.remove(document_path) +else: + print("WARNING: Document.txt was not created") # Report cost cost = llm.metrics.accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - + +### Sending Messages During Execution +As shown in the example above, use threading to send messages while the agent is running: -## Next Steps +```python icon="python" +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() -- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state -- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
+) + +# Wait for completion +thread.join() +``` + +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion -# Persistence -Source: https://docs.openhands.dev/sdk/guides/convo-persistence +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Critic (Experimental) +Source: https://docs.openhands.dev/sdk/guides/critic.md + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + > A ready-to-run example is available [here](#ready-to-run-example)! -## How to use Persistence -Save conversation state to disk and restore it later for long-running or multi-session workflows. +## What is a Critic? -### Saving State +A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: -Create a conversation with a unique ID to enable persistence: +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion +- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold -```python focus={3-4,10-11} icon="python" wrap -import uuid +You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. 
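+The score-gated workflow described above can be sketched, outside the SDK, as a plain control loop. Everything below (the function names, the simulated scores) is illustrative stand-in code, not SDK API:
+
+```python
+# Illustrative sketch of a critic-gated retry loop (not SDK API):
+# a task attempt is scored by a critic, and scores below the
+# threshold trigger a follow-up prompt until max_iterations.
+
+def refine(run_task, score_result, success_threshold=0.6, max_iterations=3):
+    """Run a task, re-prompting while the critic score is below threshold."""
+    prompt = "initial task"
+    for iteration in range(1, max_iterations + 1):
+        result = run_task(prompt)
+        score = score_result(result)
+        if score >= success_threshold:
+            return result, score, iteration
+        # Below threshold: build a follow-up prompt and try again.
+        prompt = f"Your solution scored {score:.0%}; please review and fix it."
+    return result, score, max_iterations
+
+
+# Toy stand-ins: each retry improves the simulated critic score.
+scores = iter([0.45, 0.72])
+result, score, iterations = refine(
+    run_task=lambda p: f"attempt for: {p}",
+    score_result=lambda r: next(scores),
+    success_threshold=0.7,
+)
+print(iterations, round(score, 2))  # → 2 0.72
+```
+
+The real critic replaces `score_result` with a model-based evaluation, and the follow-up prompt is produced by `get_followup_prompt()`.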
-conversation_id = uuid.uuid4() -persistence_dir = "./.conversations" + +This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming. + -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, +## Quick Start + +When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. + +## Understanding Critic Results + +Critic evaluations produce scores and feedback: + +- **`score`**: Float between 0.0 and 1.0 representing predicted success probability +- **`message`**: Optional feedback with detailed probabilities +- **`success`**: Boolean property (True if score >= 0.5) + +Results are automatically displayed in the conversation visualizer: + +![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) + +### Accessing Results Programmatically + +```python icon="python" focus={4-7} +from openhands.sdk import Event, ActionEvent, MessageEvent + +def callback(event: Event): + if isinstance(event, (ActionEvent, MessageEvent)): + if event.critic_result is not None: + print(f"Critic score: {event.critic_result.score:.3f}") + print(f"Success: {event.critic_result.success}") + +conversation = Conversation(agent=agent, callbacks=[callback]) +``` + +## Iterative Refinement with a Critic + +The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. + +### How It Works + +1. Agent completes a task and calls `FinishAction` +2. 
Critic evaluates the result and produces a score +3. If score < `success_threshold`, a follow-up prompt is sent automatically +4. Agent continues working to address issues +5. Process repeats until score meets threshold or `max_iterations` is reached + +### Configuration + +Use `IterativeRefinementConfig` to enable automatic retries: + +```python icon="python" focus={1,4-7,12} +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig + +# Configure iterative refinement +iterative_config = IterativeRefinementConfig( + success_threshold=0.7, # Retry if score < 70% + max_iterations=3, # Maximum retry attempts +) + +# Attach to critic +critic = APIBasedCritic( + server_url="https://llm-proxy.eval.all-hands.dev/vllm", + api_key=api_key, + model_name="critic", + iterative_refinement=iterative_config, ) -conversation.send_message("Start long task") -conversation.run() # State automatically saved ``` -### Restoring State +### Parameters -Restore a conversation using the same ID and persistence directory: +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | +| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | -```python focus={9-10} icon="python" -# Later, in a different session -del conversation +### Custom Follow-up Prompts + +By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: + +```python icon="python" focus={4-12} +from openhands.sdk.critic.base import CriticBase, CriticResult + +class CustomCritic(APIBasedCritic): + def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: + score_percent = critic_result.score * 100 + return f""" +Your solution scored {score_percent:.1f}% (iteration {iteration}). + +Please review your work carefully: +1. 
Check that all requirements are met +2. Verify tests pass +3. Fix any issues and try again +""" +``` + +### Example Workflow + +Here's what happens during iterative refinement: + +``` +Iteration 1: + → Agent creates files, runs tests + → Agent calls FinishAction + → Critic evaluates: score = 0.45 (below 0.7 threshold) + → Follow-up prompt sent automatically + +Iteration 2: + → Agent reviews and fixes issues + → Agent calls FinishAction + → Critic evaluates: score = 0.72 (above threshold) + → ✅ Success! Conversation ends +``` + +## Troubleshooting + +### Critic Evaluations Not Appearing + +- Verify the critic is properly configured and passed to the Agent +- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) + +### API Authentication Errors + +- Verify `LLM_API_KEY` is set correctly +- Check that the API key has not expired + +### Iterative Refinement Not Triggering + +- Ensure `iterative_refinement` config is attached to the critic +- Check that `success_threshold` is set appropriately (higher values trigger more retries) +- Verify the agent is using `FinishAction` to complete tasks + +## Ready-to-run Example + + +The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) + + +This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. + +```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py +"""Iterative Refinement with Critic Model Example. + +This is EXPERIMENTAL. + +This example demonstrates how to use a critic model to shepherd an agent through +complex, multi-step tasks. 
The critic evaluates the agent's progress and provides +feedback that can trigger follow-up prompts when the agent hasn't completed the +task successfully. + +Key concepts demonstrated: +1. Setting up a critic with IterativeRefinementConfig for automatic retry +2. Conversation.run() automatically handles retries based on critic scores +3. Custom follow-up prompt generation via critic.get_followup_prompt() +4. Iterating until the task is completed successfully or max iterations reached + +For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured +using the same base_url with /vllm suffix and "critic" as the model name. +""" + +import os +import re +import tempfile +from pathlib import Path + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +from openhands.sdk.critic.base import CriticBase +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configuration +# Higher threshold (70%) makes it more likely the agent needs multiple iterations, +# which better demonstrates how iterative refinement works. +# Adjust as needed to see different behaviors. +SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) -# Deserialize the conversation -print("Deserializing conversation...") -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) -conversation.send_message("Continue task") -conversation.run() # Continues from saved state -``` +def get_required_env(name: str) -> str: + value = os.getenv(name) + if value: + return value + raise ValueError( + f"Missing required environment variable: {name}. " + f"Set {name} before running this example." 
+ ) -## What Gets Persisted -The conversation state includes information that allows seamless restoration: +def get_default_critic(llm: LLM) -> CriticBase | None: + """Auto-configure critic for All-Hands LLM proxy. -- **Message History**: Complete event log including user messages, agent responses, and system events -- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters -- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings -- **Tool Outputs**: Results from bash commands, file operations, and other tool executions -- **Statistics**: LLM usage metrics like token counts and API calls -- **Workspace Context**: Working directory and file system state -- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation -- **Secrets**: Managed credentials and API keys -- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) + When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an + APIBasedCritic configured with: + - server_url: {base_url}/vllm + - api_key: same as LLM + - model_name: "critic" - - For the complete implementation details, see the [ConversationState class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code. - + Args: + llm: The LLM instance to derive critic configuration from. -## Persistence Directory Structure + Returns: + An APIBasedCritic if the LLM is configured for All-Hands proxy, + None otherwise. -When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each -conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/` -(unless you specify a custom path). 
+ Example: + llm = LLM( + model="anthropic/claude-sonnet-4-5", + api_key=api_key, + base_url="https://llm-proxy.eval.all-hands.dev", + ) + critic = get_default_critic(llm) + if critic is None: + # Fall back to explicit configuration + critic = APIBasedCritic( + server_url="https://my-critic-server.com", + api_key="my-api-key", + model_name="my-critic-model", + ) + """ + base_url = llm.base_url + api_key = llm.api_key + if base_url is None or api_key is None: + return None -**Directory structure:** - - - - - - - - - - - - - - - - - - - - + # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) + pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" + if not re.match(pattern, base_url): + return None -Each conversation directory contains: -- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata -- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`) + return APIBasedCritic( + server_url=f"{base_url.rstrip('/')}/vllm", + api_key=api_key, + model_name="critic", + ) -The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access. -## Ready-to-run Example +# Task prompt designed to be moderately complex with subtle requirements. +# The task is simple enough to complete in 1-2 iterations, but has specific +# requirements that are easy to miss - triggering critic feedback. +INITIAL_TASK_PROMPT = """\ +Create a Python word statistics tool called `wordstats` that analyzes text files. 
- -This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py) - +## Structure -```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py -import os -import uuid +Create directory `wordstats/` with: +- `stats.py` - Main module with `analyze_file(filepath)` function +- `cli.py` - Command-line interface +- `tests/test_stats.py` - Unit tests -from pydantic import SecretStr +## Requirements for stats.py -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +The `analyze_file(filepath)` function must return a dict with these EXACT keys: +- `lines`: total line count (including empty lines) +- `words`: word count +- `chars`: character count (including whitespace) +- `unique_words`: count of unique words (case-insensitive) +### Important edge cases (often missed!): +1. Empty files must return all zeros, not raise an exception +2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) +3. Numbers like "123" or "3.14" are NOT counted as words +4. Contractions like "don't" count as ONE word +5. File not found must raise FileNotFoundError with a clear message -logger = get_logger(__name__) +## Requirements for cli.py -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +When run as `python cli.py `: +- Print each stat on its own line: "Lines: X", "Words: X", etc. 
+- Exit with code 1 if file not found, printing error to stderr +- Exit with code 0 on success -# Tools -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +## Required Tests (test_stats.py) -# Add MCP Tools -mcp_config = { - "mcpServers": { - "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, - } -} -# Agent -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) +Write tests that verify: +1. Basic counting on normal text +2. Empty file returns all zeros +3. Hyphenated words counted correctly +4. Numbers are excluded from word count +5. FileNotFoundError raised for missing files -llm_messages = [] # collect raw LLM messages +## Verification Steps +1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): +``` +Hello world! +This is a well-known test file. -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 -conversation_id = uuid.uuid4() -persistence_dir = "./.conversations" +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " - "about the project into FACTS.txt." 
+The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" + + +llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), ) -conversation.run() -conversation.send_message("Great! Now delete that file.") -conversation.run() +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) -# Conversation persistence -print("Serializing conversation...") +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) -del conversation +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") -# 
Deserialize the conversation
-print("Deserializing conversation...")
+# Create conversation - iterative refinement is handled automatically
+# by Conversation.run() based on the critic's config
 conversation = Conversation(
     agent=agent,
-    callbacks=[conversation_callback],
-    workspace=cwd,
-    persistence_dir=persistence_dir,
-    conversation_id=conversation_id,
+    workspace=str(workspace),
 )

-print("Sending message to deserialized conversation...")
-conversation.send_message("Hey what did you create? Return an agent finish action")
+print("\n" + "=" * 70)
+print("🚀 Starting Iterative Refinement with Critic Model")
+print("=" * 70)
+print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}")
+print(f"Max iterations: {MAX_ITERATIONS}")
+
+# Send the task and run - Conversation.run() handles retries automatically
+conversation.send_message(INITIAL_TASK_PROMPT)
 conversation.run()

+# Print additional info about created files
+print("\nCreated files:")
+for path in sorted(workspace.rglob("*")):
+    if path.is_file():
+        relative = path.relative_to(workspace)
+        print(f"  - {relative}")
+
 # Report cost
 cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")
+print(f"\nEXAMPLE_COST: {cost:.4f}")
 ```
-
-
-## Reading serialized events
-
-Convert persisted events into LLM-ready messages for reuse or analysis.
-
-This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py)
-
-```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py
-"""Load persisted events and convert them into LLM-ready messages."""
-
-import json
-import os
-import uuid
-from pathlib import Path
-
-from pydantic import SecretStr
-
-conversation_id = uuid.uuid4()
-persistence_root = Path(".conversations")
-log_dir = (
-    persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex
-)
-
-os.environ.setdefault("LOG_JSON", "true")
-os.environ.setdefault("LOG_TO_FILE", "true")
-os.environ.setdefault("LOG_DIR", str(log_dir))
-os.environ.setdefault("LOG_LEVEL", "INFO")
-
-from openhands.sdk import (  # noqa: E402
-    LLM,
-    Agent,
-    Conversation,
-    Event,
-    LLMConvertibleEvent,
-    Tool,
-)
-from openhands.sdk.logger import get_logger, setup_logging  # noqa: E402
-from openhands.tools.terminal import TerminalTool  # noqa: E402
-
-
-setup_logging(log_to_file=True, log_dir=str(log_dir))
-logger = get_logger(__name__)
-
-api_key = os.getenv("LLM_API_KEY")
-if not api_key:
-    raise RuntimeError("LLM_API_KEY environment variable is not set.")
-llm = LLM(
-    usage_id="agent",
-    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
-    base_url=os.getenv("LLM_BASE_URL"),
-    api_key=SecretStr(api_key),
-)
-
-agent = Agent(
-    llm=llm,
-    tools=[Tool(name=TerminalTool.name)],
-)
-
-######
-# Create a conversation that persists its events
-######
-
-conversation = Conversation(
-    agent=agent,
-    workspace=os.getcwd(),
-    persistence_dir=str(persistence_root),
-    conversation_id=conversation_id,
-)
-
-conversation.send_message(
-    "Use the terminal tool to run `pwd` and write the output to tool_output.txt. "
-    "Reply with a short confirmation once done."
-)
-conversation.run()
-
-conversation.send_message(
-    "Without using any tools, summarize in one sentence what you did."
-)
-conversation.run()
-
-assert conversation.state.persistence_dir is not None
-persistence_dir = Path(conversation.state.persistence_dir)
-event_dir = persistence_dir / "events"
-
-event_paths = sorted(event_dir.glob("event-*.json"))
-
-if not event_paths:
-    raise RuntimeError("No event files found. Was persistence enabled?")
-
-######
-# Read from serialized events
-######
-
-events = [Event.model_validate_json(path.read_text()) for path in event_paths]
-
-convertible_events = [
-    event for event in events if isinstance(event, LLMConvertibleEvent)
-]
-llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events)
-
-if llm.uses_responses_api():
-    logger.info("Formatting messages for the OpenAI Responses API.")
-    instructions, input_items = llm.format_messages_for_responses(llm_messages)
-    logger.info("Responses instructions:\n%s", instructions)
-    logger.info("Responses input:\n%s", json.dumps(input_items, indent=2))
-else:
-    logger.info("Formatting messages for the OpenAI Chat Completions API.")
-    chat_messages = llm.format_messages_for_llm(llm_messages)
-    logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2))
-
-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")
-```
-
-## How State Persistence Works
-
-The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly.
-
-### Auto-Save Mechanism
-
-When you modify any public field on `ConversationState`, the SDK automatically:
-
-1. Detects the field change via a custom `__setattr__` implementation
-2. Serializes the entire base state to `base_state.json`
-3. Triggers any registered state change callbacks
-
-This happens transparently—you don't need to call any save methods manually.
-
-```python
-# These changes are automatically persisted:
-conversation.state.execution_status = ConversationExecutionStatus.RUNNING
-conversation.state.max_iterations = 100
-```
-
-### Events vs Base State
-
-The persistence system separates data into two categories:
-
-| Category | Storage | Contents |
-|----------|---------|----------|
-| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state |
-| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events |
-
-Events are appended incrementally (one file per event), while base state is overwritten on each change. 
This design optimizes for: -- **Fast event appends**: No need to rewrite the entire history -- **Atomic state updates**: Base state is always consistent -- **Efficient restoration**: Events can be loaded lazily +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") -## Next Steps +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), +) -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") +print(f"Max iterations: {MAX_ITERATIONS}") +# Send the task and run - Conversation.run() handles retries automatically +conversation.send_message(INITIAL_TASK_PROMPT) +conversation.run() -# Send Message While Running -Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running +# Print additional 
info about created files +print("\nCreated files:") +for path in sorted(workspace.rglob("*")): + if path.is_file(): + relative = path.relative_to(workspace) + print(f" - {relative}") -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +# Report cost +cost = llm.metrics.accumulated_cost +print(f"\nEXAMPLE_COST: {cost:.4f}") +``` +Hello world! +This is a well-known test file. +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` - -This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) - +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 -Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. -```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass """ -Example demonstrating that user messages can be sent and processed while -an agent is busy. - -This example demonstrates a key capability of the OpenHands agent system: the ability -to receive and process new user messages even while the agent is actively working on -a previous task. This is made possible by the agent's event-driven architecture. - -Demonstration Flow: -1. Send initial message asking agent to: - - Write "Message 1 sent at [time], written at [CURRENT_TIME]" - - Wait 3 seconds - - Write "Message 2 sent at [time], written at [CURRENT_TIME]" - [time] is the time the message was sent to the agent - [CURRENT_TIME] is the time the agent writes the line -2. Start agent processing in a background thread -3. 
While agent is busy (during the 3-second delay), send a second message asking to add: - - "Message 3 sent at [time], written at [CURRENT_TIME]" -4. Verify that all three lines are processed and included in the final document -Expected Evidence: -The final document will contain three lines with dual timestamps: -- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) -- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) -- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) -The timestamps will show that Message 3 was sent while the agent was running, -but was still successfully processed and written to the document. +llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), +) -This proves that: -- The second user message was sent while the agent was processing the first task -- The agent successfully received and processed the second message -- The agent's event system allows for real-time message integration during processing +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) -Key Components Demonstrated: -- Conversation.send_message(): Adds messages to events list immediately -- Agent.step(): Processes all events including newly added messages -- Threading: Allows message sending while agent is actively processing -""" # noqa +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, 
trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) -import os -import threading -import time -from datetime import datetime +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) -from pydantic import SecretStr +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") -from openhands.sdk import ( - LLM, - Agent, - Conversation, +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), ) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") +print(f"Max iterations: {MAX_ITERATIONS}") -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +# Send the task and run - Conversation.run() handles retries automatically +conversation.send_message(INITIAL_TASK_PROMPT) +conversation.run() -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +# Print additional info about created files +print("\nCreated files:") +for path in sorted(workspace.rglob("*")): + if path.is_file(): + relative = path.relative_to(workspace) + print(f" - {relative}") -# Agent -agent = Agent(llm=llm, tools=tools) -conversation = Conversation(agent) +# Report cost +cost = llm.metrics.accumulated_cost +print(f"\nEXAMPLE_COST: {cost:.4f}") +``` +Hello world! +This is a well-known test file. +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` -def timestamp() -> str: - return datetime.now().strftime("%H:%M:%S") +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 + +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" -print("=== Send Message While Processing Example ===") -# Step 1: Send initial message -start_time = timestamp() -conversation.send_message( - f"Create a file called document.txt and write this first sentence: " - f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write the line. 
" - f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), ) -# Step 2: Start agent processing in background -thread = threading.Thread(target=conversation.run) -thread.start() - -# Step 3: Wait then send second message while agent is processing -time.sleep(2) # Give agent time to start working +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) -second_time = timestamp() +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) -conversation.send_message( - f"Please also add this second sentence to document.txt: " - f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
+# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, ) -# Wait for completion -thread.join() +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") -# Verification -document_path = os.path.join(cwd, "document.txt") -if os.path.exists(document_path): - with open(document_path) as f: - content = f.read() +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), +) - print("\nDocument contents:") - print("─────────────────────") - print(content) - print("─────────────────────") +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") +print(f"Max iterations: {MAX_ITERATIONS}") - # Check if both messages were processed - if "Message 1" in content and "Message 2" in content: - print("\nSUCCESS: Agent processed both messages!") - print( - "This proves the agent received the second message while processing the first task." 
# noqa - ) - else: - print("\nWARNING: Agent may not have processed the second message") +# Send the task and run - Conversation.run() handles retries automatically +conversation.send_message(INITIAL_TASK_PROMPT) +conversation.run() - # Clean up - os.remove(document_path) -else: - print("WARNING: Document.txt was not created") +# Print additional info about created files +print("\nCreated files:") +for path in sorted(workspace.rglob("*")): + if path.is_file(): + relative = path.relative_to(workspace) + print(f" - {relative}") # Report cost cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +print(f"\nEXAMPLE_COST: {cost:.4f}") ``` - +```bash Running the Example icon="terminal" +LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \ + uv run python examples/01_standalone_sdk/34_critic_example.py +``` -### Sending Messages During Execution +### Example Output -As shown in the example above, use threading to send messages while the agent is running: +``` +📁 Created workspace: /tmp/critic_demo_abc123 -```python icon="python" -# Start agent processing in background -thread = threading.Thread(target=conversation.run) -thread.start() +====================================================================== +🚀 Starting Iterative Refinement with Critic Model +====================================================================== +Success threshold: 70% +Max iterations: 3 -# Wait then send second message while agent is processing -time.sleep(2) # Give agent time to start working +... agent works on the task ... -second_time = timestamp() +✓ Critic evaluation: score=0.758, success=True -conversation.send_message( - f"Please also add this second sentence to document.txt: " - f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
-) +Created files: + - sample.txt + - wordstats/cli.py + - wordstats/stats.py + - wordstats/tests/test_stats.py -# Wait for completion -thread.join() +EXAMPLE_COST: 0.0234 ``` -The key steps are: -1. Start `conversation.run()` in a background thread -2. Send additional messages using `conversation.send_message()` while the agent is processing -3. Use `thread.join()` to wait for completion - -The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. - ## Next Steps -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations - - -# Critic (Experimental) -Source: https://docs.openhands.dev/sdk/guides/critic - - -**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. - - -> A ready-to-run example is available [here](#ready-to-run-example)! - - -## What is a Critic? - -A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: - -- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success -- **Real-time feedback**: Scores computed during agent execution, not just at completion -- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold - -You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. 
+- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior
+- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics
+- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns

-
-This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming.
-

# Custom Tools
Source: https://docs.openhands.dev/sdk/guides/custom-tools.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

> The ready-to-run example is available [here](#ready-to-run-example)!

## Understanding the Tool System

The SDK's tool system is built around three core components:

1. **Action** - Defines input parameters (what the tool accepts)
2. **Observation** - Defines output data (what the tool returns)
3. **Executor** - Implements the tool's logic (what the tool does)

These components are tied together by a **ToolDefinition** that registers the tool with the agent. 
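To make the relationship concrete, here is a minimal, framework-free sketch of the same pattern. The `Echo*` names are illustrative stand-ins, not the SDK's actual classes; the real signatures appear in the walkthrough that follows.

```python
from dataclasses import dataclass


@dataclass
class EchoAction:  # the Action: input parameters the tool accepts
    text: str


@dataclass
class EchoObservation:  # the Observation: output data the tool returns
    echoed: str


class EchoExecutor:  # the Executor: the tool's logic
    def __call__(self, action: EchoAction) -> EchoObservation:
        return EchoObservation(echoed=action.text.upper())


@dataclass
class EchoToolDefinition:  # a ToolDefinition-like record tying the three together
    name: str
    action_type: type
    observation_type: type
    executor: EchoExecutor


tool = EchoToolDefinition("echo", EchoAction, EchoObservation, EchoExecutor())
obs = tool.executor(EchoAction(text="hi"))
print(obs.echoed)  # HI
```

The same separation lets the agent validate inputs against the action type, run the executor, and format the observation for the LLM without the three concerns being entangled.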
-![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) +## Built-in Tools -### Accessing Results Programmatically +The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. -```python icon="python" focus={4-7} -from openhands.sdk import Event, ActionEvent, MessageEvent +```python icon="python" wrap +from openhands.tools import BashTool, FileEditorTool +from openhands.tools.preset import get_default_tools -def callback(event: Event): - if isinstance(event, (ActionEvent, MessageEvent)): - if event.critic_result is not None: - print(f"Critic score: {event.critic_result.score:.3f}") - print(f"Success: {event.critic_result.success}") +# Use specific tools +agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) -conversation = Conversation(agent=agent, callbacks=[callback]) +# Or use preset +tools = get_default_tools() +agent = Agent(llm=llm, tools=tools) ``` -## Iterative Refinement with a Critic + +See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. + -The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. +## Creating a Custom Tool -### How It Works +Here's a minimal example of creating a custom grep tool: -1. Agent completes a task and calls `FinishAction` -2. Critic evaluates the result and produces a score -3. If score < `success_threshold`, a follow-up prompt is sent automatically -4. Agent continues working to address issues -5. 
Process repeats until score meets threshold or `max_iterations` is reached + + + ### Define the Action + Defines input parameters (what the tool accepts) -### Configuration + ```python icon="python" wrap + class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", + description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, + description="Optional glob to filter files (e.g. '*.py')" + ) + ``` + + + ### Define the Observation + Defines output data (what the tool returns) -Use `IterativeRefinementConfig` to enable automatic retries: + ```python icon="python" wrap + class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 -```python icon="python" focus={1,4-7,12} -from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + ``` + + The to_llm_content() property formats observations for the LLM. 
+ + + + ### Define the Executor + Implements the tool’s logic (what the tool does) -# Configure iterative refinement -iterative_config = IterativeRefinementConfig( - success_threshold=0.7, # Retry if score < 70% - max_iterations=3, # Maximum retry attempts -) + ```python icon="python" wrap + class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal -# Attach to critic -critic = APIBasedCritic( - server_url="https://llm-proxy.eval.all-hands.dev/vllm", - api_key=api_key, - model_name="critic", - iterative_refinement=iterative_config, -) -``` + def __call__( + self, + action: GrepAction, + conversation=None, + ) -> GrepObservation: + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) -### Parameters + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q}" + else: + cmd = f"grep -rHnE {pat} {root_q}" + cmd += " 2>/dev/null | head -100" + result = self.terminal(TerminalAction(command=cmd)) -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | -| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | + matches: list[str] = [] + files: set[str] = set() -### Custom Follow-up Prompts + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text -By default, the critic generates a generic follow-up prompt. 
You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" + # take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) -```python icon="python" focus={4-12} -from openhands.sdk.critic.base import CriticBase, CriticResult + return GrepObservation( + matches=matches, + files=sorted(files), + count=len(matches), + ) + ``` + + + ### Finally, define the tool + ```python icon="python" wrap + class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """Custom grep tool that searches file contents using regular expressions.""" -class CustomCritic(APIBasedCritic): - def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: - score_percent = critic_result.score * 100 - return f""" -Your solution scored {score_percent:.1f}% (iteration {iteration}). + @classmethod + def create( + cls, + conv_state, + terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. -Please review your work carefully: -1. Check that all requirements are met -2. Verify tests pass -3. Fix any issues and try again -""" -``` + Args: + conv_state: Conversation state to get + working directory from. + terminal_executor: Optional terminal executor to reuse. + If not provided, a new one will be created. -### Example Workflow + Returns: + A sequence containing a single GrepTool instance. 
+ """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) -Here's what happens during iterative refinement: + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + ``` + + -``` -Iteration 1: - → Agent creates files, runs tests - → Agent calls FinishAction - → Critic evaluates: score = 0.45 (below 0.7 threshold) - → Follow-up prompt sent automatically +## Good to know +### Tool Registration +Tools are registered using `register_tool()` and referenced by name: -Iteration 2: - → Agent reviews and fixes issues - → Agent calls FinishAction - → Critic evaluates: score = 0.72 (above threshold) - → ✅ Success! Conversation ends +```python icon="python" wrap +# Register a simple tool class +register_tool("FileEditorTool", FileEditorTool) + +# Register a factory function that creates multiple tools +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +# Use registered tools by name +tools = [ + Tool(name="FileEditorTool"), + Tool(name="BashAndGrepToolSet"), +] ``` -## Troubleshooting +### Factory Functions +Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: -### Critic Evaluations Not Appearing +```python icon="python" wrap +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create execute_bash and custom grep tools sharing one executor.""" + bash_executor = BashExecutor( + working_dir=conv_state.workspace.working_dir + ) + # Create and configure tools... 
+ return [bash_tool, grep_tool] +``` -- Verify the critic is properly configured and passed to the Agent -- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) +### Shared Executors +Multiple tools can share executors for efficiency and state consistency: -### API Authentication Errors +```python icon="python" wrap +bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) +bash_tool = execute_bash_tool.set_executor(executor=bash_executor) -- Verify `LLM_API_KEY` is set correctly -- Check that the API key has not expired +grep_executor = GrepExecutor(bash_executor) +grep_tool = ToolDefinition( + name="grep", + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, +) +``` -### Iterative Refinement Not Triggering +## When to Create Custom Tools -- Ensure `iterative_refinement` config is attached to the critic -- Check that `success_threshold` is set appropriately (higher values trigger more retries) -- Verify the agent is using `FinishAction` to complete tasks +Create custom tools when you need to: +- Combine multiple operations into a single, structured interface +- Add typed parameters with validation +- Format complex outputs for LLM consumption +- Integrate with external APIs or services ## Ready-to-run Example -The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) +This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) -This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. 
The critic evaluates whether all requirements are met and triggers retries if needed. - -```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py -"""Iterative Refinement with Critic Model Example. - -This is EXPERIMENTAL. - -This example demonstrates how to use a critic model to shepherd an agent through -complex, multi-step tasks. The critic evaluates the agent's progress and provides -feedback that can trigger follow-up prompts when the agent hasn't completed the -task successfully. - -Key concepts demonstrated: -1. Setting up a critic with IterativeRefinementConfig for automatic retry -2. Conversation.run() automatically handles retries based on critic scores -3. Custom follow-up prompt generation via critic.get_followup_prompt() -4. Iterating until the task is completed successfully or max iterations reached - -For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured -using the same base_url with /vllm suffix and "critic" as the model name. -""" +```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py +"""Advanced example showing explicit executor usage and custom grep tool.""" import os -import re -import tempfile -from pathlib import Path - -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig -from openhands.sdk.critic.base import CriticBase -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +import shlex +from collections.abc import Sequence +from pydantic import Field, SecretStr -# Configuration -# Higher threshold (70%) makes it more likely the agent needs multiple iterations, -# which better demonstrates how iterative refinement works. -# Adjust as needed to see different behaviors. 
-SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) -MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) +from openhands.sdk import ( + LLM, + Action, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Observation, + TextContent, + ToolDefinition, + get_logger, +) +from openhands.sdk.tool import ( + Tool, + ToolExecutor, + register_tool, +) +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import ( + TerminalAction, + TerminalExecutor, + TerminalTool, +) -def get_required_env(name: str) -> str: - value = os.getenv(name) - if value: - return value - raise ValueError( - f"Missing required environment variable: {name}. " - f"Set {name} before running this example." - ) +logger = get_logger(__name__) +# --- Action / Observation --- -def get_default_critic(llm: LLM) -> CriticBase | None: - """Auto-configure critic for All-Hands LLM proxy. - When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an - APIBasedCritic configured with: - - server_url: {base_url}/vllm - - api_key: same as LLM - - model_name: "critic" +class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, description="Optional glob to filter files (e.g. '*.py')" + ) - Args: - llm: The LLM instance to derive critic configuration from. - Returns: - An APIBasedCritic if the LLM is configured for All-Hands proxy, - None otherwise. 
+class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 - Example: - llm = LLM( - model="anthropic/claude-sonnet-4-5", - api_key=api_key, - base_url="https://llm-proxy.eval.all-hands.dev", + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" ) - critic = get_default_critic(llm) - if critic is None: - # Fall back to explicit configuration - critic = APIBasedCritic( - server_url="https://my-critic-server.com", - api_key="my-api-key", - model_name="my-critic-model", - ) - """ - base_url = llm.base_url - api_key = llm.api_key - if base_url is None or api_key is None: - return None - - # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) - pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" - if not re.match(pattern, base_url): - return None - - return APIBasedCritic( - server_url=f"{base_url.rstrip('/')}/vllm", - api_key=api_key, - model_name="critic", - ) - + return [TextContent(text=ret)] -# Task prompt designed to be moderately complex with subtle requirements. -# The task is simple enough to complete in 1-2 iterations, but has specific -# requirements that are easy to miss - triggering critic feedback. -INITIAL_TASK_PROMPT = """\ -Create a Python word statistics tool called `wordstats` that analyzes text files. 
-## Structure +# --- Executor --- -Create directory `wordstats/` with: -- `stats.py` - Main module with `analyze_file(filepath)` function -- `cli.py` - Command-line interface -- `tests/test_stats.py` - Unit tests -## Requirements for stats.py +class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal -The `analyze_file(filepath)` function must return a dict with these EXACT keys: -- `lines`: total line count (including empty lines) -- `words`: word count -- `chars`: character count (including whitespace) -- `unique_words`: count of unique words (case-insensitive) + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) -### Important edge cases (often missed!): -1. Empty files must return all zeros, not raise an exception -2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) -3. Numbers like "123" or "3.14" are NOT counted as words -4. Contractions like "don't" count as ONE word -5. File not found must raise FileNotFoundError with a clear message + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" + else: + cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" -## Requirements for cli.py + result = self.terminal(TerminalAction(command=cmd)) -When run as `python cli.py `: -- Print each stat on its own line: "Lines: X", "Words: X", etc. -- Exit with code 1 if file not found, printing error to stderr -- Exit with code 0 on success + matches: list[str] = [] + files: set[str] = set() -## Required Tests (test_stats.py) + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text -Write tests that verify: -1. Basic counting on normal text -2. 
Empty file returns all zeros -3. Hyphenated words counted correctly -4. Numbers are excluded from word count -5. FileNotFoundError raised for missing files + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" — take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) -## Verification Steps + return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) -1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): -``` -Hello world! -This is a well-known test file. -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` +# Tool description +_GREP_DESCRIPTION = """Fast content search tool. +* Searches file contents using regular expressions +* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) +* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") +* Returns matching file paths sorted by modification time. +* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. +* Use this tool when you need to find files containing specific patterns +* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead +""" # noqa: E501 -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
+# --- Tool Definition --- -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """A custom grep tool that searches file contents using regular expressions.""" -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) + @classmethod + def create( + cls, conv_state, terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) + Args: + conv_state: Conversation state to get working directory from. + terminal_executor: Optional terminal executor to reuse. If not provided, + a new one will be created. -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + Returns: + A sequence containing a single GrepTool instance. 
+ """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), ) -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +# Tools - demonstrating both simplified and advanced patterns +cwd = os.getcwd() -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create terminal and custom grep tools sharing one executor.""" -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. + terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) + # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) + terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` + # Use the GrepTool.create() method with shared terminal_executor + grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 + return [terminal_tool, grep_tool] -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
-The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) +tools = [ + Tool(name=FileEditorTool.name), + Tool(name="BashAndGrepToolSet"), +] -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +# Agent +agent = Agent(llm=llm, tools=tools) -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +llm_messages = [] # collect raw LLM messages -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: 
{workspace}") -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config conversation = Conversation( - agent=agent, - workspace=str(workspace), + agent=agent, callbacks=[conversation_callback], workspace=cwd ) -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +conversation.send_message( + "Hello! Can you use the grep tool to find all files " + "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 + "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 +) +conversation.run() -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) +conversation.send_message("Great! Now delete that file.") conversation.run() -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") # Report cost cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. - -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. +print(f"EXAMPLE_COST: {cost}") ``` -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 + -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
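The executor above shells out to `grep -rHnE` and then parses grep's `path:line:content` output. That parsing step can be sketched as a small standalone helper (the name `parse_grep_output` is illustrative, not part of the SDK):

```python
import os


def parse_grep_output(output_text: str) -> tuple[list[str], list[str]]:
    """Split grep's "path:line:content" lines into matches and unique files.

    Mirrors the logic in GrepExecutor.__call__: every non-empty output line
    is one match, and the file part is everything before the first ":".
    """
    matches: list[str] = []
    files: set[str] = set()
    for line in output_text.strip().splitlines():
        if not line:
            continue
        matches.append(line)
        file_path = line.split(":", 1)[0]
        if file_path:
            files.add(os.path.abspath(file_path))
    return matches, sorted(files)
```

Note that grep exits with status 1 when nothing matches, so an empty output simply yields zero matches rather than an error.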
+## Next Steps -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers +- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation +### Assign Reviews +Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +> The reference workflow is available [here](#reference-workflow)! -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. 
-# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +## How it works -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +**Core Components:** +- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts +- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +**Prompt Options:** +1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) +2. 
**`PROMPT_LOCATION`** - URL or file path for external prompts -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +## Assign Reviews Use Case -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +This specific implementation uses the basic action template to handle three PR management scenarios: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. +**1. Need Reviewer Action** +- Identifies PRs waiting for review +- Notifies reviewers to take action -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` +**2. Need Author Action** +- Finds stale PRs with no activity for 5+ days +- Prompts authors to update, request review, or close -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +**3. Need Reviewers** +- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) +- Uses git blame analysis to identify relevant contributors +- Automatically assigns reviewers based on file ownership and contribution history +- Balances reviewer workload across team members -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
+## Quick Start -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" + + + ```bash icon="terminal" + cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml + ``` + + + Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". + + + The default is: Daily at 12 PM UTC. + + +## Features -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership +- **Automated Notifications** - Sends contextual reminders to reviewers and authors +- **Workload Balancing** - Distributes review requests evenly across team members +- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +## Reference Workflow -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add 
iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + +This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) + -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml +--- +# To set this up: +# 1. Change the name below to something relevant to your task +# 2. Modify the "env" section below with your prompt +# 3. Add your LLM_API_KEY to the repository secrets +# 4. Commit this file to your repository +# 5. Trigger the workflow manually or set up a schedule +name: Assign Reviews -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +on: + # Manual trigger + workflow_dispatch: + # Scheduled trigger (disabled by default, uncomment and customize as needed) + schedule: + # Run at 12 PM UTC every day + - cron: 0 12 * * * -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +permissions: + contents: write + pull-requests: write + issues: write -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +jobs: + run-task: + runs-on: ubuntu-24.04 + env: + # Configuration (modify these values as needed) + AGENT_SCRIPT_URL: 
https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py + # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both + # Option 1: Use a URL or file path for the prompt + PROMPT_LOCATION: '' + # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + # Option 2: Use direct text for the prompt + PROMPT_STRING: > + Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. + Read the sections below in order, and perform each in order. Do NOT take action + on the same issue or PR twice. -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() + # Issues with needs-info - Check for OP Response -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") + Find all open issues that have the "needs-info" label. For each issue: + 1. Identify the original poster (issue author) + 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added + 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline + and look for "labeled" events with the label "needs-info" + 4. If the original poster has commented after the label was added: + - Remove the "needs-info" label + - Add the "needs-triage" label + - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. + # Issues with needs-triage -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` + Find all open issues that have the "needs-triage" label. 
For each issue that has been in this state for more than 4 days since the last + activity: + 1. First, check if the issue has already been triaged by verifying it does NOT have: + - The "enhancement" label + - Any "priority" label (priority:low, priority:medium, priority:high, etc.) + 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label + 3. For issues that have NOT been triaged yet: + - Read the issue description and comments + - Determine if it requires maintainer attention by checking: + * Is it a bug report, feature request, or question? + * Does it have enough information to be actionable? + * Has a maintainer already commented? + * Is the last comment older than 4 days? + - If it needs maintainer attention and no maintainer has commented: + * Find an appropriate maintainer based on the issue topic and recent activity + * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have + a chance?" -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 + # Need Reviewer Action -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. + Find all open PRs where: + 1. The PR is waiting for review (there are no open review comments or change requests) + 2. The PR is in a "clean" state (CI passing, no merge conflicts) + 3. The PR is not marked as draft (draft: false) + 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" + In this case, send a message to the reviewers: + [Automatic Post]: This PR seems to be currently waiting for review. + {reviewer_names}, could you please take a look when you have a chance? 
+ # Need Author Action -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) + Find all open PRs where the most recent change or comment was made on the pull + request more than 5 days ago (use 14 days if the PR is marked as draft). -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) + And send a message to the author: -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + [Automatic Post]: It has been a while since there was any activity on this PR. + {author}, are you still working on it? If so, please go ahead, if not then + please request review, close it, or request that someone else follow up. 
-# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) + # Need Reviewers -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") + Find all open pull requests that: + 1. Have no reviewers assigned to them. + 2. Are not marked as draft. + 3. Were created more than 1 day ago. + 4. CI is passing and there are no merge conflicts. -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) + For each of these pull requests, read the git blame information for the files, + and find the most recent and active contributors to the file/location of the changes. + Assign one of these people as a reviewer, but try not to assign too many reviews to + any single person. Add this message: -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") + [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. + Thanks in advance for the help! 
-# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() + LLM_MODEL: + LLM_BASE_URL: + steps: + - name: Checkout repository + uses: actions/checkout@v5 -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") + - name: Set up Python + uses: actions/setup-python@v6 + with: + python-version: '3.13' -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. + - name: Install uv + uses: astral-sh/setup-uv@v7 + with: + enable-cache: true -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 + - name: Check required configuration + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + run: | + if [ -z "$LLM_API_KEY" ]; then + echo "Error: LLM_API_KEY secret is not set." + exit 1 + fi -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. + # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set + if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then + echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." + echo "Please provide only one in the env section of the workflow file." 
+ exit 1 + fi -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" + if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then + echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." + echo "Please set one in the env section of the workflow file." + exit 1 + fi + if [ -n "$PROMPT_LOCATION" ]; then + echo "Prompt location: $PROMPT_LOCATION" + else + echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" + fi + echo "LLM model: $LLM_MODEL" + if [ -n "$LLM_BASE_URL" ]; then + echo "LLM base URL: $LLM_BASE_URL" + fi -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) + - name: Run task + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + PYTHONPATH: '' + run: | + echo "Running agent script: $AGENT_SCRIPT_URL" -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) + # Download script if it's a URL + if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then + echo "Downloading agent script from URL..." 
+ curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py + AGENT_SCRIPT_PATH="/tmp/agent_script.py" + else + AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" + fi -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + # Run with appropriate prompt argument + if [ -n "$PROMPT_LOCATION" ]; then + echo "Using prompt from: $PROMPT_LOCATION" + uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" + else + echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" + uv run python "$AGENT_SCRIPT_PATH" + fi -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) + - name: Upload logs as artifact + uses: actions/upload-artifact@v4 + if: always() + with: + name: openhands-task-logs + path: | + *.log + output/ + retention-days: 7 +``` -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +## Related Files -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow 
File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +### PR Review +Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +> The reference workflow is available [here](#reference-workflow)! -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +Automatically review pull requests, providing feedback on code quality, security, and best practices. Reviews can be triggered in two ways: +- Requesting `openhands-agent` as a reviewer +- Adding the `review-this` label to the PR -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` + +The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo. 
+ -```bash Running the Example icon="terminal" -LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \ - uv run python examples/01_standalone_sdk/34_critic_example.py -``` +## Quick Start -### Example Output +```bash +# 1. Copy workflow to your repository +cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY +# 3. (Optional) Create a "review-this" label in your repository +# Go to Issues → Labels → New label +# You can also trigger reviews by requesting "openhands-agent" as a reviewer ``` -📁 Created workspace: /tmp/critic_demo_abc123 -====================================================================== -🚀 Starting Iterative Refinement with Critic Model -====================================================================== -Success threshold: 70% -Max iterations: 3 +## Features -... agent works on the task ... +- **Fast Reviews** - Results posted on the PR in only 2 or 3 minutes +- **Comprehensive Analysis** - Analyzes the changes given the repository context. Covers code quality, security, best practices +- **GitHub Integration** - Posts comments directly to the PR +- **Customizable** - Add your own code review guidelines without forking -✓ Critic evaluation: score=0.758, success=True +## Security -Created files: - - sample.txt - - wordstats/cli.py - - wordstats/stats.py - - wordstats/tests/test_stats.py +- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label. +- Maintainers need to read the PR to make sure it's safe to run. -EXAMPLE_COST: 0.0234 -``` +## Customizing the Code Review -## Next Steps +Instead of forking the `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization. 
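To get started quickly, such a skill file can be scaffolded from the shell. This is only a sketch: the filename, frontmatter fields, and the sample guideline line mirror the example skill described in this guide and should be adjusted for your project.

```bash
# Create the skills directory and a minimal skill file
# (filename and frontmatter values here are illustrative).
mkdir -p .agents/skills
cat > .agents/skills/custom-codereview-guide.md <<'EOF'
---
name: custom-codereview-guide
description: Project-specific review guidelines
triggers:
- /codereview
---

# Project-Specific Review Guidelines

- All API endpoints must have OpenAPI documentation
EOF
```

Commit the resulting file to your repository so the review workflow can pick it up on its next run.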
-- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior -- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics -- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns +### How It Works +The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. -# Custom Tools -Source: https://docs.openhands.dev/sdk/guides/custom-tools + +**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. + -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Example: Custom Code Review Skill -> The ready-to-run example is available [here](#ready-to-run-example)! +Create `.agents/skills/custom-codereview-guide.md` in your repository: -## Understanding the Tool System +```markdown +--- +name: custom-codereview-guide +description: Project-specific review guidelines for MyProject +triggers: +- /codereview +--- -The SDK's tool system is built around three core components: +# MyProject-Specific Review Guidelines -1. **Action** - Defines input parameters (what the tool accepts) -2. **Observation** - Defines output data (what the tool returns) -3. **Executor** - Implements the tool's logic (what the tool does) +In addition to general code review practices, check for: -These components are tied together by a **ToolDefinition** that registers the tool with the agent. 
+## Project Conventions -## Built-in Tools +- All API endpoints must have OpenAPI documentation +- Database migrations must be reversible +- Feature flags required for new features -The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. +## Architecture Rules -```python icon="python" wrap -from openhands.tools import BashTool, FileEditorTool -from openhands.tools.preset import get_default_tools +- No direct database access from controllers +- All external API calls must go through the gateway service -# Use specific tools -agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) +## Communication Style -# Or use preset -tools = get_default_tools() -agent = Agent(llm=llm, tools=tools) +- Be direct and constructive +- Use GitHub suggestion syntax for code fixes ``` + +**Note**: These rules supplement the default `code-review` skill, not replace it. + + -See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. +**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. + +If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. -## Creating a Custom Tool + +**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. 
+ -Here's a minimal example of creating a custom grep tool: +### Benefits of Custom Skills - - - ### Define the Action - Defines input parameters (what the tool accepts) +1. **No forking required**: Keep using the official SDK while customizing behavior +2. **Version controlled**: Your review guidelines live in your repository +3. **Easy updates**: SDK updates don't overwrite your customizations +4. **Team alignment**: Everyone uses the same review standards +5. **Composable**: Add project-specific rules alongside default guidelines - ```python icon="python" wrap - class GrepAction(Action): - pattern: str = Field(description="Regex to search for") - path: str = Field( - default=".", - description="Directory to search (absolute or relative)" - ) - include: str | None = Field( - default=None, - description="Optional glob to filter files (e.g. '*.py')" - ) - ``` - - - ### Define the Observation - Defines output data (what the tool returns) + +See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. + - ```python icon="python" wrap - class GrepObservation(Observation): - matches: list[str] = Field(default_factory=list) - files: list[str] = Field(default_factory=list) - count: int = 0 +## Reference Workflow - @property - def to_llm_content(self) -> Sequence[TextContent | ImageContent]: - if not self.count: - return [TextContent(text="No matches found.")] - files_list = "\n".join(f"- {f}" for f in self.files[:20]) - sample = "\n".join(self.matches[:10]) - more = "\n..." if self.count > 10 else "" - ret = ( - f"Found {self.count} matching lines.\n" - f"Files:\n{files_list}\n" - f"Sample:\n{sample}{more}" - ) - return [TextContent(text=ret)] - ``` - - The to_llm_content() property formats observations for the LLM. 
- - - - ### Define the Executor - Implements the tool’s logic (what the tool does) + +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) + - ```python icon="python" wrap - class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): - def __init__(self, terminal: TerminalExecutor): - self.terminal: TerminalExecutor = terminal +```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml +--- +# OpenHands PR Review Workflow +# +# To set this up: +# 1. Copy this file to .github/workflows/pr-review.yml in your repository +# 2. Add LLM_API_KEY to repository secrets +# 3. Customize the inputs below as needed +# 4. Commit this file to your repository +# 5. Trigger the review by either: +# - Adding the "review-this" label to any PR, OR +# - Requesting openhands-agent as a reviewer +# +# For more information, see: +# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review +name: PR Review by OpenHands - def __call__( - self, - action: GrepAction, - conversation=None, - ) -> GrepObservation: - root = os.path.abspath(action.path) - pat = shlex.quote(action.pattern) - root_q = shlex.quote(root) +on: + # Trigger when a label is added or a reviewer is requested + pull_request: + types: [labeled, review_requested] - # Use grep -r; add --include when provided - if action.include: - inc = shlex.quote(action.include) - cmd = f"grep -rHnE --include {inc} {pat} {root_q}" - else: - cmd = f"grep -rHnE {pat} {root_q}" - cmd += " 2>/dev/null | head -100" - result = self.terminal(TerminalAction(command=cmd)) +permissions: + contents: read + pull-requests: write + issues: write + +jobs: + pr-review: + # Run when review-this label is added OR openhands-agent is requested as reviewer + if: | + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 
'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Checkout for composite action + uses: actions/checkout@v4 + with: + repository: OpenHands/software-agent-sdk + # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') + ref: main + sparse-checkout: .github/actions/pr-review + + - name: Run PR Review + uses: ./.github/actions/pr-review + with: + # LLM configuration + llm-model: anthropic/claude-sonnet-4-5-20250929 + llm-base-url: '' + # Review style: roasted (other option: standard) + review-style: roasted + # SDK version to use (version tag or branch name) + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (optional) | No | `''` | +| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + +## Related Files + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - matches: list[str] = [] - files: set[str] = set() +### TODO Management +Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md - # grep returns exit code 1 when 
no matches; treat as empty - output_text = result.text +> The reference workflow is available [here](#reference-workflow)! - if output_text.strip(): - for line in output_text.strip().splitlines(): - matches.append(line) - # Expect "path:line:content" - # take the file part before first ":" - file_path = line.split(":", 1)[0] - if file_path: - files.add(os.path.abspath(file_path)) - return GrepObservation( - matches=matches, - files=sorted(files), - count=len(matches), - ) - ``` - - - ### Finally, define the tool - ```python icon="python" wrap - class GrepTool(ToolDefinition[GrepAction, GrepObservation]): - """Custom grep tool that searches file contents using regular expressions.""" +Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership - @classmethod - def create( - cls, - conv_state, - terminal_executor: TerminalExecutor | None = None - ) -> Sequence[ToolDefinition]: - """Create GrepTool instance with a GrepExecutor. +## Quick Start - Args: - conv_state: Conversation state to get - working directory from. - terminal_executor: Optional terminal executor to reuse. - If not provided, a new one will be created. + + + ```bash icon="terminal" + cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml + ``` + + + Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `Settings → Actions → General → Workflow permissions` and enable: + - `Read and write permissions` + - `Allow GitHub Actions to create and approve pull requests` + + + Trigger the agent by adding TODO comments into your code. - Returns: - A sequence containing a single GrepTool instance. 
- """ - if terminal_executor is None: - terminal_executor = TerminalExecutor( - working_dir=conv_state.workspace.working_dir - ) - grep_executor = GrepExecutor(terminal_executor) + Example: `# TODO(openhands): Add input validation for user email` - return [ - cls( - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, - ) - ] - ``` + + The workflow is configurable and any identifier can be used in place of `TODO(openhands)` + -## Good to know -### Tool Registration -Tools are registered using `register_tool()` and referenced by name: -```python icon="python" wrap -# Register a simple tool class -register_tool("FileEditorTool", FileEditorTool) +## Features -# Register a factory function that creates multiple tools -register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) +- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. +- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it +- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers -# Use registered tools by name -tools = [ - Tool(name="FileEditorTool"), - Tool(name="BashAndGrepToolSet"), -] -``` +## Best Practices -### Factory Functions -Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: +- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow +- **Clear Descriptions** - Write descriptive TODO comments +- **Review PRs** - Always review the generated PRs before merging -```python icon="python" wrap -def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: - """Create execute_bash and custom grep tools sharing one executor.""" - bash_executor = BashExecutor( - working_dir=conv_state.workspace.working_dir - ) - # Create and configure tools... 
- return [bash_tool, grep_tool] -``` +## Reference Workflow -### Shared Executors -Multiple tools can share executors for efficiency and state consistency: + +This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) + -```python icon="python" wrap -bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) -bash_tool = execute_bash_tool.set_executor(executor=bash_executor) +```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml +--- +# Automated TODO Management Workflow +# Make sure to replace and with +# appropriate values for your LLM setup. +# +# This workflow automatically scans for TODO(openhands) comments and creates +# pull requests to implement them using the OpenHands agent. +# +# Setup: +# 1. Add LLM_API_KEY to repository secrets +# 2. Ensure GITHUB_TOKEN has appropriate permissions +# 3. Make sure Github Actions are allowed to create and review PRs +# 4. Commit this file to .github/workflows/ in your repository +# 5. 
Configure the schedule or trigger manually -grep_executor = GrepExecutor(bash_executor) -grep_tool = ToolDefinition( - name="grep", - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, -) -``` +name: Automated TODO Management -## When to Create Custom Tools +on: + # Manual trigger + workflow_dispatch: + inputs: + max_todos: + description: Maximum number of TODOs to process in this run + required: false + default: '3' + type: string + todo_identifier: + description: TODO identifier to search for (e.g., TODO(openhands)) + required: false + default: TODO(openhands) + type: string -Create custom tools when you need to: -- Combine multiple operations into a single, structured interface -- Add typed parameters with validation -- Format complex outputs for LLM consumption -- Integrate with external APIs or services + # Trigger when 'automatic-todo' label is added to a PR + pull_request: + types: [labeled] -## Ready-to-run Example + # Scheduled trigger (disabled by default, uncomment and customize as needed) + # schedule: + # # Run every Monday at 9 AM UTC + # - cron: "0 9 * * 1" - -This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) - +permissions: + contents: write + pull-requests: write + issues: write -```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py -"""Advanced example showing explicit executor usage and custom grep tool.""" +jobs: + scan-todos: + runs-on: ubuntu-latest + # Only run if triggered manually or if 'automatic-todo' label was added + if: > + github.event_name == 'workflow_dispatch' || + (github.event_name == 'pull_request' && + github.event.label.name == 'automatic-todo') + outputs: + todos: ${{ steps.scan.outputs.todos }} + todo-count: ${{ steps.scan.outputs.todo-count }} + steps: + - name: Checkout repository + uses: 
actions/checkout@v4 + with: + fetch-depth: 0 # Full history for better context -import os -import shlex -from collections.abc import Sequence + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' -from pydantic import Field, SecretStr + - name: Copy TODO scanner + run: | + cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py + chmod +x /tmp/scanner.py -from openhands.sdk import ( - LLM, - Action, - Agent, - Conversation, - Event, - ImageContent, - LLMConvertibleEvent, - Observation, - TextContent, - ToolDefinition, - get_logger, -) -from openhands.sdk.tool import ( - Tool, - ToolExecutor, - register_tool, -) -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import ( - TerminalAction, - TerminalExecutor, - TerminalTool, -) + - name: Scan for TODOs + id: scan + run: | + echo "Scanning for TODO comments..." + # Run the scanner and capture output + TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" + python /tmp/scanner.py . 
--identifier "$TODO_IDENTIFIER" > todos.json -logger = get_logger(__name__) + # Count TODOs + TODO_COUNT=$(python -c \ + "import json; data=json.load(open('todos.json')); print(len(data))") + echo "Found $TODO_COUNT $TODO_IDENTIFIER items" -# --- Action / Observation --- + # Limit the number of TODOs to process + MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" + if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then + echo "Limiting to first $MAX_TODOS TODOs" + python -c " + import json + data = json.load(open('todos.json')) + limited = data[:$MAX_TODOS] + json.dump(limited, open('todos.json', 'w'), indent=2) + " + TODO_COUNT=$MAX_TODOS + fi + # Set outputs + echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT + echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT -class GrepAction(Action): - pattern: str = Field(description="Regex to search for") - path: str = Field( - default=".", description="Directory to search (absolute or relative)" - ) - include: str | None = Field( - default=None, description="Optional glob to filter files (e.g. '*.py')" - ) + # Display found TODOs + echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY + if [ "$TODO_COUNT" -eq 0 ]; then + echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY + else + echo "Found $TODO_COUNT TODO(openhands) items:" \ + >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + python -c " + import json + data = json.load(open('todos.json')) + for i, todo in enumerate(data, 1): + print(f'{i}. 
**{todo[\"file\"]}:{todo[\"line\"]}** - ' + + f'{todo[\"description\"]}') + " >> $GITHUB_STEP_SUMMARY + fi + process-todos: + needs: scan-todos + if: needs.scan-todos.outputs.todo-count > 0 + runs-on: ubuntu-latest + strategy: + matrix: + todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} + max-parallel: 1 # Process one TODO at a time to avoid conflicts + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + token: ${{ secrets.GITHUB_TOKEN }} -class GrepObservation(Observation): - matches: list[str] = Field(default_factory=list) - files: list[str] = Field(default_factory=list) - count: int = 0 + - name: Switch to feature branch with TODO management files + run: | + git checkout openhands/todo-management-example + git pull origin openhands/todo-management-example - @property - def to_llm_content(self) -> Sequence[TextContent | ImageContent]: - if not self.count: - return [TextContent(text="No matches found.")] - files_list = "\n".join(f"- {f}" for f in self.files[:20]) - sample = "\n".join(self.matches[:10]) - more = "\n..." 
if self.count > 10 else "" - ret = ( - f"Found {self.count} matching lines.\n" - f"Files:\n{files_list}\n" - f"Sample:\n{sample}{more}" - ) - return [TextContent(text=ret)] + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true -# --- Executor --- + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + - name: Copy agent files + run: | + cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py + cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py + chmod +x agent.py -class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): - def __init__(self, terminal: TerminalExecutor): - self.terminal: TerminalExecutor = terminal + - name: Configure Git + run: | + git config --global user.name "openhands-bot" + git config --global user.email \ + "openhands-bot@users.noreply.github.com" - def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 - root = os.path.abspath(action.path) - pat = shlex.quote(action.pattern) - root_q = shlex.quote(root) + - name: Process TODO + env: + LLM_MODEL: + LLM_BASE_URL: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_REPOSITORY: ${{ github.repository }} + TODO_FILE: ${{ matrix.todo.file }} + TODO_LINE: ${{ matrix.todo.line }} + TODO_DESCRIPTION: ${{ matrix.todo.description }} + PYTHONPATH: '' + run: | + echo "Processing TODO: $TODO_DESCRIPTION" + echo "File: $TODO_FILE:$TODO_LINE" - # Use grep -r; add --include when provided - if action.include: - inc = shlex.quote(action.include) - cmd = f"grep -rHnE --include 
{inc} {pat} {root_q} 2>/dev/null | head -100" - else: - cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" + # Create a unique branch name for this TODO + BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ + sed 's/[^a-zA-Z0-9]/-/g' | \ + sed 's/--*/-/g' | \ + sed 's/^-\|-$//g' | \ + tr '[:upper:]' '[:lower:]' | \ + cut -c1-50)" + echo "Branch name: $BRANCH_NAME" - result = self.terminal(TerminalAction(command=cmd)) + # Create and switch to new branch (force create if exists) + git checkout -B "$BRANCH_NAME" - matches: list[str] = [] - files: set[str] = set() + # Run the agent to process the TODO + # Stay in repository directory for git operations - # grep returns exit code 1 when no matches; treat as empty - output_text = result.text + # Create JSON payload for the agent + TODO_JSON=$(cat <&1 | tee agent_output.log + AGENT_EXIT_CODE=$? + set -e -# Tool description -_GREP_DESCRIPTION = """Fast content search tool. -* Searches file contents using regular expressions -* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) -* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") -* Returns matching file paths sorted by modification time. -* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. 
-* Use this tool when you need to find files containing specific patterns -* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead -""" # noqa: E501 + echo "Agent exit code: $AGENT_EXIT_CODE" + echo "Agent output log:" + cat agent_output.log + # Show files in working directory + echo "Files in working directory:" + ls -la -# --- Tool Definition --- + # If agent failed, show more details + if [ $AGENT_EXIT_CODE -ne 0 ]; then + echo "Agent failed with exit code $AGENT_EXIT_CODE" + echo "Last 50 lines of agent output:" + tail -50 agent_output.log + exit $AGENT_EXIT_CODE + fi + + # Check if any changes were made + cd "$GITHUB_WORKSPACE" + if git diff --quiet; then + echo "No changes made by agent, skipping PR creation" + exit 0 + fi + # Commit changes + git add -A + git commit -m "Implement TODO: $TODO_DESCRIPTION -class GrepTool(ToolDefinition[GrepAction, GrepObservation]): - """A custom grep tool that searches file contents using regular expressions.""" + Automatically implemented by OpenHands agent. - @classmethod - def create( - cls, conv_state, terminal_executor: TerminalExecutor | None = None - ) -> Sequence[ToolDefinition]: - """Create GrepTool instance with a GrepExecutor. + Co-authored-by: openhands " - Args: - conv_state: Conversation state to get working directory from. - terminal_executor: Optional terminal executor to reuse. If not provided, - a new one will be created. + # Push branch + git push origin "$BRANCH_NAME" - Returns: - A sequence containing a single GrepTool instance. 
- """ - if terminal_executor is None: - terminal_executor = TerminalExecutor( - working_dir=conv_state.workspace.working_dir - ) - grep_executor = GrepExecutor(terminal_executor) + # Create pull request + PR_TITLE="Implement TODO: $TODO_DESCRIPTION" + PR_BODY="## 🤖 Automated TODO Implementation - return [ - cls( - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, - ) - ] + This PR automatically implements the following TODO: + **File:** \`$TODO_FILE:$TODO_LINE\` + **Description:** $TODO_DESCRIPTION -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + ### Implementation + The OpenHands agent has analyzed the TODO and implemented the + requested functionality. 
-# Tools - demonstrating both simplified and advanced patterns -cwd = os.getcwd() + ### Review Notes + - Please review the implementation for correctness + - Test the changes in your development environment + - The original TODO comment will be updated with this PR URL + once merged + --- + *This PR was created automatically by the TODO Management workflow.*" -def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: - """Create terminal and custom grep tools sharing one executor.""" + # Create PR using GitHub CLI or API + curl -X POST \ + -H "Authorization: token $GITHUB_TOKEN" \ + -H "Accept: application/vnd.github.v3+json" \ + "https://api.github.com/repos/${{ github.repository }}/pulls" \ + -d "{ + \"title\": \"$PR_TITLE\", + \"body\": \"$PR_BODY\", + \"head\": \"$BRANCH_NAME\", + \"base\": \"${{ github.ref_name }}\" + }" - terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) - # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) - terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] + summary: + needs: [scan-todos, process-todos] + if: always() + runs-on: ubuntu-latest + steps: + - name: Generate Summary + run: | + echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY - # Use the GrepTool.create() method with shared terminal_executor - grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] + TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" + echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY - return [terminal_tool, grep_tool] + if [ "$TODO_COUNT" -gt 0 ]; then + echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "Check the pull requests created for each TODO" \ + "implementation." 
>> $GITHUB_STEP_SUMMARY + else + echo "**Status:** ℹ️ No TODOs found to process" \ + >> $GITHUB_STEP_SUMMARY + fi + echo "" >> $GITHUB_STEP_SUMMARY + echo "---" >> $GITHUB_STEP_SUMMARY + echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY +``` -register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) +## Related Documentation -tools = [ - Tool(name=FileEditorTool.name), - Tool(name="BashAndGrepToolSet"), -] +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) -# Agent -agent = Agent(llm=llm, tools=tools) +### Hello World +Source: https://docs.openhands.dev/sdk/guides/hello-world.md -llm_messages = [] # collect raw LLM messages +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +> A ready-to-run example is available [here](#ready-to-run-example)! -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Your First Agent +This is the most basic example showing how to set up and run an OpenHands agent. -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd -) + + + ### LLM Configuration -conversation.send_message( - "Hello! Can you use the grep tool to find all files " - "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 - "Use the pattern 'class' to search and include only Python files with '*.py'." 
# noqa: E501 -) -conversation.run() + Configure the language model that will power your agent: + ```python icon="python" + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, # Optional + service_id="agent" + ) + ``` + + + ### Select an Agent + Use the preset agent with common built-in tools: + ```python icon="python" + agent = get_default_agent(llm=llm, cli_mode=True) + ``` + The default agent includes `BashTool`, `FileEditorTool`, etc. + + For the complete list of available tools see the + [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). + -conversation.send_message("Great! Now delete that file.") -conversation.run() + + + ### Start a Conversation + Start a conversation to manage the agent's lifecycle: + ```python icon="python" + conversation = Conversation(agent=agent, workspace=cwd) + conversation.send_message( + "Write 3 facts about the current project into FACTS.txt." + ) + conversation.run() + ``` + + + ### Expected Behavior + When you run this example: + 1. The agent analyzes the current directory + 2. Gathers information about the project + 3. Creates `FACTS.txt` with 3 relevant facts + 4. Completes and exits -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") + Example output file: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` + ```text icon="text" wrap + FACTS.txt + --------- + 1. This is a Python project using the OpenHands Software Agent SDK. + 2. The project includes examples demonstrating various agent capabilities. + 3. The SDK provides tools for file manipulation, bash execution, and more. 
+ ``` + + - +## Ready-to-run Example -## Next Steps + +This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) + -- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers -- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation +```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py +import os +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -# Assign Reviews -Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews -> The reference workflow is available [here](#reference-workflow)! +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) -Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) -## How it works +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) -It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. 
+conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` -**Core Components:** -- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts -- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent + -**Prompt Options:** -1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) -2. **`PROMPT_LOCATION`** - URL or file path for external prompts +## Next Steps -The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs +- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage -## Assign Reviews Use Case +### Hooks +Source: https://docs.openhands.dev/sdk/guides/hooks.md -This specific implementation uses the basic action template to handle three PR management scenarios: +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -**1. Need Reviewer Action** -- Identifies PRs waiting for review -- Notifies reviewers to take action +> A ready-to-run example is available [here](#ready-to-run-example)! -**2. Need Author Action** -- Finds stale PRs with no activity for 5+ days -- Prompts authors to update, request review, or close +## Overview -**3. Need Reviewers** -- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) -- Uses git blame analysis to identify relevant contributors -- Automatically assigns reviewers based on file ownership and contribution history -- Balances reviewer workload across team members +Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. 
Typical uses include: +- Logging and analytics +- Emitting custom metrics +- Auditing or compliance +- Tracing and debugging -## Quick Start +## Hook Types - - - ```bash icon="terminal" - cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml - ``` - - - Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` - (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). - - - Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". - - - The default is: Daily at 12 PM UTC. - - +| Hook | When it runs | Can block? | +|------|--------------|------------| +| PreToolUse | Before tool execution | Yes (exit 2) | +| PostToolUse | After tool execution | No | +| UserPromptSubmit | Before processing user message | Yes (exit 2) | +| Stop | When agent tries to finish | Yes (exit 2) | +| SessionStart | When conversation starts | No | +| SessionEnd | When conversation ends | No | -## Features +## Key Concepts -- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership -- **Automated Notifications** - Sends contextual reminders to reviewers and authors -- **Workload Balancing** - Distributes review requests evenly across team members -- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch +- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution +- Isolation: hooks run outside the agent loop logic, avoiding core modifications +- Composition: enable or disable hooks per environment (local vs. 
prod) -## Reference Workflow +## Ready-to-run Example -This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) -```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml ---- -# To set this up: -# 1. Change the name below to something relevant to your task -# 2. Modify the "env" section below with your prompt -# 3. Add your LLM_API_KEY to the repository secrets -# 4. Commit this file to your repository -# 5. Trigger the workflow manually or set up a schedule -name: Assign Reviews +```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py +"""OpenHands Agent SDK — Hooks Example -on: - # Manual trigger - workflow_dispatch: - # Scheduled trigger (disabled by default, uncomment and customize as needed) - schedule: - # Run at 12 PM UTC every day - - cron: 0 12 * * * +Demonstrates the OpenHands hooks system. 
+Hooks are shell scripts that run at key lifecycle events: -permissions: - contents: write - pull-requests: write - issues: write +- PreToolUse: Block dangerous commands before execution +- PostToolUse: Log tool usage after execution +- UserPromptSubmit: Inject context into user messages +- Stop: Enforce task completion criteria -jobs: - run-task: - runs-on: ubuntu-24.04 - env: - # Configuration (modify these values as needed) - AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py - # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both - # Option 1: Use a URL or file path for the prompt - PROMPT_LOCATION: '' - # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' - # Option 2: Use direct text for the prompt - PROMPT_STRING: > - Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. - Read the sections below in order, and perform each in order. Do NOT take action - on the same issue or PR twice. +The hook scripts are in the scripts/ directory alongside this file. +""" - # Issues with needs-info - Check for OP Response +import os +import signal +import tempfile +from pathlib import Path - Find all open issues that have the "needs-info" label. For each issue: - 1. Identify the original poster (issue author) - 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added - 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline - and look for "labeled" events with the label "needs-info" - 4. If the original poster has commented after the label was added: - - Remove the "needs-info" label - - Add the "needs-triage" label - - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." 
+from pydantic import SecretStr - # Issues with needs-triage +from openhands.sdk import LLM, Conversation +from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher +from openhands.tools.preset.default import get_default_agent - Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last - activity: - 1. First, check if the issue has already been triaged by verifying it does NOT have: - - The "enhancement" label - - Any "priority" label (priority:low, priority:medium, priority:high, etc.) - 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label - 3. For issues that have NOT been triaged yet: - - Read the issue description and comments - - Determine if it requires maintainer attention by checking: - * Is it a bug report, feature request, or question? - * Does it have enough information to be actionable? - * Has a maintainer already commented? - * Is the last comment older than 4 days? - - If it needs maintainer attention and no maintainer has commented: - * Find an appropriate maintainer based on the issue topic and recent activity - * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have - a chance?" - # Need Reviewer Action +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) - Find all open PRs where: - 1. The PR is waiting for review (there are no open review comments or change requests) - 2. The PR is in a "clean" state (CI passing, no merge conflicts) - 3. The PR is not marked as draft (draft: false) - 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. +SCRIPT_DIR = Path(__file__).parent / "hook_scripts" - In this case, send a message to the reviewers: - [Automatic Post]: This PR seems to be currently waiting for review. 
- {reviewer_names}, could you please take a look when you have a chance? +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") - # Need Author Action +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) - Find all open PRs where the most recent change or comment was made on the pull - request more than 5 days ago (use 14 days if the PR is marked as draft). +# Create temporary workspace with git repo +with tempfile.TemporaryDirectory() as tmpdir: + workspace = Path(tmpdir) + os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") - And send a message to the author: + log_file = workspace / "tool_usage.log" + summary_file = workspace / "summary.txt" - [Automatic Post]: It has been a while since there was any activity on this PR. - {author}, are you still working on it? If so, please go ahead, if not then - please request review, close it, or request that someone else follow up. 
+ # Configure hooks using the typed approach (recommended) + # This provides better type safety and IDE support + hook_config = HookConfig( + pre_tool_use=[ + HookMatcher( + matcher="terminal", + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "block_dangerous.sh"), + timeout=10, + ) + ], + ) + ], + post_tool_use=[ + HookMatcher( + matcher="*", + hooks=[ + HookDefinition( + command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), + timeout=5, + ) + ], + ) + ], + user_prompt_submit=[ + HookMatcher( + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "inject_git_context.sh"), + ) + ], + ) + ], + stop=[ + HookMatcher( + hooks=[ + HookDefinition( + command=( + f"SUMMARY_FILE={summary_file} " + f"{SCRIPT_DIR / 'require_summary.sh'}" + ), + ) + ], + ) + ], + ) - # Need Reviewers + # Alternative: You can also use .from_dict() for loading from JSON config files + # Example with a single hook matcher: + # hook_config = HookConfig.from_dict({ + # "hooks": { + # "PreToolUse": [{ + # "matcher": "terminal", + # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] + # }] + # } + # }) - Find all open pull requests that: - 1. Have no reviewers assigned to them. - 2. Are not marked as draft. - 3. Were created more than 1 day ago. - 4. CI is passing and there are no merge conflicts. + agent = get_default_agent(llm=llm) + conversation = Conversation( + agent=agent, + workspace=str(workspace), + hook_config=hook_config, + ) - For each of these pull requests, read the git blame information for the files, - and find the most recent and active contributors to the file/location of the changes. - Assign one of these people as a reviewer, but try not to assign too many reviews to - any single person. 
Add this message: + # Demo 1: Safe command (PostToolUse logs it) + print("=" * 60) + print("Demo 1: Safe command - logged by PostToolUse") + print("=" * 60) + conversation.send_message("Run: echo 'Hello from hooks!'") + conversation.run() - [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. - Thanks in advance for the help! + if log_file.exists(): + print(f"\n[Log: {log_file.read_text().strip()}]") - LLM_MODEL: - LLM_BASE_URL: - steps: - - name: Checkout repository - uses: actions/checkout@v5 + # Demo 2: Dangerous command (PreToolUse blocks it) + print("\n" + "=" * 60) + print("Demo 2: Dangerous command - blocked by PreToolUse") + print("=" * 60) + conversation.send_message("Run: rm -rf /tmp/test") + conversation.run() - - name: Set up Python - uses: actions/setup-python@v6 - with: - python-version: '3.13' + # Demo 3: Context injection + Stop hook enforcement + print("\n" + "=" * 60) + print("Demo 3: Context injection + Stop hook") + print("=" * 60) + print("UserPromptSubmit injects git status; Stop requires summary.txt\n") + conversation.send_message( + "Check what files have changes, then create summary.txt describing the repo." + ) + conversation.run() - - name: Install uv - uses: astral-sh/setup-uv@v7 - with: - enable-cache: true + if summary_file.exists(): + print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") - - name: Install OpenHands dependencies - run: | - # Install OpenHands SDK and tools from git repository - uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" - uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + print("\n" + "=" * 60) + print("Example Complete!") + print("=" * 60) - - name: Check required configuration - env: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - run: | - if [ -z "$LLM_API_KEY" ]; then - echo "Error: LLM_API_KEY secret is not set." 
- exit 1 - fi + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") +``` + - # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set - if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then - echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." - echo "Please provide only one in the env section of the workflow file." - exit 1 - fi - if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then - echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." - echo "Please set one in the env section of the workflow file." - exit 1 - fi +### Hook Scripts + +The example uses external hook scripts in the `hook_scripts/` directory: + + +```bash +#!/bin/bash +# PreToolUse hook: Block dangerous rm -rf commands +# Uses jq for JSON parsing (needed for nested fields like tool_input.command) + +input=$(cat) +command=$(echo "$input" | jq -r '.tool_input.command // ""') - if [ -n "$PROMPT_LOCATION" ]; then - echo "Prompt location: $PROMPT_LOCATION" - else - echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" - fi - echo "LLM model: $LLM_MODEL" - if [ -n "$LLM_BASE_URL" ]; then - echo "LLM base URL: $LLM_BASE_URL" - fi +# Block rm -rf commands +if [[ "$command" =~ "rm -rf" ]]; then + echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' + exit 2 # Exit code 2 = block the operation +fi - - name: Run task - env: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - PYTHONPATH: '' - run: | - echo "Running agent script: $AGENT_SCRIPT_URL" +exit 0 # Exit code 0 = allow the operation +``` + - # Download script if it's a URL - if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then - echo "Downloading agent script from URL..." 
- curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py - AGENT_SCRIPT_PATH="/tmp/agent_script.py" - else - AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" - fi + +```bash +#!/bin/bash +# PostToolUse hook: Log all tool usage +# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) - # Run with appropriate prompt argument - if [ -n "$PROMPT_LOCATION" ]; then - echo "Using prompt from: $PROMPT_LOCATION" - uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" - else - echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" - uv run python "$AGENT_SCRIPT_PATH" - fi +# LOG_FILE should be set by the calling script +LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" - - name: Upload logs as artifact - uses: actions/upload-artifact@v4 - if: always() - with: - name: openhands-task-logs - path: | - *.log - output/ - retention-days: 7 +echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" +exit 0 ``` + -## Related Files + +```bash +#!/bin/bash +# UserPromptSubmit hook: Inject git status when user asks about code changes -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) -- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) +input=$(cat) +# Check if user is asking about changes, diff, or git +if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then + # Get git status if in a git repo + if git rev-parse --git-dir > /dev/null 2>&1; then + status=$(git status --short 2>/dev/null | head -10) + if [ -n "$status" ]; then + # Escape for JSON + escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') + echo "{\"additionalContext\": \"Current git status: $escaped\"}" + fi + fi +fi +exit 0 +``` + -# PR Review -Source: 
https://docs.openhands.dev/sdk/guides/github-workflows/pr-review + +```bash +#!/bin/bash +# Stop hook: Require a summary.txt file before allowing agent to finish +# SUMMARY_FILE should be set by the calling script -> The reference workflow is available [here](#reference-workflow)! +SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" -Automatically review pull requests, providing feedback on code quality, security, and best practices. Reviews can be triggered in two ways: -- Requesting `openhands-agent` as a reviewer -- Adding the `review-this` label to the PR +if [ ! -f "$SUMMARY_FILE" ]; then + echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' + exit 2 +fi +exit 0 +``` + - -The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo. - -## Quick Start +## Next Steps -```bash -# 1. Copy workflow to your repository -cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml +- See also: [Metrics and Observability](/sdk/guides/metrics) +- Architecture: [Events](/sdk/arch/events) -# 2. Configure secrets in GitHub Settings → Secrets -# Add: LLM_API_KEY +### Iterative Refinement +Source: https://docs.openhands.dev/sdk/guides/iterative-refinement.md -# 3. (Optional) Create a "review-this" label in your repository -# Go to Issues → Labels → New label -# You can also trigger reviews by requesting "openhands-agent" as a reviewer -``` +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -## Features +> The ready-to-run example is available [here](#ready-to-run-example)! 
-- **Fast Reviews** - Results posted on the PR in only 2 or 3 minutes -- **Comprehensive Analysis** - Analyzes the changes given the repository context. Covers code quality, security, best practices -- **GitHub Integration** - Posts comments directly to the PR -- **Customizable** - Add your own code review guidelines without forking +## Overview -## Security +Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: +1. A **refactoring agent** performs the main task (e.g., code conversion) +2. A **critique agent** evaluates the quality and provides detailed feedback +3. If quality is below threshold, the refactoring agent tries again with the feedback -- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label. -- Maintainers need to read the PR to make sure it's safe to run. +This pattern is useful for: +- Code refactoring and modernization (e.g., COBOL to Java) +- Document translation and localization +- Content generation with quality requirements +- Any task requiring iterative improvement -## Customizing the Code Review +## How It Works -Instead of forking the `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization. +### The Iteration Loop -### How It Works +The core workflow runs in a loop until quality threshold is met: -The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. +```python icon="python" wrap +QUALITY_THRESHOLD = 90.0 +MAX_ITERATIONS = 5 - -**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. 
- +while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + # Phase 1: Refactoring agent converts COBOL to Java + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir) + ) + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() -### Example: Custom Code Review Skill + # Phase 2: Critique agent evaluates the conversion + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir) + ) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() -Create `.agents/skills/custom-codereview-guide.md` in your repository: + # Parse score and decide whether to continue + current_score = parse_critique_score(critique_file) -```markdown ---- -name: custom-codereview-guide -description: Project-specific review guidelines for MyProject -triggers: -- /codereview ---- + iteration += 1 +``` -# MyProject-Specific Review Guidelines +### Critique Scoring -In addition to general code review practices, check for: +The critique agent evaluates each file on four dimensions (0-25 pts each): +- **Correctness**: Does the Java code preserve the original business logic? +- **Code Quality**: Is the code clean and following Java conventions? +- **Completeness**: Are all COBOL features properly converted? +- **Best Practices**: Does it use proper OOP, error handling, and documentation? 
-## Project Conventions +### Feedback Loop -- All API endpoints must have OpenAPI documentation -- Database migrations must be reversible -- Feature flags required for new features +When the score is below threshold, the refactoring agent receives the critique file location: -## Architecture Rules +```python icon="python" wrap +if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" +``` -- No direct database access from controllers -- All external API calls must go through the gateway service +## Customization -## Communication Style +### Adjusting Thresholds -- Be direct and constructive -- Use GitHub suggestion syntax for code fixes +```python icon="python" wrap +QUALITY_THRESHOLD = 95.0 # Require higher quality +MAX_ITERATIONS = 10 # Allow more iterations ``` - -**Note**: These rules supplement the default `code-review` skill, not replace it. - +### Using Real COBOL Files - -**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. +The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). -If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. 
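Note that the threshold constants shown above are not hard-coded in the full runnable example: there they are read with `os.getenv`, so individual runs can be tuned from the environment without editing the source. A small sketch of that pattern (the variable names match the example; the `refine.py` invocation in the comment is illustrative):

```python
import os

# Environment overrides with safe defaults, mirroring the full example.
# os.getenv always returns a string, hence the explicit conversions.
QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0"))
MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5"))

# e.g. run with: QUALITY_THRESHOLD=97.5 MAX_ITERATIONS=8 python refine.py
print(f"threshold={QUALITY_THRESHOLD} max_iterations={MAX_ITERATIONS}")
```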
- +## Ready-to-run Example -**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. +This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) -### Benefits of Custom Skills +```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py +#!/usr/bin/env python3 +""" +Iterative Refinement Example: COBOL to Java Refactoring -1. **No forking required**: Keep using the official SDK while customizing behavior -2. **Version controlled**: Your review guidelines live in your repository -3. **Easy updates**: SDK updates don't overwrite your customizations -4. **Team alignment**: Everyone uses the same review standards -5. **Composable**: Add project-specific rules alongside default guidelines +This example demonstrates an iterative refinement workflow where: +1. A refactoring agent converts COBOL files to Java files +2. A critique agent evaluates the quality of each conversion and provides scores +3. If the average score is below 90%, the process repeats with feedback - -See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. - +The workflow continues until the refactoring meets the quality threshold. 
-## Reference Workflow +Source COBOL files can be obtained from: +https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl +""" - -This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - +import os +import re +import tempfile +from pathlib import Path -```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml ---- -# OpenHands PR Review Workflow -# -# To set this up: -# 1. Copy this file to .github/workflows/pr-review.yml in your repository -# 2. Add LLM_API_KEY to repository secrets -# 3. Customize the inputs below as needed -# 4. Commit this file to your repository -# 5. Trigger the review by either: -# - Adding the "review-this" label to any PR, OR -# - Requesting openhands-agent as a reviewer -# -# For more information, see: -# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review -name: PR Review by OpenHands +from pydantic import SecretStr -on: - # Trigger when a label is added or a reviewer is requested - pull_request: - types: [labeled, review_requested] +from openhands.sdk import LLM, Conversation +from openhands.tools.preset.default import get_default_agent -permissions: - contents: read - pull-requests: write - issues: write -jobs: - pr-review: - # Run when review-this label is added OR openhands-agent is requested as reviewer - if: | - github.event.label.name == 'review-this' || - github.event.requested_reviewer.login == 'openhands-agent' - runs-on: ubuntu-latest - steps: - - name: Checkout for composite action - uses: actions/checkout@v4 - with: - repository: OpenHands/software-agent-sdk - # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') - ref: main - sparse-checkout: .github/actions/pr-review +QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", 
"5")) - - name: Run PR Review - uses: ./.github/actions/pr-review - with: - # LLM configuration - llm-model: anthropic/claude-sonnet-4-5-20250929 - llm-base-url: '' - # Review style: roasted (other option: standard) - review-style: roasted - # SDK version to use (version tag or branch name) - sdk-version: main - # Secrets - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} -``` -### Action Inputs +def setup_workspace() -> tuple[Path, Path, Path]: + """Create workspace directories for the refactoring workflow.""" + workspace_dir = Path(tempfile.mkdtemp()) + cobol_dir = workspace_dir / "cobol" + java_dir = workspace_dir / "java" + critique_dir = workspace_dir / "critiques" -| Input | Description | Required | Default | -|-------|-------------|----------|---------| -| `llm-model` | LLM model to use | Yes | - | -| `llm-base-url` | LLM base URL (optional) | No | `''` | -| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | -| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | -| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | -| `llm-api-key` | LLM API key | Yes | - | -| `github-token` | GitHub token for API access | Yes | - | + cobol_dir.mkdir(parents=True, exist_ok=True) + java_dir.mkdir(parents=True, exist_ok=True) + critique_dir.mkdir(parents=True, exist_ok=True) -## Related Files + return workspace_dir, cobol_dir, java_dir -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) -- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) -- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) +def 
create_sample_cobol_files(cobol_dir: Path) -> list[str]: + """Create sample COBOL files for demonstration. -# TODO Management -Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management + In a real scenario, you would clone files from: + https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl + """ + sample_files = { + "CBACT01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBACT01C. + ***************************************************************** + * Program: CBACT01C - Account Display Program + * Purpose: Display account information for a given account number + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-ACCOUNT-ID PIC 9(11). + 01 WS-ACCOUNT-STATUS PIC X(1). + 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. + 01 WS-CUSTOMER-NAME PIC X(50). + 01 WS-ERROR-MSG PIC X(80). -> The reference workflow is available [here](#reference-workflow)! + PROCEDURE DIVISION. + PERFORM 1000-INIT. + PERFORM 2000-PROCESS. + PERFORM 3000-TERMINATE. + STOP RUN. + 1000-INIT. + INITIALIZE WS-ACCOUNT-ID + INITIALIZE WS-ACCOUNT-STATUS + INITIALIZE WS-ACCOUNT-BALANCE + INITIALIZE WS-CUSTOMER-NAME. -Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership + 2000-PROCESS. + DISPLAY "ENTER ACCOUNT NUMBER: " + ACCEPT WS-ACCOUNT-ID + IF WS-ACCOUNT-ID = ZEROS + MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG + DISPLAY WS-ERROR-MSG + ELSE + DISPLAY "ACCOUNT: " WS-ACCOUNT-ID + DISPLAY "STATUS: " WS-ACCOUNT-STATUS + DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE + END-IF. -## Quick Start + 3000-TERMINATE. + DISPLAY "PROGRAM COMPLETE". +""", + "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBCUS01C. 
+ ***************************************************************** + * Program: CBCUS01C - Customer Information Program + * Purpose: Manage customer data operations + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-CUSTOMER-ID PIC 9(9). + 01 WS-FIRST-NAME PIC X(25). + 01 WS-LAST-NAME PIC X(25). + 01 WS-ADDRESS PIC X(100). + 01 WS-PHONE PIC X(15). + 01 WS-EMAIL PIC X(50). + 01 WS-OPERATION PIC X(1). + 88 OP-ADD VALUE 'A'. + 88 OP-UPDATE VALUE 'U'. + 88 OP-DELETE VALUE 'D'. + 88 OP-DISPLAY VALUE 'V'. - - - ```bash icon="terminal" - cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml - ``` - - - Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` - (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). - - - Go to `Settings → Actions → General → Workflow permissions` and enable: - - `Read and write permissions` - - `Allow GitHub Actions to create and approve pull requests` - - - Trigger the agent by adding TODO comments into your code. + PROCEDURE DIVISION. + PERFORM 1000-MAIN-PROCESS. + STOP RUN. + + 1000-MAIN-PROCESS. + DISPLAY "CUSTOMER MANAGEMENT SYSTEM" + DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" + ACCEPT WS-OPERATION + EVALUATE TRUE + WHEN OP-ADD + PERFORM 2000-ADD-CUSTOMER + WHEN OP-UPDATE + PERFORM 3000-UPDATE-CUSTOMER + WHEN OP-DELETE + PERFORM 4000-DELETE-CUSTOMER + WHEN OP-DISPLAY + PERFORM 5000-DISPLAY-CUSTOMER + WHEN OTHER + DISPLAY "INVALID OPERATION" + END-EVALUATE. + + 2000-ADD-CUSTOMER. + DISPLAY "ADDING NEW CUSTOMER" + ACCEPT WS-CUSTOMER-ID + ACCEPT WS-FIRST-NAME + ACCEPT WS-LAST-NAME + DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. + + 3000-UPDATE-CUSTOMER. + DISPLAY "UPDATING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. + + 4000-DELETE-CUSTOMER. + DISPLAY "DELETING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. 
+ + 5000-DISPLAY-CUSTOMER. + DISPLAY "DISPLAYING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "ID: " WS-CUSTOMER-ID + DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. +""", + "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBTRN01C. + ***************************************************************** + * Program: CBTRN01C - Transaction Processing Program + * Purpose: Process financial transactions + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-TRANS-ID PIC 9(16). + 01 WS-TRANS-TYPE PIC X(2). + 88 TRANS-CREDIT VALUE 'CR'. + 88 TRANS-DEBIT VALUE 'DB'. + 88 TRANS-TRANSFER VALUE 'TR'. + 01 WS-TRANS-AMOUNT PIC S9(13)V99. + 01 WS-FROM-ACCOUNT PIC 9(11). + 01 WS-TO-ACCOUNT PIC 9(11). + 01 WS-TRANS-DATE PIC 9(8). + 01 WS-TRANS-STATUS PIC X(10). + + PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE. + PERFORM 2000-PROCESS-TRANSACTION. + PERFORM 3000-FINALIZE. + STOP RUN. + + 1000-INITIALIZE. + MOVE ZEROS TO WS-TRANS-ID + MOVE SPACES TO WS-TRANS-TYPE + MOVE ZEROS TO WS-TRANS-AMOUNT + MOVE "PENDING" TO WS-TRANS-STATUS. + + 2000-PROCESS-TRANSACTION. + DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " + ACCEPT WS-TRANS-TYPE + DISPLAY "ENTER AMOUNT: " + ACCEPT WS-TRANS-AMOUNT + EVALUATE TRUE + WHEN TRANS-CREDIT + PERFORM 2100-PROCESS-CREDIT + WHEN TRANS-DEBIT + PERFORM 2200-PROCESS-DEBIT + WHEN TRANS-TRANSFER + PERFORM 2300-PROCESS-TRANSFER + WHEN OTHER + MOVE "INVALID" TO WS-TRANS-STATUS + END-EVALUATE. - Example: `# TODO(openhands): Add input validation for user email` + 2100-PROCESS-CREDIT. + DISPLAY "PROCESSING CREDIT" + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. - - The workflow is configurable and any identifier can be used in place of `TODO(openhands)` - - - + 2200-PROCESS-DEBIT. + DISPLAY "PROCESSING DEBIT" + ACCEPT WS-FROM-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. 
+ 2300-PROCESS-TRANSFER. + DISPLAY "PROCESSING TRANSFER" + ACCEPT WS-FROM-ACCOUNT + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. -## Features + 3000-FINALIZE. + DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. +""", + } -- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. -- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it -- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers + created_files = [] + for filename, content in sample_files.items(): + file_path = cobol_dir / filename + file_path.write_text(content) + created_files.append(filename) -## Best Practices + return created_files -- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow -- **Clear Descriptions** - Write descriptive TODO comments -- **Review PRs** - Always review the generated PRs before merging -## Reference Workflow +def get_refactoring_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], + critique_file: Path | None = None, +) -> str: + """Generate the prompt for the refactoring agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) - -This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) - + base_prompt = f"""Convert the following COBOL files to Java: -```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml ---- -# Automated TODO Management Workflow -# Make sure to replace and with -# appropriate values for your LLM setup. -# -# This workflow automatically scans for TODO(openhands) comments and creates -# pull requests to implement them using the OpenHands agent. -# -# Setup: -# 1. Add LLM_API_KEY to repository secrets -# 2. 
Ensure GITHUB_TOKEN has appropriate permissions -# 3. Make sure Github Actions are allowed to create and review PRs -# 4. Commit this file to .github/workflows/ in your repository -# 5. Configure the schedule or trigger manually +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} -name: Automated TODO Management +Files to convert: +{files_list} -on: - # Manual trigger - workflow_dispatch: - inputs: - max_todos: - description: Maximum number of TODOs to process in this run - required: false - default: '3' - type: string - todo_identifier: - description: TODO identifier to search for (e.g., TODO(openhands)) - required: false - default: TODO(openhands) - type: string +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices - # Trigger when 'automatic-todo' label is added to a PR - pull_request: - types: [labeled] +Read each COBOL file and create the corresponding Java file in the target directory. +""" - # Scheduled trigger (disabled by default, uncomment and customize as needed) - # schedule: - # # Run every Monday at 9 AM UTC - # - cron: "0 9 * * 1" + if critique_file and critique_file.exists(): + base_prompt += f""" -permissions: - contents: write - pull-requests: write - issues: write +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. 
+Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" -jobs: - scan-todos: - runs-on: ubuntu-latest - # Only run if triggered manually or if 'automatic-todo' label was added - if: > - github.event_name == 'workflow_dispatch' || - (github.event_name == 'pull_request' && - github.event.label.name == 'automatic-todo') - outputs: - todos: ${{ steps.scan.outputs.todos }} - todo-count: ${{ steps.scan.outputs.todo-count }} - steps: - - name: Checkout repository - uses: actions/checkout@v4 - with: - fetch-depth: 0 # Full history for better context + return base_prompt - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.13' - - name: Copy TODO scanner - run: | - cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py - chmod +x /tmp/scanner.py +def get_critique_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], +) -> str: + """Generate the prompt for the critique agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) - - name: Scan for TODOs - id: scan - run: | - echo "Scanning for TODO comments..." + return f"""Evaluate the quality of COBOL to Java refactoring. - # Run the scanner and capture output - TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" - python /tmp/scanner.py . 
--identifier "$TODO_IDENTIFIER" > todos.json +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} - # Count TODOs - TODO_COUNT=$(python -c \ - "import json; data=json.load(open('todos.json')); print(len(data))") - echo "Found $TODO_COUNT $TODO_IDENTIFIER items" +Original COBOL files: +{files_list} - # Limit the number of TODOs to process - MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" - if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then - echo "Limiting to first $MAX_TODOS TODOs" - python -c " - import json - data = json.load(open('todos.json')) - limited = data[:$MAX_TODOS] - json.dump(limited, open('todos.json', 'w'), indent=2) - " - TODO_COUNT=$MAX_TODOS - fi +Please evaluate each converted Java file against its original COBOL source. - # Set outputs - echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT - echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT +For each file, assess: +1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) +2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) +3. Completeness: Are all COBOL features properly converted? (0-25 pts) +4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) - # Display found TODOs - echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY - if [ "$TODO_COUNT" -eq 0 ]; then - echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY - else - echo "Found $TODO_COUNT TODO(openhands) items:" \ - >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - python -c " - import json - data = json.load(open('todos.json')) - for i, todo in enumerate(data, 1): - print(f'{i}. 
**{todo[\"file\"]}:{todo[\"line\"]}** - ' + - f'{todo[\"description\"]}') - " >> $GITHUB_STEP_SUMMARY - fi +Create a critique report in the following EXACT format: - process-todos: - needs: scan-todos - if: needs.scan-todos.outputs.todo-count > 0 - runs-on: ubuntu-latest - strategy: - matrix: - todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} - max-parallel: 1 # Process one TODO at a time to avoid conflicts - steps: - - name: Checkout repository - uses: actions/checkout@v4 - with: - fetch-depth: 0 - token: ${{ secrets.GITHUB_TOKEN }} +# COBOL to Java Refactoring Critique Report - - name: Switch to feature branch with TODO management files - run: | - git checkout openhands/todo-management-example - git pull origin openhands/todo-management-example +## Summary +[Brief overall assessment] - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.13' +## File Evaluations - - name: Install uv - uses: astral-sh/setup-uv@v6 - with: - enable-cache: true +### [Original COBOL filename] +- **Java File**: [corresponding Java filename or "NOT FOUND"] +- **Correctness**: [score]/25 - [brief explanation] +- **Code Quality**: [score]/25 - [brief explanation] +- **Completeness**: [score]/25 - [brief explanation] +- **Best Practices**: [score]/25 - [brief explanation] +- **File Score**: [total]/100 +- **Issues to Address**: + - [specific issue 1] + - [specific issue 2] + ... 
- - name: Install OpenHands dependencies - run: | - # Install OpenHands SDK and tools from git repository - uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" - uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" +[Repeat for each file] - - name: Copy agent files - run: | - cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py - cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py - chmod +x agent.py +## Overall Score +- **Average Score**: [calculated average of all file scores] +- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] - - name: Configure Git - run: | - git config --global user.name "openhands-bot" - git config --global user.email \ - "openhands-bot@users.noreply.github.com" +## Priority Improvements +1. [Most critical improvement needed] +2. [Second priority] +3. [Third priority] - - name: Process TODO - env: - LLM_MODEL: - LLM_BASE_URL: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - GITHUB_REPOSITORY: ${{ github.repository }} - TODO_FILE: ${{ matrix.todo.file }} - TODO_LINE: ${{ matrix.todo.line }} - TODO_DESCRIPTION: ${{ matrix.todo.description }} - PYTHONPATH: '' - run: | - echo "Processing TODO: $TODO_DESCRIPTION" - echo "File: $TODO_FILE:$TODO_LINE" +Save this report to: {java_dir.parent}/critiques/critique_report.md +""" - # Create a unique branch name for this TODO - BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ - sed 's/[^a-zA-Z0-9]/-/g' | \ - sed 's/--*/-/g' | \ - sed 's/^-\|-$//g' | \ - tr '[:upper:]' '[:lower:]' | \ - cut -c1-50)" - echo "Branch name: $BRANCH_NAME" - # Create and switch to new branch (force create if exists) - git checkout -B "$BRANCH_NAME" +def parse_critique_score(critique_file: Path) -> float: + """Parse the average score from the critique report.""" + if not 
critique_file.exists():
+        return 0.0

-          # Run the agent to process the TODO
-          # Stay in repository directory for git operations

+    content = critique_file.read_text()
+
+    # Extract the score from the report's "**Average Score**: NN.N" line
+    match = re.search(r"\*\*Average Score\*\*:\s*([\d.]+)", content)
+    if match:
+        return float(match.group(1))
+    return 0.0

-          # Create JSON payload for the agent
-          TODO_JSON=$(cat <&1 | tee agent_output.log
-          AGENT_EXIT_CODE=$?
-          set -e
-          echo "Agent exit code: $AGENT_EXIT_CODE"
-          echo "Agent output log:"
-          cat agent_output.log

+def run_iterative_refinement() -> None:
+    """Run the iterative refinement workflow."""
+    # Setup
+    api_key = os.getenv("LLM_API_KEY")
+    assert api_key is not None, "LLM_API_KEY environment variable is not set."
+    model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+    base_url = os.getenv("LLM_BASE_URL")

-          # Show files in working directory
-          echo "Files in working directory:"
-          ls -la

+    llm = LLM(
+        model=model,
+        base_url=base_url,
+        api_key=SecretStr(api_key),
+        usage_id="iterative_refinement",
+    )

-          # If agent failed, show more details
-          if [ $AGENT_EXIT_CODE -ne 0 ]; then
-            echo "Agent failed with exit code $AGENT_EXIT_CODE"
-            echo "Last 50 lines of agent output:"
-            tail -50 agent_output.log
-            exit $AGENT_EXIT_CODE
-          fi

+    workspace_dir, cobol_dir, java_dir = setup_workspace()
+    critique_dir = workspace_dir / "critiques"

-          # Check if any changes were made
-          cd "$GITHUB_WORKSPACE"
-          if git diff --quiet; then
-            echo "No changes made by agent, skipping PR creation"
-            exit 0
-          fi

+    print(f"Workspace: {workspace_dir}")
+    print(f"COBOL Directory: {cobol_dir}")
+    print(f"Java Directory: {java_dir}")
+    print(f"Critique Directory: {critique_dir}")
+    print()

-          # Commit changes
-          git add -A
-          git commit -m "Implement TODO: $TODO_DESCRIPTION

+    # Create sample COBOL files
+    cobol_files = create_sample_cobol_files(cobol_dir)
+    print(f"Created {len(cobol_files)} sample COBOL files:")
+    for f in cobol_files:
+        print(f"  - {f}")
+    print()

-          Automatically implemented by OpenHands agent. 
+ critique_file = critique_dir / "critique_report.md" + current_score = 0.0 + iteration = 0 - Co-authored-by: openhands " + while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + iteration += 1 + print("=" * 80) + print(f"ITERATION {iteration}") + print("=" * 80) - # Push branch - git push origin "$BRANCH_NAME" + # Phase 1: Refactoring + print("\n--- Phase 1: Refactoring Agent ---") + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir), + ) - # Create pull request - PR_TITLE="Implement TODO: $TODO_DESCRIPTION" - PR_BODY="## 🤖 Automated TODO Implementation + previous_critique = critique_file if iteration > 1 else None + refactoring_prompt = get_refactoring_prompt( + cobol_dir, java_dir, cobol_files, previous_critique + ) - This PR automatically implements the following TODO: + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + print("Refactoring phase complete.") - **File:** \`$TODO_FILE:$TODO_LINE\` - **Description:** $TODO_DESCRIPTION + # Phase 2: Critique + print("\n--- Phase 2: Critique Agent ---") + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir), + ) - ### Implementation - The OpenHands agent has analyzed the TODO and implemented the - requested functionality. 
+ critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + print("Critique phase complete.") - ### Review Notes - - Please review the implementation for correctness - - Test the changes in your development environment - - The original TODO comment will be updated with this PR URL - once merged + # Parse the score + current_score = parse_critique_score(critique_file) + print(f"\nCurrent Score: {current_score:.1f}%") - --- - *This PR was created automatically by the TODO Management workflow.*" + if current_score >= QUALITY_THRESHOLD: + print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") + else: + print( + f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " + "Continuing refinement..." + ) - # Create PR using GitHub CLI or API - curl -X POST \ - -H "Authorization: token $GITHUB_TOKEN" \ - -H "Accept: application/vnd.github.v3+json" \ - "https://api.github.com/repos/${{ github.repository }}/pulls" \ - -d "{ - \"title\": \"$PR_TITLE\", - \"body\": \"$PR_BODY\", - \"head\": \"$BRANCH_NAME\", - \"base\": \"${{ github.ref_name }}\" - }" + # Final summary + print("\n" + "=" * 80) + print("ITERATIVE REFINEMENT COMPLETE") + print("=" * 80) + print(f"Total iterations: {iteration}") + print(f"Final score: {current_score:.1f}%") + print(f"Workspace: {workspace_dir}") - summary: - needs: [scan-todos, process-todos] - if: always() - runs-on: ubuntu-latest - steps: - - name: Generate Summary - run: | - echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY + # List created Java files + print("\nCreated Java files:") + for java_file in java_dir.glob("*.java"): + print(f" - {java_file.name}") - TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" - echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY + # Show critique file location + if critique_file.exists(): + print(f"\nFinal critique report: {critique_file}") - if [ 
"$TODO_COUNT" -gt 0 ]; then - echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - echo "Check the pull requests created for each TODO" \ - "implementation." >> $GITHUB_STEP_SUMMARY - else - echo "**Status:** ℹ️ No TODOs found to process" \ - >> $GITHUB_STEP_SUMMARY - fi + # Report cost + cost = llm.metrics.accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") - echo "" >> $GITHUB_STEP_SUMMARY - echo "---" >> $GITHUB_STEP_SUMMARY - echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY + +if __name__ == "__main__": + run_iterative_refinement() ``` -## Related Documentation + -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) -- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) -- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) +## Next Steps +- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents +- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow -# Hello World -Source: https://docs.openhands.dev/sdk/guides/hello-world +### Exception Handling +Source: https://docs.openhands.dev/sdk/guides/llm-error-handling.md -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. -> A ready-to-run example is available [here](#ready-to-run-example)! 
+This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. -## Your First Agent +## Why typed exceptions? -This is the most basic example showing how to set up and run an OpenHands agent. +LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. Typical benefits: - - - ### LLM Configuration +- One code path to handle auth, rate limits, timeouts, service issues, and bad requests +- Clear behavior when conversation history exceeds the context window +- Backward compatibility when you switch providers or SDK versions - Configure the language model that will power your agent: - ```python icon="python" - llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, # Optional - service_id="agent" - ) - ``` - - - ### Select an Agent - Use the preset agent with common built-in tools: - ```python icon="python" - agent = get_default_agent(llm=llm, cli_mode=True) - ``` - The default agent includes `BashTool`, `FileEditorTool`, etc. - - For the complete list of available tools see the - [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). - +## Quick start: Using agents and conversations - - - ### Start a Conversation - Start a conversation to manage the agent's lifecycle: - ```python icon="python" - conversation = Conversation(agent=agent, workspace=cwd) +Agent-driven conversations are the common entry point. Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured. 
+ +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import Agent, Conversation, LLM +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) +agent = Agent(llm=llm, tools=[]) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) + +try: conversation.send_message( - "Write 3 facts about the current project into FACTS.txt." + "Continue the long analysis we started earlier…" ) conversation.run() - ``` - - - ### Expected Behavior - When you run this example: - 1. The agent analyzes the current directory - 2. Gathers information about the project - 3. Creates `FACTS.txt` with 3 relevant facts - 4. Completes and exits - Example output file: +except LLMContextWindowExceedError: + # Conversation is longer than the model’s context window + # Options: + # 1) Enable a condenser (recommended for long sessions) + # 2) Shorten inputs or reset conversation + print("Hit the context limit. Consider enabling a condenser.") - ```text icon="text" wrap - FACTS.txt - --------- - 1. This is a Python project using the OpenHands Software Agent SDK. - 2. The project includes examples demonstrating various agent capabilities. - 3. The SDK provides tools for file manipulation, bash execution, and more. - ``` - - +except LLMAuthenticationError: + print( + "Invalid or missing API credentials." + "Check your API key or auth setup." + ) -## Ready-to-run Example +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") - -This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) - +except LLMTimeoutError: + print("Request timed out. 
Consider increasing timeout or retrying.") -```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py -import os +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +except LLMBadRequestError: + print("Bad request to provider. Validate inputs and arguments.") + +except LLMError as e: + # Fallback for other SDK LLM errors (parsing/validation, etc.) + print(f"Unhandled LLM error: {e}") +``` -llm = LLM( - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - api_key=os.getenv("LLM_API_KEY"), - base_url=os.getenv("LLM_BASE_URL", None), + +### Avoiding context‑window errors with a condenser + +If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue. + +```python icon="python" focus={5-6, 9-14} wrap +from openhands.sdk.context.condenser import LLMSummarizingCondenser + +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), + max_size=10, + keep_first=2, ) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], +agent = Agent(llm=llm, tools=[], condenser=condenser) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", ) +``` -cwd = os.getcwd() -conversation = Conversation(agent=agent, workspace=cwd) + + See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). 
+ -conversation.send_message("Write 3 facts about the current project into FACTS.txt.") -conversation.run() -print("All done!") -``` +## Handling errors with direct LLM calls - +The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. -## Next Steps +### Example: Using `.completion()` -- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs -- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers -- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) -# Hooks -Source: https://docs.openhands.dev/sdk/guides/hooks +try: + response = llm.completion([ + Message.user([TextContent(text="Summarize our design doc")]) + ]) + print(response.message) -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMAuthenticationError: + print("Invalid or missing API credentials.") +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. 
Validate inputs and arguments.") +except LLMError as e: + print(f"Unhandled LLM error: {e}") +``` -> A ready-to-run example is available [here](#ready-to-run-example)! +### Example: Using `.responses()` -## Overview +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError -Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. Typical uses include: -- Logging and analytics -- Emitting custom metrics -- Auditing or compliance -- Tracing and debugging +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) -## Hook Types +try: + resp = llm.responses([ + Message.user( + [TextContent(text="Write a one-line haiku about code.")] + ) + ]) + print(resp.message) +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMError as e: + print(f"LLM error: {e}") +``` -| Hook | When it runs | Can block? | -|------|--------------|------------| -| PreToolUse | Before tool execution | Yes (exit 2) | -| PostToolUse | After tool execution | No | -| UserPromptSubmit | Before processing user message | Yes (exit 2) | -| Stop | When agent tries to finish | Yes (exit 2) | -| SessionStart | When conversation starts | No | -| SessionEnd | When conversation ends | No | +## Exception reference -## Key Concepts +All exceptions live under `openhands.sdk.llm.exceptions` unless noted. -- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution -- Isolation: hooks run outside the agent loop logic, avoiding core modifications -- Composition: enable or disable hooks per environment (local vs. 
prod) +| Category | Error | Description | +|--------|------|-------------| +| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | +| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | +| | `LLMRateLimitError` | Provider rate limit exceeded. | +| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | +| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | +| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | +| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | +| | `LLMNoActionError` | Model did not return an action when one was expected. | +| | `LLMResponseError` | Could not extract an action from the response. | +| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | +| | `FunctionCallValidationError` | Tool/function call arguments failed validation. | +| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | +| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | +| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | +| | `OperationCancelled` | A running operation was cancelled programmatically. | -## Ready-to-run Example + + All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all + for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. 
+ - -This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) - +### LLM Fallback Strategy +Source: https://docs.openhands.dev/sdk/guides/llm-fallback.md -```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py -"""OpenHands Agent SDK — Hooks Example +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Demonstrates the OpenHands hooks system. -Hooks are shell scripts that run at key lifecycle events: +> A ready-to-run example is available [here](#ready-to-run-example)! -- PreToolUse: Block dangerous commands before execution -- PostToolUse: Log tool usage after execution -- UserPromptSubmit: Inject context into user messages -- Stop: Enforce task completion criteria +`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model. -The hook scripts are in the scripts/ directory alongside this file. -""" +## Basic Usage -import os -import signal -import tempfile -from pathlib import Path +Attach a `FallbackStrategy` to your primary `LLM`. 
The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store):

+```python icon="python" wrap focus={16, 17, 21, 22, 23}
from pydantic import SecretStr
+from openhands.sdk import LLM, LLMProfileStore
+from openhands.sdk.llm import FallbackStrategy

+# Manage persisted LLM profiles
+# default store directory: ~/.openhands/profiles
+store = LLMProfileStore()
+fallback_llm = LLM(
+    usage_id="fallback-1",
+    model="openai/gpt-4o",
+    api_key=SecretStr("your-openai-key"),
+)
+store.save("fallback-1", fallback_llm, include_secrets=True)

+# Configure an LLM with a fallback strategy
+primary_llm = LLM(
+    usage_id="agent-primary",
+    model="anthropic/claude-sonnet-4-5-20250929",
+    api_key=SecretStr("your-api-key"),
+    fallback_strategy=FallbackStrategy(
+        fallback_llms=["fallback-1"],
+    ),
+)
+```

+## How It Works

+1. The primary LLM handles the request as normal
+2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order
+3. The first successful fallback response is returned to the caller
+4. If all fallbacks fail, the original primary error is raised
+5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model
+
+
+Only transient errors trigger fallback.
+Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks. +For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29) + + +## Multiple Fallback Levels + +Chain as many fallback LLMs as you need. They are tried in list order: +```python icon="python" wrap focus={5-7} llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", api_key=SecretStr(api_key), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + ), ) +``` -# Create temporary workspace with git repo -with tempfile.TemporaryDirectory() as tmpdir: - workspace = Path(tmpdir) - os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") +If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. 
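The per-call chain described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the SDK's actual `FallbackStrategy` implementation; `TransientError` and the callables are hypothetical stand-ins:

```python
# Simplified sketch of a per-call fallback chain (illustration only).

class TransientError(Exception):
    """Stand-in for a retryable provider error (rate limit, timeout, 5xx)."""


def complete_with_fallback(llms, prompt):
    """Try each LLM callable in order, primary first.

    Returns the first successful response. If every call fails with a
    transient error, the PRIMARY error is re-raised. Non-transient
    errors propagate immediately, without trying fallbacks.
    """
    primary_error = None
    for call in llms:
        try:
            return call(prompt)
        except TransientError as err:
            if primary_error is None:
                primary_error = err  # remember the primary's failure
    raise primary_error


def primary(prompt):
    raise TransientError("429: rate limited")


def fallback_1(prompt):
    return f"fallback-1 answered: {prompt}"


print(complete_with_fallback([primary, fallback_1], "hello"))
# -> fallback-1 answered: hello
```

Note that when every model fails, it is the primary's original error that surfaces, matching the behavior described above.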
- log_file = workspace / "tool_usage.log" - summary_file = workspace / "summary.txt" +## Custom Profile Store Directory - # Configure hooks using the typed approach (recommended) - # This provides better type safety and IDE support - hook_config = HookConfig( - pre_tool_use=[ - HookMatcher( - matcher="terminal", - hooks=[ - HookDefinition( - command=str(SCRIPT_DIR / "block_dangerous.sh"), - timeout=10, - ) - ], - ) - ], - post_tool_use=[ - HookMatcher( - matcher="*", - hooks=[ - HookDefinition( - command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), - timeout=5, - ) - ], - ) - ], - user_prompt_submit=[ - HookMatcher( - hooks=[ - HookDefinition( - command=str(SCRIPT_DIR / "inject_git_context.sh"), - ) - ], - ) - ], - stop=[ - HookMatcher( - hooks=[ - HookDefinition( - command=( - f"SUMMARY_FILE={summary_file} " - f"{SCRIPT_DIR / 'require_summary.sh'}" - ), - ) - ], - ) - ], - ) +By default, fallback profiles are loaded from `.openhands/profiles`. You can point to a different directory: - # Alternative: You can also use .from_dict() for loading from JSON config files - # Example with a single hook matcher: - # hook_config = HookConfig.from_dict({ - # "hooks": { - # "PreToolUse": [{ - # "matcher": "terminal", - # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] - # }] - # } - # }) +```python icon="python" wrap focus={3} +FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir="/path/to/my/profiles", +) +``` - agent = get_default_agent(llm=llm) - conversation = Conversation( - agent=agent, - workspace=str(workspace), - hook_config=hook_config, - ) +## Metrics - # Demo 1: Safe command (PostToolUse logs it) - print("=" * 60) - print("Demo 1: Safe command - logged by PostToolUse") - print("=" * 60) - conversation.send_message("Run: echo 'Hello from hooks!'") - conversation.run() +Fallback costs are automatically merged into the primary LLM's metrics. 
After a conversation, you can inspect exactly which models were used: - if log_file.exists(): - print(f"\n[Log: {log_file.read_text().strip()}]") +```python icon="python" wrap +# After running a conversation +metrics = llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") - # Demo 2: Dangerous command (PreToolUse blocks it) - print("\n" + "=" * 60) - print("Demo 2: Dangerous command - blocked by PreToolUse") - print("=" * 60) - conversation.send_message("Run: rm -rf /tmp/test") - conversation.run() +for usage in metrics.token_usages: + print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +``` - # Demo 3: Context injection + Stop hook enforcement - print("\n" + "=" * 60) - print("Demo 3: Context injection + Stop hook") - print("=" * 60) - print("UserPromptSubmit injects git status; Stop requires summary.txt\n") - conversation.send_message( - "Check what files have changes, then create summary.txt describing the repo." - ) - conversation.run() +Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. - if summary_file.exists(): - print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") +## Use Cases - print("\n" + "=" * 60) - print("Example Complete!") - print("=" * 60) +- **Rate limit handling** — When one provider throttles you, seamlessly switch to another +- **High availability** — Keep your agent running during provider outages +- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure +- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
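Because fallback usage lands in the primary LLM's merged metrics, grouping usage records by model makes fallback traffic easy to audit. The helper below is a hypothetical sketch; `TokenUsage` is a stand-in dataclass mirroring the fields used in the metrics snippet above (`model`, `prompt_tokens`, `completion_tokens`), not the SDK's own record type:

```python
# Hypothetical helper: total tokens per model, so fallback traffic stands out.
from collections import Counter
from dataclasses import dataclass


@dataclass
class TokenUsage:
    model: str
    prompt_tokens: int
    completion_tokens: int


def tokens_by_model(token_usages):
    """Sum prompt + completion tokens for each model seen."""
    totals = Counter()
    for usage in token_usages:
        totals[usage.model] += usage.prompt_tokens + usage.completion_tokens
    return dict(totals)


usages = [
    TokenUsage("anthropic/claude-sonnet-4-5-20250929", 1200, 300),
    TokenUsage("openai/gpt-4o", 800, 200),  # served by a fallback
]
print(tokens_by_model(usages))
# -> {'anthropic/claude-sonnet-4-5-20250929': 1500, 'openai/gpt-4o': 1000}
```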
- cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"\nEXAMPLE_COST: {cost}") -``` - +## Ready-to-run Example + +This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) + + +```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py +"""Example: Using FallbackStrategy for LLM resilience. + +When the primary LLM fails with a transient error (rate limit, timeout, etc.), +FallbackStrategy automatically tries alternate LLMs in order. Fallback is +per-call: each new request starts with the primary model. Token usage and +cost from fallback calls are merged into the primary LLM's metrics. + +This example: + 1. Saves two fallback LLM profiles to a temporary store. + 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. + 3. Runs a conversation — if the primary model is unavailable, the agent + transparently falls back to the next available model. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool +from openhands.sdk.llm import FallbackStrategy +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Read configuration from environment +api_key = os.getenv("LLM_API_KEY", None) +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -### Hook Scripts +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). 
+profile_store_dir = tempfile.mkdtemp() +store = LLMProfileStore(base_dir=profile_store_dir) -The example uses external hook scripts in the `hook_scripts/` directory: +fallback_1 = LLM( + usage_id="fallback-1", + model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), +) +store.save("fallback-1", fallback_1, include_secrets=True) - -```bash -#!/bin/bash -# PreToolUse hook: Block dangerous rm -rf commands -# Uses jq for JSON parsing (needed for nested fields like tool_input.command) +fallback_2 = LLM( + usage_id="fallback-2", + model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), +) +store.save("fallback-2", fallback_2, include_secrets=True) -input=$(cat) -command=$(echo "$input" | jq -r '.tool_input.command // ""') +print(f"Saved fallback profiles: {store.list()}") -# Block rm -rf commands -if [[ "$command" =~ "rm -rf" ]]; then - echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' - exit 2 # Exit code 2 = block the operation -fi -exit 0 # Exit code 0 = allow the operation -``` - +# Configure the primary LLM with a FallbackStrategy +primary_llm = LLM( + usage_id="agent-primary", + model=primary_model, + api_key=SecretStr(api_key), + base_url=base_url, + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir=profile_store_dir, + ), +) - -```bash -#!/bin/bash -# PostToolUse hook: Log all tool usage -# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) 
-# LOG_FILE should be set by the calling script -LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" +# Run a conversation +agent = Agent( + llm=primary_llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) -echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" -exit 0 -``` - +conversation = Conversation(agent=agent, workspace=os.getcwd()) +conversation.send_message("Write a haiku about resilience into HAIKU.txt.") +conversation.run() - -```bash -#!/bin/bash -# UserPromptSubmit hook: Inject git status when user asks about code changes -input=$(cat) +# Inspect metrics (includes any fallback usage) +metrics = primary_llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +print(f"Token usage records: {len(metrics.token_usages)}") +for usage in metrics.token_usages: + print( + f" model={usage.model}" + f" prompt={usage.prompt_tokens}" + f" completion={usage.completion_tokens}" + ) -# Check if user is asking about changes, diff, or git -if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then - # Get git status if in a git repo - if git rev-parse --git-dir > /dev/null 2>&1; then - status=$(git status --short 2>/dev/null | head -10) - if [ -n "$status" ]; then - # Escape for JSON - escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') - echo "{\"additionalContext\": \"Current git status: $escaped\"}" - fi - fi -fi -exit 0 +print(f"EXAMPLE_COST: {metrics.accumulated_cost}") ``` - - -```bash -#!/bin/bash -# Stop hook: Require a summary.txt file before allowing agent to finish -# SUMMARY_FILE should be set by the calling script + -SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" +## Next Steps -if [ ! 
-f "$SUMMARY_FILE" ]; then - echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' - exit 2 -fi -exit 0 -``` - +- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles +- **[Model Routing](/sdk/guides/llm-routing)** — Route requests based on content (e.g., multimodal vs text-only) +- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application +- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models +### Image Input +Source: https://docs.openhands.dev/sdk/guides/llm-image-input.md -## Next Steps +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -- See also: [Metrics and Observability](/sdk/guides/metrics) -- Architecture: [Events](/sdk/arch/events) +> A ready-to-run example is available [here](#ready-to-run-example)! -# Iterative Refinement -Source: https://docs.openhands.dev/sdk/guides/iterative-refinement +### Sending Images -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +The LLM you use must support image inputs (`llm.vision_is_active()` need to be `True`). -> The ready-to-run example is available [here](#ready-to-run-example)! +Pass images along with text in the message content: -## Overview +```python focus={14} icon="python" wrap +from openhands.sdk import ImageContent -Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: -1. A **refactoring agent** performs the main task (e.g., code conversion) -2. A **critique agent** evaluates the quality and provides detailed feedback -3. If quality is below threshold, the refactoring agent tries again with the feedback +IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. 
" + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +``` -This pattern is useful for: -- Code refactoring and modernization (e.g., COBOL to Java) -- Document translation and localization -- Content generation with quality requirements -- Any task requiring iterative improvement +Works with multimodal LLMs like `GPT-4 Vision` and `Claude` with vision capabilities. -## How It Works +## Ready-to-run Example -### The Iteration Loop + +This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) + -The core workflow runs in a loop until quality threshold is met: +You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: -```python icon="python" wrap -QUALITY_THRESHOLD = 90.0 -MAX_ITERATIONS = 5 +```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py +"""OpenHands Agent SDK — Image Input Example. -while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: - # Phase 1: Refactoring agent converts COBOL to Java - refactoring_agent = get_default_agent(llm=llm, cli_mode=True) - refactoring_conversation = Conversation( - agent=refactoring_agent, - workspace=str(workspace_dir) - ) - refactoring_conversation.send_message(refactoring_prompt) - refactoring_conversation.run() +This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds +vision support by sending an image to the agent alongside text instructions. 
+""" - # Phase 2: Critique agent evaluates the conversion - critique_agent = get_default_agent(llm=llm, cli_mode=True) - critique_conversation = Conversation( - agent=critique_agent, - workspace=str(workspace_dir) - ) - critique_conversation.send_message(critique_prompt) - critique_conversation.run() +import os - # Parse score and decide whether to continue - current_score = parse_critique_score(critique_file) +from pydantic import SecretStr - iteration += 1 -``` +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -### Critique Scoring -The critique agent evaluates each file on four dimensions (0-25 pts each): -- **Correctness**: Does the Java code preserve the original business logic? -- **Code Quality**: Is the code clean and following Java conventions? -- **Completeness**: Are all COBOL features properly converted? -- **Best Practices**: Does it use proper OOP, error handling, and documentation? +logger = get_logger(__name__) -### Feedback Loop +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." 
-When the score is below threshold, the refactoring agent receives the critique file location: +cwd = os.getcwd() -```python icon="python" wrap -if critique_file and critique_file.exists(): - base_prompt += f""" -IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. -Please review the critique at: {critique_file} -Address all issues mentioned in the critique to improve the conversion quality. -""" -``` +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) -## Customization +llm_messages = [] # collect raw LLM messages for inspection -### Adjusting Thresholds -```python icon="python" wrap -QUALITY_THRESHOLD = 95.0 # Require higher quality -MAX_ITERATIONS = 10 # Allow more iterations -``` +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -### Using Real COBOL Files -The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -## Ready-to-run Example +IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" - -This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) - +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." 
+ ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() -```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py -#!/usr/bin/env python3 -""" -Iterative Refinement Example: COBOL to Java Refactoring +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() -This example demonstrates an iterative refinement workflow where: -1. A refactoring agent converts COBOL files to Java files -2. A critique agent evaluates the quality of each conversion and provides scores -3. If the average score is below 90%, the process repeats with feedback +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -The workflow continues until the refactoring meets the quality threshold. +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -Source COBOL files can be obtained from: -https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl -""" + -import os -import re -import tempfile -from pathlib import Path +## Next Steps -from pydantic import SecretStr +- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently -from openhands.sdk import LLM, Conversation -from openhands.tools.preset.default import get_default_agent +### LLM Profile Store +Source: https://docs.openhands.dev/sdk/guides/llm-profile-store.md +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) -MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) +> A ready-to-run example is available [here](#ready-to-run-example)! +The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. 
+Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. -def setup_workspace() -> tuple[Path, Path, Path]: - """Create workspace directories for the refactoring workflow.""" - workspace_dir = Path(tempfile.mkdtemp()) - cobol_dir = workspace_dir / "cobol" - java_dir = workspace_dir / "java" - critique_dir = workspace_dir / "critiques" +## Benefits +- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. +- **Reusability:** Import a defined profile into any script or session with a single identifier. +- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. - cobol_dir.mkdir(parents=True, exist_ok=True) - java_dir.mkdir(parents=True, exist_ok=True) - critique_dir.mkdir(parents=True, exist_ok=True) +## How It Works - return workspace_dir, cobol_dir, java_dir + + + ### Create a Store + The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, + but you can point it anywhere. -def create_sample_cobol_files(cobol_dir: Path) -> list[str]: - """Create sample COBOL files for demonstration. + ```python icon="python" focus={3, 4, 6, 7} + from openhands.sdk import LLMProfileStore - In a real scenario, you would clone files from: - https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl - """ - sample_files = { - "CBACT01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBACT01C. - ***************************************************************** - * Program: CBACT01C - Account Display Program - * Purpose: Display account information for a given account number - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-ACCOUNT-ID PIC 9(11). - 01 WS-ACCOUNT-STATUS PIC X(1). - 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. - 01 WS-CUSTOMER-NAME PIC X(50). - 01 WS-ERROR-MSG PIC X(80). 
+ # Default location: ~/.openhands/profiles + store = LLMProfileStore() - PROCEDURE DIVISION. - PERFORM 1000-INIT. - PERFORM 2000-PROCESS. - PERFORM 3000-TERMINATE. - STOP RUN. + # Or bring your own directory + store = LLMProfileStore(base_dir="./my-profiles") + ``` + + + ### Save a Profile - 1000-INIT. - INITIALIZE WS-ACCOUNT-ID - INITIALIZE WS-ACCOUNT-STATUS - INITIALIZE WS-ACCOUNT-BALANCE - INITIALIZE WS-CUSTOMER-NAME. + Got an LLM configured just right? Save it for later. - 2000-PROCESS. - DISPLAY "ENTER ACCOUNT NUMBER: " - ACCEPT WS-ACCOUNT-ID - IF WS-ACCOUNT-ID = ZEROS - MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG - DISPLAY WS-ERROR-MSG - ELSE - DISPLAY "ACCOUNT: " WS-ACCOUNT-ID - DISPLAY "STATUS: " WS-ACCOUNT-STATUS - DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE - END-IF. + ```python icon="python" focus={11, 12} + from pydantic import SecretStr + from openhands.sdk import LLM, LLMProfileStore - 3000-TERMINATE. - DISPLAY "PROGRAM COMPLETE". -""", - "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBCUS01C. - ***************************************************************** - * Program: CBCUS01C - Customer Information Program - * Purpose: Manage customer data operations - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-CUSTOMER-ID PIC 9(9). - 01 WS-FIRST-NAME PIC X(25). - 01 WS-LAST-NAME PIC X(25). - 01 WS-ADDRESS PIC X(100). - 01 WS-PHONE PIC X(15). - 01 WS-EMAIL PIC X(50). - 01 WS-OPERATION PIC X(1). - 88 OP-ADD VALUE 'A'. - 88 OP-UPDATE VALUE 'U'. - 88 OP-DELETE VALUE 'D'. - 88 OP-DISPLAY VALUE 'V'. + fast_llm = LLM( + usage_id="fast", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("sk-..."), + temperature=0.0, + ) - PROCEDURE DIVISION. - PERFORM 1000-MAIN-PROCESS. - STOP RUN. + store = LLMProfileStore() + store.save("fast", fast_llm) + ``` - 1000-MAIN-PROCESS. 
- DISPLAY "CUSTOMER MANAGEMENT SYSTEM" - DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" - ACCEPT WS-OPERATION - EVALUATE TRUE - WHEN OP-ADD - PERFORM 2000-ADD-CUSTOMER - WHEN OP-UPDATE - PERFORM 3000-UPDATE-CUSTOMER - WHEN OP-DELETE - PERFORM 4000-DELETE-CUSTOMER - WHEN OP-DISPLAY - PERFORM 5000-DISPLAY-CUSTOMER - WHEN OTHER - DISPLAY "INVALID OPERATION" - END-EVALUATE. + + API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to + persist them; otherwise, they will be read from the environment at load time. + + + + ### Load a Profile - 2000-ADD-CUSTOMER. - DISPLAY "ADDING NEW CUSTOMER" - ACCEPT WS-CUSTOMER-ID - ACCEPT WS-FIRST-NAME - ACCEPT WS-LAST-NAME - DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. + Next time you need that LLM, just load it: - 3000-UPDATE-CUSTOMER. - DISPLAY "UPDATING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. + ```python icon="python" + # Same model, ready to go. + llm = store.load("fast") + ``` + + + ### List and Clean Up - 4000-DELETE-CUSTOMER. - DISPLAY "DELETING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. + See what you've got, delete what you don't need: - 5000-DISPLAY-CUSTOMER. - DISPLAY "DISPLAYING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "ID: " WS-CUSTOMER-ID - DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. -""", - "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBTRN01C. - ***************************************************************** - * Program: CBTRN01C - Transaction Processing Program - * Purpose: Process financial transactions - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-TRANS-ID PIC 9(16). - 01 WS-TRANS-TYPE PIC X(2). - 88 TRANS-CREDIT VALUE 'CR'. - 88 TRANS-DEBIT VALUE 'DB'. - 88 TRANS-TRANSFER VALUE 'TR'. - 01 WS-TRANS-AMOUNT PIC S9(13)V99. - 01 WS-FROM-ACCOUNT PIC 9(11). 
- 01 WS-TO-ACCOUNT PIC 9(11). - 01 WS-TRANS-DATE PIC 9(8). - 01 WS-TRANS-STATUS PIC X(10). + ```python icon="python" focus={1, 3, 4} + print(store.list()) # ['fast.json', 'creative.json'] - PROCEDURE DIVISION. - PERFORM 1000-INITIALIZE. - PERFORM 2000-PROCESS-TRANSACTION. - PERFORM 3000-FINALIZE. - STOP RUN. + store.delete("creative") + print(store.list()) # ['fast.json'] + ``` + + - 1000-INITIALIZE. - MOVE ZEROS TO WS-TRANS-ID - MOVE SPACES TO WS-TRANS-TYPE - MOVE ZEROS TO WS-TRANS-AMOUNT - MOVE "PENDING" TO WS-TRANS-STATUS. +## Good to Know - 2000-PROCESS-TRANSACTION. - DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " - ACCEPT WS-TRANS-TYPE - DISPLAY "ENTER AMOUNT: " - ACCEPT WS-TRANS-AMOUNT - EVALUATE TRUE - WHEN TRANS-CREDIT - PERFORM 2100-PROCESS-CREDIT - WHEN TRANS-DEBIT - PERFORM 2200-PROCESS-DEBIT - WHEN TRANS-TRANSFER - PERFORM 2300-PROCESS-TRANSFER - WHEN OTHER - MOVE "INVALID" TO WS-TRANS-STATUS - END-EVALUATE. +Profile names must be simple filenames (no slashes, no dots at the start). - 2100-PROCESS-CREDIT. - DISPLAY "PROCESSING CREDIT" - ACCEPT WS-TO-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. +## Ready-to-run Example - 2200-PROCESS-DEBIT. - DISPLAY "PROCESSING DEBIT" - ACCEPT WS-FROM-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. + +This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) + - 2300-PROCESS-TRANSFER. - DISPLAY "PROCESSING TRANSFER" - ACCEPT WS-FROM-ACCOUNT - ACCEPT WS-TO-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. +```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py +"""Example: Using LLMProfileStore to save and reuse LLM configurations. - 3000-FINALIZE. - DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. 
-""", - } +LLMProfileStore persists LLM configurations as JSON files, so you can define +a profile once and reload it across sessions without repeating setup code. +""" - created_files = [] - for filename, content in sample_files.items(): - file_path = cobol_dir / filename - file_path.write_text(content) - created_files.append(filename) +import os +import tempfile - return created_files +from pydantic import SecretStr +from openhands.sdk import LLM, LLMProfileStore -def get_refactoring_prompt( - cobol_dir: Path, - java_dir: Path, - cobol_files: list[str], - critique_file: Path | None = None, -) -> str: - """Generate the prompt for the refactoring agent.""" - files_list = "\n".join(f" - {f}" for f in cobol_files) - base_prompt = f"""Convert the following COBOL files to Java: +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +store = LLMProfileStore(base_dir=tempfile.mkdtemp()) -COBOL Source Directory: {cobol_dir} -Java Target Directory: {java_dir} -Files to convert: -{files_list} +# 1. Create two LLM profiles with different usage -Requirements: -1. Create a Java class for each COBOL program -2. Preserve the business logic and data structures -3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) -4. Convert COBOL data types to appropriate Java types -5. Implement proper error handling with try-catch blocks -6. Add JavaDoc comments explaining the purpose of each class and method -7. In JavaDoc comments, include traceability to the original COBOL source using - the format: @source : (e.g., @source CBACT01C.cbl:73-77) -8. Create a clean, maintainable object-oriented design -9. Each Java file should be compilable and follow Java best practices +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+base_url = os.getenv("LLM_BASE_URL") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -Read each COBOL file and create the corresponding Java file in the target directory. -""" +fast_llm = LLM( + usage_id="fast", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.0, +) - if critique_file and critique_file.exists(): - base_prompt += f""" +creative_llm = LLM( + usage_id="creative", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.9, +) -IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. -Please review the critique at: {critique_file} -Address all issues mentioned in the critique to improve the conversion quality. -""" +# 2. Save profiles - return base_prompt +# Note that secrets are excluded by default for safety. +store.save("fast", fast_llm) +store.save("creative", creative_llm) +# To persist the API key as well, pass `include_secrets=True`: +# store.save("fast", fast_llm, include_secrets=True) -def get_critique_prompt( - cobol_dir: Path, - java_dir: Path, - cobol_files: list[str], -) -> str: - """Generate the prompt for the critique agent.""" - files_list = "\n".join(f" - {f}" for f in cobol_files) +# 3. List available persisted profiles - return f"""Evaluate the quality of COBOL to Java refactoring. +print(f"Stored profiles: {store.list()}") -COBOL Source Directory: {cobol_dir} -Java Target Directory: {java_dir} +# 4. Load a profile -Original COBOL files: -{files_list} +loaded = store.load("fast") +assert isinstance(loaded, LLM) +print( + "Loaded profile. " + f"usage:{loaded.usage_id}, " + f"model: {loaded.model}, " + f"temperature: {loaded.temperature}." +) -Please evaluate each converted Java file against its original COBOL source. +# 5. Delete a profile -For each file, assess: -1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) -2. Code Quality: Is the code clean, readable, following Java conventions? 
(0-25 pts) -3. Completeness: Are all COBOL features properly converted? (0-25 pts) -4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) +store.delete("creative") +print(f"After deletion: {store.list()}") -Create a critique report in the following EXACT format: +print("EXAMPLE_COST: 0") +``` -# COBOL to Java Refactoring Critique Report + -## Summary -[Brief overall assessment] +## Next Steps -## File Evaluations +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully -### [Original COBOL filename] -- **Java File**: [corresponding Java filename or "NOT FOUND"] -- **Correctness**: [score]/25 - [brief explanation] -- **Code Quality**: [score]/25 - [brief explanation] -- **Completeness**: [score]/25 - [brief explanation] -- **Best Practices**: [score]/25 - [brief explanation] -- **File Score**: [total]/100 -- **Issues to Address**: - - [specific issue 1] - - [specific issue 2] - ... +### Reasoning +Source: https://docs.openhands.dev/sdk/guides/llm-reasoning.md -[Repeat for each file] +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -## Overall Score -- **Average Score**: [calculated average of all file scores] -- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. -## Priority Improvements -1. [Most critical improvement needed] -2. [Second priority] -3. [Third priority] +This guide demonstrates two provider-specific approaches: +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. 
**OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter

-Save this report to: {java_dir.parent}/critiques/critique_report.md
-"""

+## Anthropic Extended Thinking

+> A ready-to-run example is available [here](#ready-to-run-example-anthropic)!

-def parse_critique_score(critique_file: Path) -> float:
-    """Parse the average score from the critique report."""
-    if not critique_file.exists():
-        return 0.0

+Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process
+through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step.

-    content = critique_file.read_text()

+### How It Works

-    # Look for "Average Score: X" pattern
-    patterns = [
-        r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)",
-        r"Average Score:\s*(\d+(?:\.\d+)?)",
-        r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)",
-    ]

+The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages:

-    for pattern in patterns:
-        match = re.search(pattern, content, re.IGNORECASE)
-        if match:
-            return float(match.group(1))

+```python focus={6-11} icon="python" wrap
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for block in message.thinking_blocks:
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"Redacted: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"Thinking: {block.thinking}")

-    return 0.0

+conversation = Conversation(agent=agent, callbacks=[show_thinking])
+```

+### Understanding Thinking Blocks

-def run_iterative_refinement() -> None:
-    """Run the iterative refinement workflow."""
-    # Setup
-    api_key = os.getenv("LLM_API_KEY")
-    assert api_key is not None, "LLM_API_KEY environment variable is not set."
- model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") - base_url = os.getenv("LLM_BASE_URL") +Claude uses thinking blocks to reason through complex problems step-by-step. There are two types: - llm = LLM( - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - usage_id="iterative_refinement", - ) +- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process +- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data - workspace_dir, cobol_dir, java_dir = setup_workspace() - critique_dir = workspace_dir / "critiques" +By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time, +giving you insight into how Claude is approaching the problem. 
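+The `isinstance` dispatch described above can be exercised on its own. The sketch below is a minimal, self-contained illustration: the two dataclasses are stand-in stubs named after the SDK's block types, not the real `openhands.sdk` classes.

```python
from dataclasses import dataclass


# Stand-in stubs for illustration only; the SDK ships its own
# ThinkingBlock / RedactedThinkingBlock classes.
@dataclass
class ThinkingBlock:
    thinking: str


@dataclass
class RedactedThinkingBlock:
    data: str


def render_blocks(blocks: list) -> list[str]:
    """Render a mixed list of thinking blocks, skipping unknown types."""
    lines = []
    for block in blocks:
        if isinstance(block, RedactedThinkingBlock):
            lines.append(f"Redacted: {block.data}")
        elif isinstance(block, ThinkingBlock):
            lines.append(f"Thinking: {block.thinking}")
    return lines


print(render_blocks([ThinkingBlock("Check units first."), RedactedThinkingBlock("<hidden>")]))
```

+Swapping the stubs for the real SDK imports leaves the filtering logic unchanged, which is why the callback pattern ports cleanly between scripts.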
-    print(f"Workspace: {workspace_dir}")
-    print(f"COBOL Directory: {cobol_dir}")
-    print(f"Java Directory: {java_dir}")
-    print(f"Critique Directory: {critique_dir}")
-    print()

+### Ready-to-run Example Anthropic

-    # Create sample COBOL files
-    cobol_files = create_sample_cobol_files(cobol_dir)
-    print(f"Created {len(cobol_files)} sample COBOL files:")
-    for f in cobol_files:
-        print(f"  - {f}")
-    print()

+
+This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py)
+

-    critique_file = critique_dir / "critique_report.md"
-    current_score = 0.0
-    iteration = 0

+```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py
+"""Example demonstrating Anthropic's extended thinking feature with thinking blocks."""

-    while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
-        iteration += 1
-        print("=" * 80)
-        print(f"ITERATION {iteration}")
-        print("=" * 80)

+import os

-        # Phase 1: Refactoring
-        print("\n--- Phase 1: Refactoring Agent ---")
-        refactoring_agent = get_default_agent(llm=llm, cli_mode=True)
-        refactoring_conversation = Conversation(
-            agent=refactoring_agent,
-            workspace=str(workspace_dir),
-        )

+from pydantic import SecretStr

-        previous_critique = critique_file if iteration > 1 else None
-        refactoring_prompt = get_refactoring_prompt(
-            cobol_dir, java_dir, cobol_files, previous_critique
-        )

+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    RedactedThinkingBlock,
+    ThinkingBlock,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.terminal import TerminalTool

-        refactoring_conversation.send_message(refactoring_prompt)
-        refactoring_conversation.run()
-        print("Refactoring phase complete.")

-        # Phase 2: Critique
-        print("\n--- Phase 2: Critique Agent ---")
-        critique_agent = get_default_agent(llm=llm, cli_mode=True)
-        
critique_conversation = Conversation( - agent=critique_agent, - workspace=str(workspace_dir), - ) +# Configure LLM for Anthropic Claude with extended thinking +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") - critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) - critique_conversation.send_message(critique_prompt) - critique_conversation.run() - print("Critique phase complete.") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) - # Parse the score - current_score = parse_critique_score(critique_file) - print(f"\nCurrent Score: {current_score:.1f}%") +# Setup agent with bash tool +agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)]) - if current_score >= QUALITY_THRESHOLD: - print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") - else: - print( - f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " - "Continuing refinement..." 
- ) - # Final summary - print("\n" + "=" * 80) - print("ITERATIVE REFINEMENT COMPLETE") - print("=" * 80) - print(f"Total iterations: {iteration}") - print(f"Final score: {current_score:.1f}%") - print(f"Workspace: {workspace_dir}") +# Callback to display thinking blocks +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") + for i, block in enumerate(message.thinking_blocks): + if isinstance(block, RedactedThinkingBlock): + print(f" Block {i + 1}: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f" Block {i + 1}: {block.thinking}") - # List created Java files - print("\nCreated Java files:") - for java_file in java_dir.glob("*.java"): - print(f" - {java_file.name}") - # Show critique file location - if critique_file.exists(): - print(f"\nFinal critique report: {critique_file}") +conversation = Conversation( + agent=agent, callbacks=[show_thinking], workspace=os.getcwd() +) - # Report cost - cost = llm.metrics.accumulated_cost - print(f"\nEXAMPLE_COST: {cost}") +conversation.send_message( + "Calculate compound interest for $10,000 at 5% annually, " + "compounded quarterly for 3 years. Show your work.", +) +conversation.run() +conversation.send_message( + "Now, write that number to RESULTs.txt.", +) +conversation.run() +print("✅ Done!") -if __name__ == "__main__": - run_iterative_refinement() +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` - + -## Next Steps +## OpenAI Reasoning via Responses API -- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents -- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow +> A ready-to-run example is available [here](#ready-to-run-example-openai)! 
+OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) +that provides access to the model's reasoning process. +By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. -# Exception Handling -Source: https://docs.openhands.dev/sdk/guides/llm-error-handling +### How It Works -The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. +Configure the LLM with the `reasoning_effort` parameter to enable reasoning: -This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. +```python focus={5} icon="python" wrap +llm = LLM( + model="openhands/gpt-5-codex", + api_key=SecretStr(api_key), + base_url=base_url, + # Enable reasoning with effort level + reasoning_effort="high", +) +``` -## Why typed exceptions? +The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of +reasoning performed by the model. -LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. 
Typical benefits: +Then capture reasoning traces in your callback: -- One code path to handle auth, rate limits, timeouts, service issues, and bad requests -- Clear behavior when conversation history exceeds the context window -- Backward compatibility when you switch providers or SDK versions +```python focus={3-4} icon="python" wrap +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + llm_messages.append(msg) +``` -## Quick start: Using agents and conversations +### Understanding Reasoning Traces -Agent-driven conversations are the common entry point. Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured. +The OpenAI Responses API provides reasoning traces that show how the model approached the problem. +These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. +Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. 
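+As a rough sketch of that inspection step, the snippet below filters a collected message list for entries that carry reasoning content. The dict-based message shape and the `reasoning` key are simplifications for illustration; the SDK's collected messages are richer structured objects.

```python
# Simplified stand-in for a list collected by a conversation callback.
# Real SDK messages are structured objects; plain dicts keep the sketch runnable.
llm_messages = [
    {"role": "assistant", "reasoning": "Compare both files before answering."},
    {"role": "assistant", "content": "Wrote one fact to FACTS.txt."},
    {"role": "assistant", "reasoning": "The file can now be removed safely."},
]


def extract_reasoning(messages: list[dict]) -> list[str]:
    """Pull out only the entries that carry a reasoning trace."""
    return [m["reasoning"] for m in messages if "reasoning" in m]


for trace in extract_reasoning(llm_messages):
    print(f"reasoning: {trace}")
```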
+ +### Ready-to-run Example OpenAI + + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + + +```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation + +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" + +from __future__ import annotations + +import os -```python icon="python" wrap from pydantic import SecretStr -from openhands.sdk import Agent, Conversation, LLM -from openhands.sdk.llm.exceptions import ( - LLMError, - LLMAuthenticationError, - LLMRateLimitError, - LLMTimeoutError, - LLMServiceUnavailableError, - LLMBadRequestError, - LLMContextWindowExceedError, + +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, ) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent -llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) -agent = Agent(llm=llm, tools=[]) -conversation = Conversation( - agent=agent, - persistence_dir="./.conversations", - workspace=".", + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." 
+ +model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", ) -try: - conversation.send_message( - "Continue the long analysis we started earlier…" - ) - conversation.run() +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) + +llm_messages = [] # collect raw LLM-convertible messages for inspection -except LLMContextWindowExceedError: - # Conversation is longer than the model’s context window - # Options: - # 1) Enable a condenser (recommended for long sessions) - # 2) Shorten inputs or reset conversation - print("Hit the context limit. Consider enabling a condenser.") -except LLMAuthenticationError: - print( - "Invalid or missing API credentials." - "Check your API key or auth setup." - ) +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -except LLMRateLimitError: - print("Rate limit exceeded. Back off and retry later.") -except LLMTimeoutError: - print("Request timed out. Consider increasing timeout or retrying.") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) -except LLMServiceUnavailableError: - print("Service unavailable or connectivity issue. Retry with backoff.") +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() -except LLMBadRequestError: - print("Bad request to provider. 
Validate inputs and arguments.") +conversation.send_message("Now delete FACTS.txt.") +conversation.run() -except LLMError as e: - # Fallback for other SDK LLM errors (parsing/validation, etc.) - print(f"Unhandled LLM error: {e}") +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` + +## Use Cases -### Avoiding context‑window errors with a condenser +**Debugging**: Understand why the agent made specific decisions or took certain actions. -If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue. +**Transparency**: Show users how the AI arrived at its conclusions. -```python icon="python" focus={5-6, 9-14} wrap -from openhands.sdk.context.condenser import LLMSummarizingCondenser +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. -condenser = LLMSummarizingCondenser( - llm=llm.model_copy(update={"usage_id": "condenser"}), - max_size=10, - keep_first=2, -) +**Learning**: Study how models approach complex problems. -agent = Agent(llm=llm, tools=[], condenser=condenser) -conversation = Conversation( - agent=agent, - persistence_dir="./.conversations", - workspace=".", -) -``` +## Next Steps - - See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). 
- +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities -## Handling errors with direct LLM calls +### LLM Registry +Source: https://docs.openhands.dev/sdk/guides/llm-registry.md -The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### Example: Using `.completion()` +> A ready-to-run example is available [here](#ready-to-run-example)! -```python icon="python" wrap -from pydantic import SecretStr -from openhands.sdk import LLM -from openhands.sdk.llm import Message, TextContent -from openhands.sdk.llm.exceptions import ( - LLMError, - LLMAuthenticationError, - LLMRateLimitError, - LLMTimeoutError, - LLMServiceUnavailableError, - LLMBadRequestError, - LLMContextWindowExceedError, -) +Use the LLM registry to manage multiple LLM providers and dynamically switch between models. -llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) +## Using the Registry -try: - response = llm.completion([ - Message.user([TextContent(text="Summarize our design doc")]) - ]) - print(response.message) +You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. -except LLMContextWindowExceedError: - print("Context window exceeded. Consider enabling a condenser.") -except LLMAuthenticationError: - print("Invalid or missing API credentials.") -except LLMRateLimitError: - print("Rate limit exceeded. Back off and retry later.") -except LLMTimeoutError: - print("Request timed out. Consider increasing timeout or retrying.") -except LLMServiceUnavailableError: - print("Service unavailable or connectivity issue. 
Retry with backoff.") -except LLMBadRequestError: - print("Bad request to provider. Validate inputs and arguments.") -except LLMError as e: - print(f"Unhandled LLM error: {e}") -``` +```python icon="python" focus={9,10,13} +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -### Example: Using `.responses()` +# define the registry and add an LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +... +# retrieve the LLM by its usage ID +llm = llm_registry.get("agent") +``` -```python icon="python" wrap -from pydantic import SecretStr -from openhands.sdk import LLM -from openhands.sdk.llm import Message, TextContent -from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError +## Ready-to-run Example -llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + -try: - resp = llm.responses([ - Message.user( - [TextContent(text="Write a one-line haiku about code.")] - ) - ]) - print(resp.message) -except LLMContextWindowExceedError: - print("Context window exceeded. Consider enabling a condenser.") -except LLMError as e: - print(f"LLM error: {e}") -``` +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os -## Exception reference +from pydantic import SecretStr -All exceptions live under `openhands.sdk.llm.exceptions` unless noted. 
+from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool -| Category | Error | Description | -|--------|------|-------------| -| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | -| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | -| | `LLMRateLimitError` | Provider rate limit exceeded. | -| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | -| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | -| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | -| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | -| | `LLMNoActionError` | Model did not return an action when one was expected. | -| | `LLMResponseError` | Could not extract an action from the response. | -| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | -| | `FunctionCallValidationError` | Tool/function call arguments failed validation. | -| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | -| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | -| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | -| | `OperationCancelled` | A running operation was cancelled programmatically. | - - All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all - for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. 
- +logger = get_logger(__name__) +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -# LLM Fallback Strategy -Source: https://docs.openhands.dev/sdk/guides/llm-fallback +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) -> A ready-to-run example is available [here](#ready-to-run-example)! +# Get LLM from registry +llm = llm_registry.get("agent") -`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model. +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] -## Basic Usage +# Agent +agent = Agent(llm=llm, tools=tools) -Attach a `FallbackStrategy` to your primary `LLM`. 
The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store): +llm_messages = [] # collect raw LLM messages -```python icon="python" wrap focus={16, 17, 21, 22, 23} -from pydantic import SecretStr -from openhands.sdk import LLM, LLMProfileStore -from openhands.sdk.llm import FallbackStrategy -# Menage persisted LLM profiles -# default store directory: .openhands/profiles -store = LLMProfileStore() +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -fallback_llm = LLM( - usage_id="fallback-1", - model="openai/gpt-4o", - api_key=SecretStr("your-openai-key"), -) -store.save("fallback-1", fallback_llm, include_secrets=True) -# Configure an LLM with a fallback strategy -primary_llm = LLM( - usage_id="agent-primary", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr("your-api-key"), - fallback_strategy=FallbackStrategy( - fallback_llms=["fallback-1"], - ), +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd ) -``` - -## How It Works -1. The primary LLM handles the request as normal -2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order -3. The first successful fallback response is returned to the caller -4. If all fallbacks fail, the original primary error is raised -5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model +conversation.send_message("Please echo 'Hello!'") +conversation.run() - -Only transient errors trigger fallback. -Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks. 
-For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29) - +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -## Multiple Fallback Levels +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") -Chain as many fallback LLMs as you need. They are tried in list order: +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") -```python icon="python" wrap focus={5-7} -llm = LLM( - usage_id="agent-primary", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key), - fallback_strategy=FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - ), +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] ) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. + -## Custom Profile Store Directory -By default, fallback profiles are loaded from `.openhands/profiles`. 
You can point to a different directory: +## Next Steps -```python icon="python" wrap focus={3} -FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - profile_store_dir="/path/to/my/profiles", -) -``` +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs -## Metrics +### Model Routing +Source: https://docs.openhands.dev/sdk/guides/llm-routing.md -Fallback costs are automatically merged into the primary LLM's metrics. After a conversation, you can inspect exactly which models were used: +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -```python icon="python" wrap -# After running a conversation -metrics = llm.metrics -print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +This feature is under active development and more default routers will be available in future releases. -for usage in metrics.token_usages: - print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") -``` +> A ready-to-run example is available [here](#ready-to-run-example)! -Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. +### Using the built-in MultimodalRouter -## Use Cases +Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: -- **Rate limit handling** — When one provider throttles you, seamlessly switch to another -- **High availability** — Keep your agent running during provider outages -- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure -- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
+```python icon="python" wrap focus={13-16} +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) +``` + +You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. ## Ready-to-run Example -This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) -```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py -"""Example: Using FallbackStrategy for LLM resilience. - -When the primary LLM fails with a transient error (rate limit, timeout, etc.), -FallbackStrategy automatically tries alternate LLMs in order. Fallback is -per-call: each new request starts with the primary model. Token usage and -cost from fallback calls are merged into the primary LLM's metrics. - -This example: - 1. Saves two fallback LLM profiles to a temporary store. - 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. - 3. Runs a conversation — if the primary model is unavailable, the agent - transparently falls back to the next available model. 
-""" +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py import os -import tempfile from pydantic import SecretStr -from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool -from openhands.sdk.llm import FallbackStrategy -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool - - -# Read configuration from environment -api_key = os.getenv("LLM_API_KEY", None) -assert api_key is not None, "LLM_API_KEY environment variable is not set." -base_url = os.getenv("LLM_BASE_URL") -primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") - -# Use a temporary directory so this example doesn't pollute your home folder. -# In real usage you can omit base_dir to use the default (~/.openhands/profiles). -profile_store_dir = tempfile.mkdtemp() -store = LLMProfileStore(base_dir=profile_store_dir) - -fallback_1 = LLM( - usage_id="fallback-1", - model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), - api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), - base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, ) -store.save("fallback-1", fallback_1, include_secrets=True) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools -fallback_2 = LLM( - usage_id="fallback-2", - model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), - api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), - base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), -) -store.save("fallback-2", fallback_2, include_secrets=True) -print(f"Saved fallback profiles: {store.list()}") +logger = get_logger(__name__) +# Configure LLM +api_key = 
os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -# Configure the primary LLM with a FallbackStrategy primary_llm = LLM( usage_id="agent-primary", - model=primary_model, + model=model, + base_url=base_url, api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="openhands/devstral-small-2507", base_url=base_url, - fallback_strategy=FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - profile_store_dir=profile_store_dir, - ), + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, ) +# Tools +tools = get_default_tools() # Use our default openhands experience -# Run a conversation -agent = Agent( - llm=primary_llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - ], +# Agent +agent = Agent(llm=multimodal_router, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() ) -conversation = Conversation(agent=agent, workspace=os.getcwd()) -conversation.send_message("Write a haiku about resilience into HAIKU.txt.") +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) conversation.run() +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() -# Inspect metrics (includes any fallback usage) -metrics = 
primary_llm.metrics -print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") -print(f"Token usage records: {len(metrics.token_usages)}") -for usage in metrics.token_usages: - print( - f" model={usage.model}" - f" prompt={usage.prompt_tokens}" - f" completion={usage.completion_tokens}" +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], ) +) +conversation.run() -print(f"EXAMPLE_COST: {metrics.accumulated_cost}") +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` - + + ## Next Steps -- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles -- **[Model Routing](/sdk/guides/llm-routing)** — Route requests based on content (e.g., multimodal vs text-only) -- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application -- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + +### LLM Streaming +Source: https://docs.openhands.dev/sdk/guides/llm-streaming.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +This is currently only supported for the chat completion endpoint. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use +streaming callbacks to process and display tokens as they arrive from the language model. 
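The core idea can be seen without any SDK at all: consume an iterator of chunks and write each one out immediately instead of buffering the full response. A minimal, purely illustrative sketch (the fake token source below is a stand-in, not an SDK API):

```python
import sys


def fake_token_stream(text: str, chunk_size: int = 4):
    """Yield text in small chunks, standing in for tokens from a model."""
    for i in range(0, len(text), chunk_size):
        yield text[i : i + chunk_size]


def display_stream(stream) -> str:
    """Write each chunk as soon as it arrives; return the assembled response."""
    parts: list[str] = []
    for chunk in stream:
        sys.stdout.write(chunk)  # show immediately, without waiting for a newline
        sys.stdout.flush()
        parts.append(chunk)
    return "".join(parts)


response = display_stream(fake_token_stream("Streaming shows output as it is generated.\n"))
```

The streaming callbacks described below follow the same pattern: each callback invocation hands you one incremental chunk to render.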
-# Image Input -Source: https://docs.openhands.dev/sdk/guides/llm-image-input -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +## How It Works -> A ready-to-run example is available [here](#ready-to-run-example)! +Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the +complete response. This creates a more responsive user experience, especially for long-form content generation. + + + ### Enable Streaming on LLM + Configure the LLM with streaming enabled: -### Sending Images + ```python focus={6} icon="python" wrap + llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, # Enable streaming + ) + ``` + + + ### Define Token Callback + Create a callback function that processes streaming chunks as they arrive: -The LLM you use must support image inputs (`llm.vision_is_active()` need to be `True`). + ```python icon="python" wrap + def on_token(chunk: ModelResponseStream) -> None: + """Process each streaming chunk as it arrives.""" + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + content = getattr(delta, "content", None) + if isinstance(content, str): + sys.stdout.write(content) + sys.stdout.flush() + ``` -Pass images along with text in the message content: + The callback receives a `ModelResponseStream` object containing: + - **`choices`**: List of response choices from the model + - **`delta`**: Incremental content changes for each choice + - **`content`**: The actual text tokens being streamed + + + ### Register Callback with Conversation -```python focus={14} icon="python" wrap -from openhands.sdk import ImageContent + Pass your token callback to the conversation: -IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" -conversation.send_message( - Message( - role="user", - content=[ - TextContent( - 
text=( - "Study this image and describe the key elements you see. " - "Summarize them in a short paragraph and suggest a catchy caption." - ) - ), - ImageContent(image_urls=[IMAGE_URL]), - ], - ) -) -``` + ```python focus={3} icon="python" wrap + conversation = Conversation( + agent=agent, + token_callbacks=[on_token], # Register streaming callback + workspace=os.getcwd(), + ) + ``` -Works with multimodal LLMs like `GPT-4 Vision` and `Claude` with vision capabilities. + The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers + if needed (e.g., one for display, another for logging). + + ## Ready-to-run Example -This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) +This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) -You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: - -```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py -"""OpenHands Agent SDK — Image Input Example. - -This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds -vision support by sending an image to the agent alongside text instructions. 
-""" - +```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py import os +import sys +from typing import Literal from pydantic import SecretStr from openhands.sdk import ( - LLM, - Agent, Conversation, - Event, - ImageContent, - LLMConvertibleEvent, - Message, - TextContent, get_logger, ) -from openhands.sdk.tool.spec import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +from openhands.sdk.llm import LLM +from openhands.sdk.llm.streaming import ModelResponseStream +from openhands.tools.preset.default import get_default_agent logger = get_logger(__name__) -# Configure LLM (vision-capable model) -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +if not api_key: + raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") + model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") llm = LLM( - usage_id="vision-llm", model=model, - base_url=base_url, api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, ) -assert llm.vision_is_active(), "The selected LLM model does not support vision input." 
-cwd = os.getcwd() +agent = get_default_agent(llm=llm, cli_mode=True) -agent = Agent( - llm=llm, - tools=[ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], -) -llm_messages = [] # collect raw LLM messages for inspection +# Define streaming states +StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] +# Track state across on_token calls for boundary detection +_current_state: StreamingState | None = None -def conversation_callback(event: Event) -> None: - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +def on_token(chunk: ModelResponseStream) -> None: + """ + Handle all types of streaming tokens including content, + tool calls, and thinking blocks with dynamic boundary detection. + """ + global _current_state + + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + # Handle thinking blocks (reasoning content) + reasoning_content = getattr(delta, "reasoning_content", None) + if isinstance(reasoning_content, str) and reasoning_content: + if _current_state != "thinking": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("THINKING: ") + _current_state = "thinking" + sys.stdout.write(reasoning_content) + sys.stdout.flush() + + # Handle regular content + content = getattr(delta, "content", None) + if isinstance(content, str) and content: + if _current_state != "content": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("CONTENT: ") + _current_state = "content" + sys.stdout.write(content) + sys.stdout.flush() + + # Handle tool calls + tool_calls = getattr(delta, "tool_calls", None) + if tool_calls: + for tool_call in tool_calls: + tool_name = ( + tool_call.function.name if tool_call.function.name else "" + ) + tool_args = ( + tool_call.function.arguments + if tool_call.function.arguments + else "" + ) + if tool_name: + if _current_state != 
"tool_name": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL NAME: ") + _current_state = "tool_name" + sys.stdout.write(tool_name) + sys.stdout.flush() + if tool_args: + if _current_state != "tool_args": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL ARGS: ") + _current_state = "tool_args" + sys.stdout.write(tool_args) + sys.stdout.flush() conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd + agent=agent, + workspace=os.getcwd(), + token_callbacks=[on_token], ) -IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" - -conversation.send_message( - Message( - role="user", - content=[ - TextContent( - text=( - "Study this image and describe the key elements you see. " - "Summarize them in a short paragraph and suggest a catchy caption." - ) - ), - ImageContent(image_urls=[IMAGE_URL]), - ], - ) +story_prompt = ( + "Tell me a long story about LLM streaming, write it a file, " + "make sure it has multiple paragraphs. " ) +conversation.send_message(story_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") conversation.run() -conversation.send_message( - "Great! Please save your description and caption into image_report.md." +cleanup_prompt = ( + "Thank you. Please delete the streaming story file now that I've read it, " + "then confirm the deletion." ) +conversation.send_message(cleanup_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") conversation.run() -print("=" * 100) -print("Conversation finished. 
Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") - # Report cost cost = llm.metrics.accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - + ## Next Steps -- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns -- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently - +- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully +- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI -# LLM Profile Store -Source: https://docs.openhands.dev/sdk/guides/llm-profile-store +### LLM Subscriptions +Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -> A ready-to-run example is available [here](#ready-to-run-example)! + +OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. + -The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. -Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. +> A ready-to-run example is available [here](#ready-to-run-example)! -## Benefits -- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. -- **Reusability:** Import a defined profile into any script or session with a single identifier. -- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. +Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. 
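The "credential caching and automatic token refresh" mentioned above follows the usual OAuth pattern: keep the access token together with its expiry, and refresh shortly before it lapses. A purely conceptual sketch (none of these helpers are SDK APIs):

```python
import time


def needs_refresh(expires_at: float, leeway_seconds: float = 300.0) -> bool:
    """A cached token should be refreshed shortly *before* it expires."""
    return time.time() >= expires_at - leeway_seconds


def get_access_token(cache: dict, refresh) -> str:
    """Return the cached token, refreshing it first if it is (nearly) expired."""
    if needs_refresh(cache["expires_at"]):
        # `refresh` is assumed to return {"access_token": ..., "expires_at": ...}
        cache.update(refresh())
    return cache["access_token"]


# Illustrative cache entry: a token that expired an hour ago
cache = {"access_token": "stale", "expires_at": time.time() - 3600}
token = get_access_token(
    cache, lambda: {"access_token": "fresh", "expires_at": time.time() + 3600}
)
```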
## How It Works - ### Create a Store - - The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, - but you can point it anywhere. + ### Call subscription_login() - ```python icon="python" focus={3, 4, 6, 7} - from openhands.sdk import LLMProfileStore + The `LLM.subscription_login()` class method handles the entire authentication flow: - # Default location: ~/.openhands/profiles - store = LLMProfileStore() + ```python icon="python" + from openhands.sdk import LLM - # Or bring your own directory - store = LLMProfileStore(base_dir="./my-profiles") + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") ``` + + On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. - ### Save a Profile + ### Use the LLM - Got an LLM configured just right? Save it for later. + Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. + + - ```python icon="python" focus={11, 12} - from pydantic import SecretStr - from openhands.sdk import LLM, LLMProfileStore +## Supported Models - fast_llm = LLM( - usage_id="fast", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr("sk-..."), - temperature=0.0, - ) +The following models are available via ChatGPT subscription: - store = LLMProfileStore() - store.save("fast", fast_llm) - ``` +| Model | Description | +|-------|-------------| +| `gpt-5.2-codex` | Latest Codex model (default) | +| `gpt-5.2` | GPT-5.2 base model | +| `gpt-5.1-codex-max` | High-capacity Codex model | +| `gpt-5.1-codex-mini` | Lightweight Codex model | - - API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to - persist them; otherwise, they will be read from the environment at load time. 
- - - - ### Load a Profile +## Configuration Options - Next time you need that LLM, just load it: +### Force Fresh Login - ```python icon="python" - # Same model, ready to go. - llm = store.load("fast") - ``` - - - ### List and Clean Up +If your cached credentials become stale or you want to switch accounts: - See what you've got, delete what you don't need: +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + force_login=True, # Always perform fresh OAuth login +) +``` - ```python icon="python" focus={1, 3, 4} - print(store.list()) # ['fast.json', 'creative.json'] +### Disable Browser Auto-Open - store.delete("creative") - print(store.list()) # ['fast.json'] - ``` - - +For headless environments or when you prefer to manually open the URL: -## Good to Know +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + open_browser=False, # Prints URL to console instead +) +``` -Profile names must be simple filenames (no slashes, no dots at the start). +### Check Subscription Mode + +Verify that the LLM is using subscription-based authentication: + +```python icon="python" +llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") +print(f"Using subscription: {llm.is_subscription}") # True +``` + +## Credential Storage + +Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. 
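Since clearing the cache just means deleting the files in that directory, a small helper like the following could automate it. The function (and the throwaway demo directory) are illustrative, not part of the SDK; for real credentials you would point it at `~/.openhands/auth`:

```python
import tempfile
from pathlib import Path


def clear_credential_cache(auth_dir: Path) -> int:
    """Delete cached credential files under auth_dir; return how many were removed."""
    if not auth_dir.is_dir():
        return 0
    removed = 0
    for path in auth_dir.iterdir():
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


# Demo against a throwaway directory; in real use you would pass
# Path.home() / ".openhands" / "auth" instead.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "openai.json").write_text("{}")
removed_count = clear_credential_cache(demo_dir)
remaining = list(demo_dir.iterdir())
```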
## Ready-to-run Example -This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) +This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) -```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py -"""Example: Using LLMProfileStore to save and reuse LLM configurations. +```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py +"""Example: Using ChatGPT subscription for Codex models. -LLMProfileStore persists LLM configurations as JSON files, so you can define -a profile once and reload it across sessions without repeating setup code. +This example demonstrates how to use your ChatGPT Plus/Pro subscription +to access OpenAI's Codex models without consuming API credits. + +The subscription_login() method handles: +- OAuth PKCE authentication flow +- Credential caching (~/.openhands/auth/) +- Automatic token refresh + +Supported models: +- gpt-5.2-codex +- gpt-5.2 +- gpt-5.1-codex-max +- gpt-5.1-codex-mini + +Requirements: +- Active ChatGPT Plus or Pro subscription +- Browser access for initial OAuth login """ import os -import tempfile - -from pydantic import SecretStr -from openhands.sdk import LLM, LLMProfileStore +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -# Use a temporary directory so this example doesn't pollute your home folder. -# In real usage you can omit base_dir to use the default (~/.openhands/profiles). 
-store = LLMProfileStore(base_dir=tempfile.mkdtemp()) +# First time: Opens browser for OAuth login +# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" +) +# Alternative: Force a fresh login (useful if credentials are stale) +# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) -# 1. Create two LLM profiles with different usage +# Alternative: Disable auto-opening browser (prints URL to console instead) +# llm = LLM.subscription_login( +# vendor="openai", model="gpt-5.2-codex", open_browser=False +# ) -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -base_url = os.getenv("LLM_BASE_URL") -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +# Verify subscription mode is active +print(f"Using subscription mode: {llm.is_subscription}") -fast_llm = LLM( - usage_id="fast", - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - temperature=0.0, +# Use the LLM with an agent as usual +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], ) -creative_llm = LLM( - usage_id="creative", - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - temperature=0.9, -) +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) -# 2. Save profiles +conversation.send_message("List the files in the current directory.") +conversation.run() +print("Done!") +``` -# Note that secrets are excluded by default for safety. -store.save("fast", fast_llm) -store.save("creative", creative_llm) + -# To persist the API key as well, pass `include_secrets=True`: -# store.save("fast", fast_llm, include_secrets=True) +## Next Steps -# 3. 
List available persisted profiles +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token +- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces -print(f"Stored profiles: {store.list()}") +### Model Context Protocol +Source: https://docs.openhands.dev/sdk/guides/mcp.md -# 4. Load a profile +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -loaded = store.load("fast") -assert isinstance(loaded, LLM) -print( - "Loaded profile. " - f"usage:{loaded.usage_id}, " - f"model: {loaded.model}, " - f"temperature: {loaded.temperature}." -) + + ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. + Read more about MCP [here](https://modelcontextprotocol.io/). + -# 5. Delete a profile -store.delete("creative") -print(f"After deletion: {store.list()}") -print("EXAMPLE_COST: 0") +## Basic MCP Usage + +> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! 
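MCP servers are described by a plain configuration dictionary: each entry under `mcpServers` names either a local server (`command` plus `args`) or a remote one (`url`). As a quick, SDK-independent sanity check before handing the dict to an agent, you could validate that shape yourself (this helper is illustrative, not an SDK API):

```python
def check_mcp_config(config: dict) -> list[str]:
    """Return a list of problems found in an MCP config dict (empty = looks OK)."""
    problems: list[str] = []
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["config must contain a non-empty 'mcpServers' mapping"]
    for name, spec in servers.items():
        has_command = "command" in spec
        has_url = "url" in spec
        if not (has_command or has_url):
            problems.append(f"server '{name}' needs either 'command' or 'url'")
        if has_command and not isinstance(spec.get("args", []), list):
            problems.append(f"server '{name}': 'args' must be a list")
    return problems


ok = check_mcp_config(
    {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}}
)
bad = check_mcp_config({"mcpServers": {"broken": {}}})
```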
+ + + + ### MCP Configuration + Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) + + ```python mcp_config icon="python" wrap focus={3-10} + mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "repomix": { + "command": "npx", + "args": ["-y", "repomix@1.4.2", "--mcp"] + }, + } + } + ``` + + + ### Tool Filtering + Use `filter_tools_regex` to control which MCP tools are available to the agent + + ```python filter_tools_regex focus={4-5} icon="python" + agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", + ) + ``` + + + +## MCP with OAuth + +> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! + +For MCP servers requiring OAuth authentication: +- Configure OAuth-enabled MCP servers by specifying the URL and auth type +- The SDK automatically handles the OAuth flow when first connecting +- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) +- User will be prompted to authenticate +- Access tokens are securely stored and automatically refreshed by FastMCP as needed + +```python mcp_config focus={5} icon="python" wrap +mcp_config = { + "mcpServers": { + "Notion": { + "url": "https://mcp.notion.com/mcp", + "auth": "oauth" + } + } +} ``` - +## Ready-to-Run Basic MCP Usage Example -## Next Steps + +This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) + -- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime -- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically 
route to different models -- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully +Here's an example integrating MCP servers with an agent: +```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py +import os -# Reasoning -Source: https://docs.openhands.dev/sdk/guides/llm-reasoning +from pydantic import SecretStr -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. -This guide demonstrates two provider-specific approaches: -1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning -2. **OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter +logger = get_logger(__name__) -## Anthropic Extended Thinking +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -> A ready-to-run example is available [here](#ready-to-run-example-antrophic)! +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process -through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step. 
+# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, + } +} +# Agent +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + # This regex filters out all repomix tools except pack_codebase + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", +) -### How It Works +llm_messages = [] # collect raw LLM messages -The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages: -```python focus={6-11} icon="python" wrap -def show_thinking(event: Event): +def conversation_callback(event: Event): if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") - for block in message.thinking_blocks: - if isinstance(block, RedactedThinkingBlock): - print(f"Redacted: {block.data}") - elif isinstance(block, ThinkingBlock): - print(f"Thinking: {block.thinking}") + llm_messages.append(event.to_llm_message()) -conversation = Conversation(agent=agent, callbacks=[show_thinking]) -``` -### Understanding Thinking Blocks +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) -Claude uses thinking blocks to reason through complex problems step-by-step. There are two types: +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." 
+) +conversation.run() -- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process -- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data +conversation.send_message("Great! Now delete that file.") +conversation.run() -By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time, -giving you insight into how Claude is approaching the problem. +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -### Ready-to-run Example Antrophic +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Ready-to-Run MCP with OAuth Example -This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py) +This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) -```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py -"""Example demonstrating Anthropic's extended thinking feature with thinking blocks.""" - +```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py import os from pydantic import SecretStr @@ -29857,19 +29863,20 @@ from openhands.sdk import ( Conversation, Event, LLMConvertibleEvent, - RedactedThinkingBlock, - ThinkingBlock, + get_logger, ) from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool from 
openhands.tools.terminal import TerminalTool -# Configure LLM for Anthropic Claude with extended thinking +logger = get_logger(__name__) + +# Configure LLM api_key = os.getenv("LLM_API_KEY") assert api_key is not None, "LLM_API_KEY environment variable is not set." model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") - llm = LLM( usage_id="agent", model=model, @@ -29877,142 +29884,147 @@ llm = LLM( api_key=SecretStr(api_key), ) -# Setup agent with bash tool -agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)]) +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] +mcp_config = { + "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} +} +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) -# Callback to display thinking blocks -def show_thinking(event: Event): +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") - for i, block in enumerate(message.thinking_blocks): - if isinstance(block, RedactedThinkingBlock): - print(f" Block {i + 1}: {block.data}") - elif isinstance(block, ThinkingBlock): - print(f" Block {i + 1}: {block.thinking}") + llm_messages.append(event.to_llm_message()) +# Conversation conversation = Conversation( - agent=agent, callbacks=[show_thinking], workspace=os.getcwd() -) - -conversation.send_message( - "Calculate compound interest for $10,000 at 5% annually, " - "compounded quarterly for 3 years. 
Show your work.", + agent=agent, + callbacks=[conversation_callback], ) -conversation.run() -conversation.send_message( - "Now, write that number to RESULTs.txt.", -) +logger.info("Starting conversation with MCP integration...") +conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") conversation.run() -print("✅ Done!") -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") ``` - + -## OpenAI Reasoning via Responses API +## Next Steps -> A ready-to-run example is available [here](#ready-to-run-example-openai)! +- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage +- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation -OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) -that provides access to the model's reasoning process. -By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. 
+### Metrics Tracking +Source: https://docs.openhands.dev/sdk/guides/metrics.md -### How It Works +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Configure the LLM with the `reasoning_effort` parameter to enable reasoning: +## Overview -```python focus={5} icon="python" wrap -llm = LLM( - model="openhands/gpt-5-codex", - api_key=SecretStr(api_key), - base_url=base_url, - # Enable reasoning with effort level - reasoning_effort="high", -) -``` +The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. +- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). -The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of -reasoning performed by the model. +## Getting Metrics from Individual LLMs -Then capture reasoning traces in your callback: +> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! -```python focus={3-4} icon="python" wrap -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - msg = event.to_llm_message() - llm_messages.append(msg) -``` +Track token usage, costs, and performance metrics from LLM interactions: -### Understanding Reasoning Traces +### Accessing Individual LLM Metrics -The OpenAI Responses API provides reasoning traces that show how the model approached the problem. -These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. 
-Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. +Access metrics directly from the LLM object after running the conversation: -### Ready-to-run Example OpenAI +```python icon="python" focus={3-4} +conversation.run() - -This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) - +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` -```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py -""" -Example: Responses API path via LiteLLM in a Real Agent Conversation +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: -- Runs a real Agent/Conversation to verify /responses path works -- Demonstrates rendering of Responses reasoning within normal conversation events -""" +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call -from __future__ import annotations + + For more details on the available metrics and methods, refer to the [source 
code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). + +### Ready-to-run Example (LLM metrics) + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py import os from pydantic import SecretStr from openhands.sdk import ( + LLM, + Agent, Conversation, Event, LLMConvertibleEvent, get_logger, ) -from openhands.sdk.llm import LLM -from openhands.tools.preset.default import get_default_agent +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool logger = get_logger(__name__) -api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") -assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." - -model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") - llm = LLM( + usage_id="agent", model=model, - api_key=SecretStr(api_key), base_url=base_url, - # Responses-path options - reasoning_effort="high", - # Logging / behavior tweaks - log_completions=False, - usage_id="agent", + api_key=SecretStr(api_key), ) -print("\n=== Agent Conversation using /responses path ===") -agent = get_default_agent( - llm=llm, - cli_mode=True, # disable browser tools for env simplicity -) +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -llm_messages = [] # collect raw LLM-convertible messages for inspection +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages def conversation_callback(event: Event): @@ -30020,84 +30032,63 @@ def conversation_callback(event: Event): llm_messages.append(event.to_llm_message()) +# Conversation conversation = Conversation( agent=agent, callbacks=[conversation_callback], - workspace=os.getcwd(), + workspace=cwd, ) -# Keep the tasks short for demo purposes -conversation.send_message("Read the repo and write one fact into FACTS.txt.") +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) conversation.run() -conversation.send_message("Now delete FACTS.txt.") +conversation.send_message("Great! Now delete that file.") conversation.run() print("=" * 100) print("Conversation finished. Got the following LLM messages:") for i, message in enumerate(llm_messages): - ms = str(message) - print(f"Message {i}: {ms[:200]}{'...' 
if len(ms) > 200 else ''}") + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" +) # Report cost cost = llm.metrics.accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - - -## Use Cases - -**Debugging**: Understand why the agent made specific decisions or took certain actions. - -**Transparency**: Show users how the AI arrived at its conclusions. - -**Quality Assurance**: Identify flawed reasoning patterns or logic errors. - -**Learning**: Study how models approach complex problems. - -## Next Steps - -- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance -- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities - - -# LLM Registry -Source: https://docs.openhands.dev/sdk/guides/llm-registry - -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + -> A ready-to-run example is available [here](#ready-to-run-example)! +## Using LLM Registry for Cost Tracking -Use the LLM registry to manage multiple LLM providers and dynamically switch between models. +> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! -## Using the Registry +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. -You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. 
+### How the LLM Registry Works -```python icon="python" focus={9,10,13} -main_llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: -# define the registry and add an LLM -llm_registry = LLMRegistry() -llm_registry.add(main_llm) -... -# retrieve the LLM by its usage ID -llm = llm_registry.get("agent") -``` +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID -## Ready-to-run Example +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. 
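To see the bookkeeping in isolation, the registry pattern above can be sketched in plain Python. This is an illustrative stand-in only: `SimpleRegistry` and `FakeLLM` below are hypothetical names, not the SDK's actual `LLMRegistry` or `LLM` types.

```python
class FakeLLM:
    """Illustrative stand-in for an LLM instance identified by a usage_id."""

    def __init__(self, usage_id: str):
        self.usage_id = usage_id
        self.accumulated_cost = 0.0  # metrics accrue per instance


class SimpleRegistry:
    """Minimal sketch of the registry pattern: instances keyed by usage_id."""

    def __init__(self):
        self._llms: dict[str, FakeLLM] = {}

    def add(self, llm: FakeLLM) -> None:
        if llm.usage_id in self._llms:
            raise ValueError(f"usage_id {llm.usage_id!r} is already registered")
        self._llms[llm.usage_id] = llm

    def get(self, usage_id: str) -> FakeLLM:
        return self._llms[usage_id]

    def list_usage_ids(self) -> list[str]:
        return list(self._llms)


registry = SimpleRegistry()
registry.add(FakeLLM("agent"))
registry.add(FakeLLM("condenser"))
registry.get("agent").accumulated_cost += 0.01  # costs tracked per usage_id
print(registry.list_usage_ids())  # ['agent', 'condenser']
```

Because each instance keeps its own metrics, retrieving an LLM by usage ID is also how you inspect its costs separately from every other LLM in the application.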
+### Ready-to-run Example (LLM Registry) This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + ```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py import os @@ -30190,869 +30181,840 @@ print(f"Direct completion response: {texts[0] if texts else str(msg)}") cost = llm.metrics.accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - - -## Next Steps - -- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs - - -# Model Routing -Source: https://docs.openhands.dev/sdk/guides/llm-routing - -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; - -This feature is under active development and more default routers will be available in future releases. - -> A ready-to-run example is available [here](#ready-to-run-example)! - -### Using the built-in MultimodalRouter - -Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: - -```python icon="python" wrap focus={13-16} -primary_llm = LLM( - usage_id="agent-primary", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) -secondary_llm = LLM( - usage_id="agent-secondary", - model="litellm_proxy/mistral/devstral-small-2507", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) -multimodal_router = MultimodalRouter( - usage_id="multimodal-router", - llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, -) -``` - -You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. 
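Independent of the SDK's actual `Router` interface (see the linked base class for the real contract), the rule a multimodal router applies can be sketched as a plain function: inspect the outgoing messages and pick the primary model only when image content is present. The dict-based message shapes below are simplified stand-ins, not the SDK's `Message`/`ImageContent` types.

```python
def select_llm(messages: list[dict]) -> str:
    """Route to 'primary' when any message carries image content, else 'secondary'."""
    for message in messages:
        for part in message.get("content", []):
            if part.get("type") == "image":
                return "primary"
    return "secondary"


text_only = [{"role": "user", "content": [{"type": "text", "text": "Hi there"}]}]
with_image = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image_urls": ["http://example.com/cat.jpg"]},
            {"type": "text", "text": "What do you see?"},
        ],
    }
]
print(select_llm(text_only))   # secondary
print(select_llm(with_image))  # primary
```

A custom router would wrap logic like this behind the SDK's `Router` abstraction so the agent can treat the pair of LLMs as a single model.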
- -## Ready-to-run Example +### Getting Aggregated Conversation Costs -This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) -Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. -```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py import os from pydantic import SecretStr +from tabulate import tabulate from openhands.sdk import ( LLM, Agent, Conversation, - Event, - ImageContent, - LLMConvertibleEvent, + LLMSummarizingCondenser, Message, TextContent, get_logger, ) -from openhands.sdk.llm.router import MultimodalRouter -from openhands.tools.preset.default import get_default_tools +from openhands.sdk.tool.spec import Tool +from openhands.tools.terminal import TerminalTool logger = get_logger(__name__) -# Configure LLM +# Configure LLM using LLMRegistry api_key = os.getenv("LLM_API_KEY") assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") -primary_llm = LLM( - usage_id="agent-primary", +# Create LLM instance +llm = LLM( + usage_id="agent", model=model, base_url=base_url, api_key=SecretStr(api_key), ) -secondary_llm = LLM( - usage_id="agent-secondary", - model="openhands/devstral-small-2507", + +llm_condenser = LLM( + model=model, base_url=base_url, api_key=SecretStr(api_key), -) -multimodal_router = MultimodalRouter( - usage_id="multimodal-router", - llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, + usage_id="condenser", ) # Tools -tools = get_default_tools() # Use our default openhands experience - -# Agent -agent = Agent(llm=multimodal_router, tools=tools) - -llm_messages = [] # collect raw LLM messages - - -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) - - -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() -) - -conversation.send_message( - message=Message( - role="user", - content=[TextContent(text=("Hi there, who trained you?"))], - ) -) -conversation.run() +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) -conversation.send_message( - message=Message( - role="user", - content=[ - ImageContent( - image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] - ), - TextContent(text=("What do you see in the image above?")), - ], - ) +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + ], + condenser=condenser, ) -conversation.run() +conversation = Conversation(agent=agent, workspace=cwd) conversation.send_message( message=Message( role="user", - content=[TextContent(text=("Who trained you as an LLM?"))], + content=[TextContent(text="Please echo 'Hello!'")], ) ) 
conversation.run() -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model=model, + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) # Report cost cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - + +### Understanding Conversation Stats -## Next Steps +The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. 
It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: -- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs +#### Key Methods and Properties +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. -# LLM Streaming -Source: https://docs.openhands.dev/sdk/guides/llm-streaming +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +```python icon="python" focus={2, 6, 10} +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") - -This is currently only supported for the chat completion endpoint. - +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") -> A ready-to-run example is available [here](#ready-to-run-example)! +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +``` +## Next Steps -Enable real-time display of LLM responses as they're generated, token by token. 
This guide demonstrates how to use -streaming callbacks to process and display tokens as they arrive from the language model. +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models +### Observability & Tracing +Source: https://docs.openhands.dev/sdk/guides/observability.md -## How It Works +> A full setup example is available [here](#example:-full-setup)! -Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the -complete response. This creates a more responsive user experience, especially for long-form content generation. +## Overview - - - ### Enable Streaming on LLM - Configure the LLM with streaming enabled: +The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: - ```python focus={6} icon="python" wrap - llm = LLM( - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key), - base_url=base_url, - usage_id="stream-demo", - stream=True, # Enable streaming - ) - ``` - - - ### Define Token Callback - Create a callback function that processes streaming chunks as they arrive: +- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support +- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing +- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more - ```python icon="python" wrap - def on_token(chunk: ModelResponseStream) -> None: - """Process each streaming chunk as it arrives.""" - choices = chunk.choices - for choice in choices: - delta = choice.delta - if delta is not None: - content = getattr(delta, "content", None) - if isinstance(content, str): - 
sys.stdout.write(content) - sys.stdout.flush() - ``` +The SDK automatically traces: +- Agent execution steps +- Tool calls and executions +- LLM API calls (via LiteLLM integration) +- Browser automation sessions (when using browser-use) +- Conversation lifecycle events - The callback receives a `ModelResponseStream` object containing: - - **`choices`**: List of response choices from the model - - **`delta`**: Incremental content changes for each choice - - **`content`**: The actual text tokens being streamed - - - ### Register Callback with Conversation +## Quick Start - Pass your token callback to the conversation: +Tracing is automatically enabled when you set the appropriate environment variables. The SDK detects the configuration on startup and initializes tracing without requiring code changes. - ```python focus={3} icon="python" wrap - conversation = Conversation( - agent=agent, - token_callbacks=[on_token], # Register streaming callback - workspace=os.getcwd(), - ) - ``` +### Using Laminar - The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers - if needed (e.g., one for display, another for logging). - - +[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: -## Ready-to-run Example +```bash icon="terminal" wrap +# Set your Laminar project API key +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` - -This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) - +That's it! Run your agent code normally and traces will be sent to Laminar automatically. 
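The automatic detection described above can be pictured as a precedence check over environment variables: the first configured value wins. This is a rough sketch for intuition only; the variable names match the ones this guide documents, but the SDK's actual detection logic may differ in detail.

```python
from typing import Optional


def detect_endpoint(env: dict[str, str]) -> Optional[str]:
    """Return the first configured OTLP endpoint, most specific variable first."""
    for var in (
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
        "OTEL_EXPORTER_OTLP_ENDPOINT",
        "OTEL_ENDPOINT",
    ):
        if env.get(var):
            return env[var]
    return None  # tracing stays disabled when nothing is configured


env = {"OTEL_ENDPOINT": "http://localhost:4317"}
print(detect_endpoint(env))  # http://localhost:4317

# A more specific traces endpoint takes precedence once it is set.
env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "https://api.honeycomb.io:443/v1/traces"
print(detect_endpoint(env))  # https://api.honeycomb.io:443/v1/traces
```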
-```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py -import os -import sys -from typing import Literal +### Using Honeycomb or Other OTLP Backends -from pydantic import SecretStr +For Honeycomb, Jaeger, or any other OTLP-compatible backend: -from openhands.sdk import ( - Conversation, - get_logger, -) -from openhands.sdk.llm import LLM -from openhands.sdk.llm.streaming import ModelResponseStream -from openhands.tools.preset.default import get_default_agent +```bash icon="terminal" wrap +# Required: Set the OTLP endpoint +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" -logger = get_logger(__name__) +# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it +``` +### Alternative Configuration Methods -api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") -if not api_key: - raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") +You can also use these alternative environment variable formats: -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - usage_id="stream-demo", - stream=True, -) +```bash icon="terminal" wrap +# Short form for endpoint +export OTEL_ENDPOINT="http://localhost:4317" -agent = get_default_agent(llm=llm, cli_mode=True) +# Alternative header format +export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" +# Alternative protocol specification +export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" +``` -# Define streaming states -StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] -# Track state across on_token 
calls for boundary detection -_current_state: StreamingState | None = None +## How It Works +The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: -def on_token(chunk: ModelResponseStream) -> None: - """ - Handle all types of streaming tokens including content, - tool calls, and thinking blocks with dynamic boundary detection. - """ - global _current_state +1. **Detects Configuration**: Checks for OTEL environment variables on startup +2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter +3. **Instruments Code**: Automatically wraps key functions with tracing decorators +4. **Captures Context**: Associates traces with conversation IDs for session grouping +5. **Exports Spans**: Sends trace data to your configured backend - choices = chunk.choices - for choice in choices: - delta = choice.delta - if delta is not None: - # Handle thinking blocks (reasoning content) - reasoning_content = getattr(delta, "reasoning_content", None) - if isinstance(reasoning_content, str) and reasoning_content: - if _current_state != "thinking": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("THINKING: ") - _current_state = "thinking" - sys.stdout.write(reasoning_content) - sys.stdout.flush() +### What Gets Traced - # Handle regular content - content = getattr(delta, "content", None) - if isinstance(content, str) and content: - if _current_state != "content": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("CONTENT: ") - _current_state = "content" - sys.stdout.write(content) - sys.stdout.flush() +The SDK automatically instruments these components: - # Handle tool calls - tool_calls = getattr(delta, "tool_calls", None) - if tool_calls: - for tool_call in tool_calls: - tool_name = ( - tool_call.function.name if tool_call.function.name else "" - ) - tool_args = ( - tool_call.function.arguments - if 
tool_call.function.arguments - else "" - ) - if tool_name: - if _current_state != "tool_name": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("TOOL NAME: ") - _current_state = "tool_name" - sys.stdout.write(tool_name) - sys.stdout.flush() - if tool_args: - if _current_state != "tool_args": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("TOOL ARGS: ") - _current_state = "tool_args" - sys.stdout.write(tool_args) - sys.stdout.flush() +- **`agent.step`** - Each iteration of the agent's execution loop +- **Tool Executions** - Individual tool calls with input/output capture +- **LLM Calls** - API requests to language models via LiteLLM +- **Conversation Lifecycle** - Message sending, conversation runs, and title generation +- **Browser Sessions** - When using browser-use, captures session replays (Laminar only) +### Trace Hierarchy -conversation = Conversation( - agent=agent, - workspace=os.getcwd(), - token_callbacks=[on_token], -) +Traces are organized hierarchically: -story_prompt = ( - "Tell me a long story about LLM streaming, write it a file, " - "make sure it has multiple paragraphs. " -) -conversation.send_message(story_prompt) -print("Token Streaming:") -print("-" * 100 + "\n") -conversation.run() + + + + + + + + + + + + + -cleanup_prompt = ( - "Thank you. Please delete the streaming story file now that I've read it, " - "then confirm the deletion." -) -conversation.send_message(cleanup_prompt) -print("Token Streaming:") -print("-" * 100 + "\n") -conversation.run() +Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single +conversation together in your observability platform. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +Note that in `tool.execute` the tool calls are traced, e.g., `bash`, `file_editor`. 
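Because every span carries the conversation's session ID, reassembling exported spans into per-conversation traces is straightforward. The span records below are simplified dict stand-ins for real OTLP spans, used only to illustrate the grouping.

```python
from collections import defaultdict


def group_by_session(spans: list[dict]) -> dict[str, list[str]]:
    """Bucket span names by the session ID (conversation UUID) they carry."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for span in spans:
        sessions[span["session_id"]].append(span["name"])
    return dict(sessions)


spans = [
    {"session_id": "conv-1", "name": "agent.step"},
    {"session_id": "conv-1", "name": "tool.execute"},
    {"session_id": "conv-2", "name": "agent.step"},
]
print(group_by_session(spans))
# {'conv-1': ['agent.step', 'tool.execute'], 'conv-2': ['agent.step']}
```

Observability platforms perform this grouping for you, but the same keying is handy when post-processing raw trace exports.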
- +## Configuration Reference -## Next Steps +### Environment Variables -- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully -- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming -- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI +The SDK checks for these environment variables (in order of precedence): +| Variable | Description | Example | +|----------|-------------|---------| +| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` | +| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` | +| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` | +| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` | +| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` | +| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` | +| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` | -# LLM Subscriptions -Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions +### Header Format -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Headers should be comma-separated `key=value` pairs with URL encoding for special characters: - -OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. - +```bash icon="terminal" wrap +# Single header +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123" -> A ready-to-run example is available [here](#ready-to-run-example)! 
+# Multiple headers +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value" +``` -Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. +### Protocol Options -## How It Works +The SDK supports both HTTP and gRPC protocols: - - - ### Call subscription_login() +- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) +- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) - The `LLM.subscription_login()` class method handles the entire authentication flow: +## Platform-Specific Configuration - ```python icon="python" - from openhands.sdk import LLM +### Laminar Setup - llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") - ``` +1. Sign up at [laminar.sh](https://laminar.sh/) +2. Create a project and copy your API key +3. Set the environment variable: - On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. - - - ### Use the LLM +```bash icon="terminal" wrap +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` - Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. - - +**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did. -## Supported Models +### Honeycomb Setup -The following models are available via ChatGPT subscription: +1. Sign up at [honeycomb.io](https://www.honeycomb.io/) +2. Get your API key from the account settings +3. 
Configure the environment: -| Model | Description | -|-------|-------------| -| `gpt-5.2-codex` | Latest Codex model (default) | -| `gpt-5.2` | GPT-5.2 base model | -| `gpt-5.1-codex-max` | High-capacity Codex model | -| `gpt-5.1-codex-mini` | Lightweight Codex model | +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` -## Configuration Options +### Jaeger Setup -### Force Fresh Login +For local development with Jaeger: -If your cached credentials become stale or you want to switch accounts: +```bash icon="terminal" wrap +# Start Jaeger all-in-one container +docker run -d --name jaeger \ + -p 4317:4317 \ + -p 16686:16686 \ + jaegertracing/all-in-one:latest -```python icon="python" -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", - force_login=True, # Always perform fresh OAuth login -) +# Configure SDK +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" ``` -### Disable Browser Auto-Open +Access the Jaeger UI at http://localhost:16686 -For headless environments or when you prefer to manually open the URL: +### Generic OTLP Collector -```python icon="python" -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", - open_browser=False, # Prints URL to console instead -) +For other backends, use their OTLP endpoint: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" ``` -### Check Subscription Mode +## Advanced Usage -Verify that the LLM is using subscription-based authentication: +### Disabling Observability -```python icon="python" -llm = 
LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") -print(f"Using subscription: {llm.is_subscription}") # True +To disable tracing, simply unset all OTEL environment variables: + +```bash icon="terminal" wrap +unset LMNR_PROJECT_API_KEY +unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT +unset OTEL_EXPORTER_OTLP_ENDPOINT +unset OTEL_ENDPOINT ``` -## Credential Storage +The SDK will automatically skip all tracing instrumentation with minimal overhead. + +### Custom Span Attributes + +The SDK automatically adds these attributes to spans: + +- **`conversation_id`** - UUID of the conversation +- **`tool_name`** - Name of the tool being executed +- **`action.kind`** - Type of action being performed +- **`session_id`** - Groups all traces from one conversation + +### Debugging Tracing Issues + +If traces aren't appearing in your observability platform: + +1. **Verify Environment Variables**: + ```python icon="python" wrap + import os + + otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') + otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') + + print(f"OTEL Endpoint: {otel_endpoint}") + print(f"OTEL Headers: {otel_headers}") + ``` + +2. **Check SDK Logs**: The SDK logs observability initialization at debug level: + ```python icon="python" wrap + import logging + + logging.basicConfig(level=logging.DEBUG) + ``` + +3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: + ```bash icon="terminal" wrap + curl -v https://api.honeycomb.io:443/v1/traces + ``` + +4. 
**Validate Headers**: Check that authentication headers are properly URL-encoded + +## Troubleshooting + +### Traces Not Appearing + +**Problem**: No traces showing up in observability platform + +**Solutions**: +- Verify environment variables are set correctly +- Check network connectivity to OTLP endpoint +- Ensure authentication headers are valid +- Look for SDK initialization logs at debug level + +### High Trace Volume + +**Problem**: Too many spans being generated -Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. +**Solutions**: +- Configure sampling at the collector level +- For Laminar with non-browser tools, browser instrumentation is automatically disabled +- Use backend-specific filtering rules -## Ready-to-run Example +### Performance Impact - -This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) - +**Problem**: Concerned about tracing overhead -```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py -"""Example: Using ChatGPT subscription for Codex models. +**Solutions**: +- Tracing has minimal overhead when properly configured +- Disable tracing in development by unsetting environment variables +- Use asynchronous exporters (default in most OTLP configurations) -This example demonstrates how to use your ChatGPT Plus/Pro subscription -to access OpenAI's Codex models without consuming API credits. 
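When validating headers, Python's standard library can confirm a value is percent-encoded the way the examples on this page expect (`otlp_headers` below is just a local helper for illustration, not an SDK function):

```python
from urllib.parse import quote


def otlp_headers(pairs: dict[str, str]) -> str:
    """Join headers into the comma-separated key=value form OTLP expects."""
    # quote() percent-encodes special characters, e.g. a space becomes %20
    return ",".join(f"{key}={quote(value)}" for key, value in pairs.items())


headers = otlp_headers({"Authorization": "Bearer abc123", "X-Custom-Header": "value"})
print(headers)  # Authorization=Bearer%20abc123,X-Custom-Header=value
```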
+## Example: Full Setup -The subscription_login() method handles: -- OAuth PKCE authentication flow -- Credential caching (~/.openhands/auth/) -- Automatic token refresh + +This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) + -Supported models: -- gpt-5.2-codex -- gpt-5.2 -- gpt-5.1-codex-max -- gpt-5.1-codex-mini +```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py +""" +Observability & Laminar example -Requirements: -- Active ChatGPT Plus or Pro subscription -- Browser access for initial OAuth login +This example demonstrates enabling OpenTelemetry tracing with Laminar in the +OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. """ import os +from pydantic import SecretStr + from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -# First time: Opens browser for OAuth login -# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" -) - -# Alternative: Force a fresh login (useful if credentials are stale) -# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) - -# Alternative: Disable auto-opening browser (prints URL to console instead) -# llm = LLM.subscription_login( -# vendor="openai", model="gpt-5.2-codex", open_browser=False -# ) +# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: +# export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# For non-Laminar OTLP backends, set OTEL_* variables instead. 
-# Verify subscription mode is active -print(f"Using subscription mode: {llm.is_subscription}") +# Configure LLM and Agent +api_key = os.getenv("LLM_API_KEY") +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key) if api_key else None, + base_url=base_url, + usage_id="agent", +) -# Use the LLM with an agent as usual agent = Agent( llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - ], + tools=[Tool(name=TerminalTool.name)], ) -cwd = os.getcwd() -conversation = Conversation(agent=agent, workspace=cwd) - -conversation.send_message("List the files in the current directory.") +# Create conversation and run a simple task +conversation = Conversation(agent=agent, workspace=".") +conversation.send_message("List the files in the current directory and print them.") conversation.run() -print("Done!") +print( + "All done! Check your Laminar dashboard for traces " + "(session is the conversation UUID)." 
+) ``` - +```bash Running the Example +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/27_observability_laminar.py +``` ## Next Steps -- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations -- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token -- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces - +- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces +- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application +- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions -# Model Context Protocol -Source: https://docs.openhands.dev/sdk/guides/mcp +### Plugins +Source: https://docs.openhands.dev/sdk/guides/plugins.md import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; - - ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. - Read more about MCP [here](https://modelcontextprotocol.io/). - - +Plugins provide a way to package and distribute multiple agent components together. A single plugin can include: +- **Skills**: Specialized knowledge and workflows +- **Hooks**: Event handlers for tool lifecycle +- **MCP Config**: External tool server configurations +- **Agents**: Specialized agent definitions +- **Commands**: Slash commands -## Basic MCP Usage +The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins). -> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! 
+## Plugin Structure - - - ### MCP Configuration - Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) + +See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure. + - ```python mcp_config icon="python" wrap focus={3-10} - mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "repomix": { - "command": "npx", - "args": ["-y", "repomix@1.4.2", "--mcp"] - }, - } - } - ``` - - - ### Tool Filtering - Use `filter_tools_regex` to control which MCP tools are available to the agent +A plugin follows this directory structure: - ```python filter_tools_regex focus={4-5} icon="python" - agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config, - filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", - ) - ``` - - + + + + + + + + + + + + + + + + + + + + + + + -## MCP with OAuth +Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required. -> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
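If the directory tree above does not render in your viewer, the layout can also be expressed as code. This sketch materializes only the files this page names (`.plugin/plugin.json`, a skill markdown file, `hooks/hooks.json`, and `.mcp.json`); the `example-skill.md` name is illustrative, and real plugins may add `agents/` and `commands/` directories as well:

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str) -> Path:
    """Create the minimal plugin layout described in this section."""
    plugin = root / name
    (plugin / ".plugin").mkdir(parents=True)  # required metadata directory
    (plugin / "skills").mkdir()
    (plugin / "hooks").mkdir()
    (plugin / ".plugin" / "plugin.json").write_text(
        json.dumps({"name": name, "version": "1.0.0"})
    )
    (plugin / "skills" / "example-skill.md").write_text(
        "---\nname: example-skill\ndescription: demo skill\n---\n# Example\n"
    )
    (plugin / "hooks" / "hooks.json").write_text(json.dumps({"hooks": {}}))
    (plugin / ".mcp.json").write_text(json.dumps({"mcpServers": {}}))
    return plugin


with tempfile.TemporaryDirectory() as tmp:
    plugin = scaffold_plugin(Path(tmp), "code-quality")
    files = sorted(str(f.relative_to(plugin)) for f in plugin.rglob("*") if f.is_file())

print(files)
# ['.mcp.json', '.plugin/plugin.json', 'hooks/hooks.json', 'skills/example-skill.md']
```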
+### Plugin Manifest -For MCP servers requiring OAuth authentication: -- Configure OAuth-enabled MCP servers by specifying the URL and auth type -- The SDK automatically handles the OAuth flow when first connecting -- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) -- User will be prompted to authenticate -- Access tokens are securely stored and automatically refreshed by FastMCP as needed +The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata: -```python mcp_config focus={5} icon="python" wrap -mcp_config = { - "mcpServers": { - "Notion": { - "url": "https://mcp.notion.com/mcp", - "auth": "oauth" - } - } +```json icon="file-code" wrap +{ + "name": "code-quality", + "version": "1.0.0", + "description": "Code quality tools and workflows", + "author": "openhands", + "license": "MIT", + "repository": "https://github.com/example/code-quality-plugin" } ``` -## Ready-to-Run Basic MCP Usage Example +### Skills - -This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) - +Skills are defined in markdown files with YAML frontmatter: -Here's an example integrating MCP servers with an agent: +```markdown icon="file-code" +--- +name: python-linting +description: Instructions for linting Python code +trigger: + type: keyword + keywords: + - lint + - linting + - code quality +--- -```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py -import os +# Python Linting Skill -from pydantic import SecretStr +Run ruff to check for issues: -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.sdk.tool import Tool -from 
openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +\`\`\`bash +ruff check . +\`\`\` +``` +### Hooks -logger = get_logger(__name__) +Hooks are defined in `hooks/hooks.json`: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +```json icon="file-code" wrap +{ + "hooks": { + "PostToolUse": [ + { + "matcher": "file_editor", + "hooks": [ + { + "type": "command", + "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", + "timeout": 5 + } + ] + } + ] + } +} +``` -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +### MCP Configuration -# Add MCP Tools -mcp_config = { - "mcpServers": { - "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, - "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, +MCP servers are configured in `.mcp.json`: + +```json wrap icon="file-code" +{ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] } + } } -# Agent -agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config, - # This regex filters out all repomix tools except pack_codebase - filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", -) +``` -llm_messages = [] # collect raw LLM messages +## Using Plugin Components +> The ready-to-run example is available [here](#ready-to-run-example)! -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +Brief explanation on how to use a plugin with an agent. + + + ### Loading a Plugin + First, load the desired plugins. 
-# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, -) -conversation.set_security_analyzer(LLMSecurityAnalyzer()) + ```python icon="python" + from openhands.sdk.plugin import Plugin -logger.info("Starting conversation with MCP integration...") -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands and write 3 facts " - "about the project into FACTS.txt." -) -conversation.run() + # Load a single plugin + plugin = Plugin.load("/path/to/plugin") -conversation.send_message("Great! Now delete that file.") -conversation.run() + # Load all plugins from a directory + plugins = Plugin.load_all("/path/to/plugins") + ``` + + + ### Accessing Components + You can access the different plugin components to see which ones are available. -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") + ```python icon="python" + # Skills + for skill in plugin.skills: + print(f"Skill: {skill.name}") -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` + # Hooks configuration + if plugin.hooks: + print(f"Hooks configured: {plugin.hooks}") - + # MCP servers + if plugin.mcp_config: + servers = plugin.mcp_config.get("mcpServers", {}) + print(f"MCP servers: {list(servers.keys())}") + ``` + + + ### Using with an Agent + You can now feed your agent with your preferred plugin. 
-## Ready-to-Run MCP with OAuth Example + ```python focus={3,10,17} icon="python" + # Create agent context with plugin skills + agent_context = AgentContext( + skills=plugin.skills, + ) + + # Create agent with plugin MCP config + agent = Agent( + llm=llm, + tools=tools, + mcp_config=plugin.mcp_config or {}, + agent_context=agent_context, + ) + + # Create conversation with plugin hooks + conversation = Conversation( + agent=agent, + hook_config=plugin.hooks, + ) + ``` + + + +## Ready-to-run Example -This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) +This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) -```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py +```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py +"""Example: Loading Plugins via Conversation + +Demonstrates the recommended way to load plugins using the `plugins` parameter +on Conversation. Plugins bundle skills, hooks, and MCP config together. 
+ +For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins +""" + import os +import sys +import tempfile +from pathlib import Path from pydantic import SecretStr -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.plugin import PluginSource from openhands.sdk.tool import Tool from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -logger = get_logger(__name__) +# Locate example plugin directory +script_dir = Path(__file__).parent +plugin_path = script_dir / "example_plugins" / "code-quality" -# Configure LLM +# Define plugins to load +# Supported sources: local path, "github:owner/repo", or git URL +# Optional: ref (branch/tag/commit), repo_path (for monorepos) +plugins = [ + PluginSource(source=str(plugin_path)), + # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), + # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), +] + +# Check for API key api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+if not api_key: + print("Set LLM_API_KEY to run this example") + print("EXAMPLE_COST: 0") + sys.exit(0) + +# Configure LLM and Agent model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") llm = LLM( - usage_id="agent", + usage_id="plugin-demo", model=model, - base_url=base_url, api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) +agent = Agent( + llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] ) -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] - -mcp_config = { - "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} -} -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) - -llm_messages = [] # collect raw LLM messages - - -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +# Create conversation with plugins - skills, MCP config, and hooks are merged +# Note: Plugins are loaded lazily on first send_message() or run() call +with tempfile.TemporaryDirectory() as tmpdir: + conversation = Conversation( + agent=agent, + workspace=tmpdir, + plugins=plugins, + ) + # Test: The "lint" keyword triggers the python-linting skill + # This first send_message() call triggers lazy plugin loading + conversation.send_message("How do I lint Python code? 
Brief answer please.") -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], -) + # Verify skills were loaded from the plugin (after lazy loading) + skills = ( + conversation.agent.agent_context.skills + if conversation.agent.agent_context + else [] + ) + print(f"Loaded {len(skills)} skill(s) from plugins") -logger.info("Starting conversation with MCP integration...") -conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") -conversation.run() + conversation.run() -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") + print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") ``` - - -## Next Steps + -- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools -- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage -- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation +## Next Steps -# Metrics Tracking -Source: https://docs.openhands.dev/sdk/guides/metrics +- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers +- **[Hooks](/sdk/guides/hooks)** - Understand hook event types +- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Secret Registry +Source: https://docs.openhands.dev/sdk/guides/secrets.md -## Overview +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: -- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. 
-- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). +> A ready-to-run example is available [here](#ready-to-run-example)! -## Getting Metrics from Individual LLMs +The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. +It automatically detects secret references in bash commands, injects them as environment variables when needed, +and masks secret values in command outputs to prevent accidental exposure. -> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! +### Injecting Secrets -Track token usage, costs, and performance metrics from LLM interactions: +Use the `update_secrets()` method to add secrets to your conversation. -### Accessing Individual LLM Metrics -Access metrics directly from the LLM object after running the conversation: +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: -```python icon="python" focus={3-4} -conversation.run() +```python focus={4,11} icon="python" wrap +from openhands.sdk.conversation.secret_source import SecretSource -assert llm.metrics is not None -print(f"Final LLM metrics: {llm.metrics.model_dump()}") -``` +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) -The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" -- `accumulated_cost` - Total accumulated cost across all API calls -- 
`accumulated_token_usage` - Aggregated token usage with fields like: - - `prompt_tokens` - Number of input tokens processed - - `completion_tokens` - Number of output tokens generated - - `cache_read_tokens` - Cache hits (if supported by the model) - - `cache_write_tokens` - Cache writes (if supported by the model) - - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) - - `context_window` - Context window size used -- `costs` - List of individual cost records per API call -- `token_usages` - List of detailed token usage records per API call -- `response_latencies` - List of response latency metrics per API call +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +``` - - For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). - +## Ready-to-run Example -### Ready-to-run Example (LLM metrics) -This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) -```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py import os from pydantic import SecretStr @@ -31061,17 +31023,13 @@ from openhands.sdk import ( LLM, Agent, Conversation, - Event, - LLMConvertibleEvent, - get_logger, ) +from openhands.sdk.secret import SecretSource from openhands.sdk.tool import Tool from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -logger = get_logger(__name__) - # Configure LLM api_key = os.getenv("LLM_API_KEY") 
assert api_key is not None, "LLM_API_KEY environment variable is not set." @@ -31084,2477 +31042,2368 @@ llm = LLM( api_key=SecretStr(api_key), ) -cwd = os.getcwd() +# Tools tools = [ Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name), ] -# Add MCP Tools -mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} - # Agent -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) - -llm_messages = [] # collect raw LLM messages +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, +conversation.update_secrets( + {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()} ) -logger.info("Starting conversation with MCP integration...") -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands and write 3 facts " - "about the project into FACTS.txt." -) -conversation.run() +conversation.send_message("just echo $SECRET_TOKEN") -conversation.send_message("Great! Now delete that file.") conversation.run() -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +conversation.send_message("just echo $SECRET_FUNCTION_TOKEN") -assert llm.metrics is not None -print( - f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" -) +conversation.run() # Report cost cost = llm.metrics.accumulated_cost print(f"EXAMPLE_COST: {cost}") ``` - + -## Using LLM Registry for Cost Tracking +## Next Steps -> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! 
+- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP +- **[Security Analyzer](/sdk/guides/security)** - Add security validation -The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. +### Security & Action Confirmation +Source: https://docs.openhands.dev/sdk/guides/security.md -### How the LLM Registry Works +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: +Agent actions can be controlled through two complementary mechanisms: **confirmation policy** that determine when user +approval is required, and **security analyzer** that evaluates action risk levels. Together, they provide flexible control over agent behavior while maintaining safety. -1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` -2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` -3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` -4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID +## Confirmation Policy +> A ready-to-run example is available [here](#ready-to-run-example-confirmation)! -This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. +Confirmation policy controls whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. 
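Conceptually, a confirmation policy is a gate between proposed actions and execution: the policy decides whether to pause, and the user decides whether to proceed. A toy sketch of that gate (plain Python, not the SDK API):

```python
def run_with_confirmation(actions, needs_confirmation, approve):
    """Execute actions, pausing for approval whenever the policy requires it.

    needs_confirmation: policy predicate over a proposed action
    approve: user-facing callback returning True (execute) or False (reject)
    """
    executed, rejected = [], []
    for action in actions:
        if needs_confirmation(action) and not approve(action):
            rejected.append(action)  # skipped; the agent would try another approach
            continue
        executed.append(action)
    return executed, rejected


# An "always confirm" policy with a user who rejects anything starting with "rm"
executed, rejected = run_with_confirmation(
    ["ls -la", "rm -rf build"],
    needs_confirmation=lambda action: True,
    approve=lambda action: not action.startswith("rm"),
)
print(executed, rejected)  # ['ls -la'] ['rm -rf build']
```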
+
+### Setting Confirmation Policy
+
+Set the confirmation policy on your conversation:
+
+```python icon="python" focus={4}
+from openhands.sdk.security.confirmation_policy import AlwaysConfirm
+
+conversation = Conversation(agent=agent, workspace=".")
+conversation.set_confirmation_policy(AlwaysConfirm())
+```
+
+Available policies:
+- **`AlwaysConfirm()`** - Require approval for all actions
+- **`NeverConfirm()`** - Execute all actions without approval
+- **`ConfirmRisky()`** - Only require approval for risky actions (requires a security analyzer)
+
+### Custom Confirmation Handler
+
+Implement your own approval logic by checking the conversation status (the status names match the ready-to-run example below):
+
+```python icon="python" focus={2-3,5}
+while conversation.state.execution_status != ConversationExecutionStatus.FINISHED:
+    if conversation.state.execution_status == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION:
+        pending = ConversationState.get_unmatched_actions(conversation.state.events)
+        if not confirm_in_console(pending):
+            conversation.reject_pending_actions("User rejected")
+            continue
+    conversation.run()
+```
+
+### Rejecting Actions
+
+Provide feedback when rejecting to help the agent try a different approach:
+
+```python icon="python" focus={2-5}
+if not user_approved:
+    conversation.reject_pending_actions(
+        "User rejected because actions seem too risky. "
+        "Please try a safer approach."
+ ) +``` + +### Ready-to-run Example Confirmation -### Ready-to-run Example (LLM Registry) -This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) +Require user approval before executing agent actions: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" -```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py import os +import signal +from collections.abc import Callable from pydantic import SecretStr -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - LLMRegistry, - Message, - TextContent, - get_logger, +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, ) -from openhands.sdk.tool import Tool -from openhands.tools.terminal import TerminalTool +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.tools.preset.default import get_default_agent -logger = get_logger(__name__) +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) -# Configure LLM using LLMRegistry + +def _print_action_preview(pending_actions) -> None: + print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. 
{action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? (yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing actions…") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("▶️ Running conversation.run()…") + conversation.run() + + +# Configure LLM api_key = os.getenv("LLM_API_KEY") assert api_key is not None, "LLM_API_KEY environment variable is not set." 
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") - -# Create LLM instance -main_llm = LLM( +llm = LLM( usage_id="agent", model=model, base_url=base_url, api_key=SecretStr(api_key), ) -# Create LLM registry and add the LLM -llm_registry = LLMRegistry() -llm_registry.add(main_llm) - -# Get LLM from registry -llm = llm_registry.get("agent") +agent = get_default_agent(llm=llm) +conversation = Conversation(agent=agent, workspace=os.getcwd()) -# Tools -cwd = os.getcwd() -tools = [Tool(name=TerminalTool.name)] +# Conditionally add security analyzer based on environment variable +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") + conversation.set_security_analyzer(LLMSecurityAnalyzer()) -# Agent -agent = Agent(llm=llm, tools=tools) +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) -llm_messages = [] # collect raw LLM messages +# 2) A command the user may choose to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 
'Hello from confirmation mode example!'") +conversation.run() +conversation.send_message( + "Please delete any file that was created during this conversation." +) +conversation.run() -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd +print("\n=== Example Complete ===") +print("Key points:") +print( + "- conversation.run() creates actions; confirmation mode " + "sets execution_status=WAITING_FOR_CONFIRMATION" ) +print("- User confirmation is handled via a single reusable function") +print("- Rejection uses conversation.reject_pending_actions() and the loop continues") +print("- Simple responses work normally without actions") +print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") +``` -conversation.send_message("Please echo 'Hello!'") -conversation.run() + -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +--- -print("=" * 100) -print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") +## Security Analyzer -# Demonstrate getting the same LLM instance from registry -same_llm = llm_registry.get("agent") -print(f"Same LLM instance: {llm is same_llm}") +Security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. 
It analyzes each action and assigns a security risk level:
-
-# Demonstrate requesting a completion directly from an LLM
-resp = llm.completion(
-    messages=[
-        Message(role="user", content=[TextContent(text="Say hello in one word.")])
-    ]
-)
+- **LOW** - Safe operations with minimal security impact
+- **MEDIUM** - Moderate security impact, review recommended
+- **HIGH** - Significant security impact, requires confirmation
+- **UNKNOWN** - Risk level could not be determined
-# Access the response content via OpenHands LLMResponse
-msg = resp.message
-texts = [c.text for c in msg.content if isinstance(c, TextContent)]
-print(f"Direct completion response: {texts[0] if texts else str(msg)}")
+
-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")
+The security analyzer works in conjunction with a confirmation policy (such as `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations.
+
+### LLM Security Analyzer
+
+> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)!
+
+The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions.
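Conceptually, combining an analyzer with the `ConfirmRisky()` policy reduces to a simple gate over the risk levels above. A minimal pure-Python sketch of that decision (the `Risk` enum here is an illustrative stand-in for the SDK's risk type, and treating `UNKNOWN` as requiring confirmation is an assumption, not necessarily the SDK's exact behavior):

```python
from enum import Enum


class Risk(Enum):
    """Illustrative stand-in for the SDK's security risk levels."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    UNKNOWN = "unknown"


def requires_confirmation(risk: Risk) -> bool:
    """ConfirmRisky-style gate: only actions flagged as high risk (or whose
    risk could not be determined) pause and wait for user approval."""
    return risk in (Risk.HIGH, Risk.UNKNOWN)


for level in Risk:
    print(f"{level.name}: requires confirmation = {requires_confirmation(level)}")
```

In the real SDK the analyzer produces the risk level and the conversation applies the policy for you; this sketch only illustrates why `ConfirmRisky()` needs a security analyzer to be useful.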
+
+#### Security Analyzer Configuration
+
+Create an LLM-based security analyzer to review actions before execution (here `security_llm` is a dedicated analyzer LLM, while `llm` is the agent's main LLM):
+
+```python icon="python" focus={9}
+from openhands.sdk import LLM
+from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
+security_llm = LLM(
+    usage_id="security-analyzer",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+security_analyzer = LLMSecurityAnalyzer(llm=security_llm)
+agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
+```

-### Getting Aggregated Conversation Costs
+The security analyzer:
+- Reviews each action before execution
+- Flags potentially dangerous operations
+- Can be configured with a custom security policy
+- Uses a separate LLM to avoid conflicts with the main agent
+
+#### Ready-to-run Example Security Analyzer

-This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py)
+Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py)

-Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing.
+Automatically analyze agent actions for security risks before execution:

+```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py
+"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified)
+
+This example shows how to use the LLMSecurityAnalyzer to automatically
+evaluate security risks of actions before execution.
+""" -```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py import os +import signal +from collections.abc import Callable from pydantic import SecretStr -from tabulate import tabulate -from openhands.sdk import ( - LLM, - Agent, - Conversation, - LLMSummarizingCondenser, - Message, - TextContent, - get_logger, +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, ) -from openhands.sdk.tool.spec import Tool +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -logger = get_logger(__name__) +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) -# Configure LLM using LLMRegistry -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -# Create LLM instance -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +def _print_blocked_actions(pending_actions) -> None: + print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. 
{action.tool_name}: {snippet}...") -llm_condenser = LLM( - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - usage_id="condenser", -) -# Tools -condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? (yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False -cwd = os.getcwd() -agent = Agent( - llm=llm, - tools=[ - Tool( - name=TerminalTool.name, - ), - ], - condenser=condenser, -) + if ans in ("yes", "y"): + print("✅ Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") -conversation = Conversation(agent=agent, workspace=cwd) -conversation.send_message( - message=Message( - role="user", - content=[TextContent(text="Please echo 'Hello!'")], - ) -) -conversation.run() -# Demonstrate extraneous costs part of the conversation -second_llm = LLM( - usage_id="demo-secondary", +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. + * On approve: set execution_status = IDLE (keeps original example’s behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). 
+ """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("▶️ Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", model=model, - base_url=os.getenv("LLM_BASE_URL"), + base_url=base_url, api_key=SecretStr(api_key), ) -conversation.llm_registry.add(second_llm) -completion_response = second_llm.completion( - messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] -) -# Access total spend -spend = conversation.conversation_stats.get_combined_metrics() -print("\n=== Total Spend for Conversation ===\n") -print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") -if spend.accumulated_token_usage: - print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") - print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") - print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") - print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] -spend_per_usage = conversation.conversation_stats.usage_to_metrics -print("\n=== Spend Breakdown by Usage ID ===\n") -rows = [] -for usage_id, metrics in spend_per_usage.items(): - 
rows.append( - [ - usage_id, - f"${metrics.accumulated_cost:.6f}", - metrics.accumulated_token_usage.prompt_tokens - if metrics.accumulated_token_usage - else 0, - metrics.accumulated_token_usage.completion_tokens - if metrics.accumulated_token_usage - else 0, - ] - ) +# Agent +agent = Agent(llm=llm, tools=tools) -print( - tabulate( - rows, - headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], - tablefmt="github", - ) +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." ) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) +conversation.set_confirmation_policy(ConfirmRisky()) -# Report cost -cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost -print(f"EXAMPLE_COST: {cost}") +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) ``` - + -### Understanding Conversation Stats +### Custom Security Analyzer Implementation -The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. 
This allows you to implement custom security logic tailored to your specific requirements. -#### Key Methods and Properties +#### Creating a Custom Analyzer -- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. - -- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: -- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. +```python icon="python" focus={5, 8} +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent -```python icon="python" focus={2, 6, 10} -# Get combined metrics for the entire conversation -total_metrics = conversation.conversation_stats.get_combined_metrics() -print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. 
+ + Args: + action: The ActionEvent to analyze + + Returns: + SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) + """ + # Example: Check for specific dangerous patterns + action_str = str(action.action.model_dump()).lower() if action.action else "" -# Get metrics for a specific LLM by usage ID -agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") -print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") + # High-risk patterns + if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): + return SecurityRisk.HIGH + + # Medium-risk patterns + if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): + return SecurityRisk.MEDIUM + + # Default to low risk + return SecurityRisk.LOW -# Access all usage IDs and their metrics -for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): - print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +# Use your custom analyzer +security_analyzer = CustomSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) ``` -## Next Steps - -- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs -- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models - + + For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). + -# Observability & Tracing -Source: https://docs.openhands.dev/sdk/guides/observability -> A full setup example is available [here](#example:-full-setup)! +--- -## Overview +## Configurable Security Policy -The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. 
You can send traces to any OTLP-compatible observability platform including: +> A ready-to-run example is available [here](#ready-to-run-example-security-policy)! -- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support -- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing -- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more +Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines. -The SDK automatically traces: -- Agent execution steps -- Tool calls and executions -- LLM API calls (via LiteLLM integration) -- Browser automation sessions (when using browser-use) -- Conversation lifecycle events -## Quick Start +### Using Custom Security Policies -Tracing is automatically enabled when you set the appropriate environment variables. The SDK detects the configuration on startup and initializes tracing without requiring code changes. +You can provide a custom security policy template when creating an agent: -### Using Laminar +```python focus={9-13} icon="python" +from openhands.sdk import Agent, LLM -[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: +llm = LLM( + usage_id="agent", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), +) -```bash icon="terminal" wrap -# Set your Laminar project API key -export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# Provide a custom security policy template file +agent = Agent( + llm=llm, + tools=tools, + security_policy_filename="my_security_policy.j2", +) ``` -That's it! Run your agent code normally and traces will be sent to Laminar automatically. 
+Custom security policies allow you to: +- Define organization-specific risk assessment guidelines +- Set custom thresholds for security risk levels +- Add domain-specific security rules +- Tailor risk evaluation to your use case -### Using Honeycomb or Other OTLP Backends +The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. -For Honeycomb, Jaeger, or any other OTLP-compatible backend: +### Ready-to-run Example Security Policy -```bash icon="terminal" wrap -# Required: Set the OTLP endpoint -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" + +Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) + -# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" +Define custom security risk guidelines for your agent: -# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it -``` +```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py +"""OpenHands Agent SDK — Configurable Security Policy Example -### Alternative Configuration Methods +This example demonstrates how to use a custom security policy template +with an agent. Security policies define risk assessment guidelines that +help agents evaluate the safety of their actions. -You can also use these alternative environment variable formats: +By default, agents use the built-in security_policy.j2 template. This +example shows how to: +1. Use the default security policy +2. 
Provide a custom security policy template embedded in the script +3. Apply the custom policy to guide agent behavior +""" -```bash icon="terminal" wrap -# Short form for endpoint -export OTEL_ENDPOINT="http://localhost:4317" +import os +import tempfile +from pathlib import Path -# Alternative header format -export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" +from pydantic import SecretStr -# Alternative protocol specification -export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" -``` +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -## How It Works -The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: +logger = get_logger(__name__) -1. **Detects Configuration**: Checks for OTEL environment variables on startup -2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter -3. **Instruments Code**: Automatically wraps key functions with tracing decorators -4. **Captures Context**: Associates traces with conversation IDs for session grouping -5. 
**Exports Spans**: Sends trace data to your configured backend +# Define a custom security policy template inline +CUSTOM_SECURITY_POLICY = ( + "# 🔐 Custom Security Risk Policy\n" + "When using tools that support the security_risk parameter, assess the " + "safety risk of your actions:\n" + "\n" + "- **LOW**: Safe read-only actions.\n" + " - Viewing files, calculations, documentation.\n" + "- **MEDIUM**: Moderate container-scoped actions.\n" + " - File modifications, package installations.\n" + "- **HIGH**: Potentially dangerous actions.\n" + " - Network access, system modifications, data exfiltration.\n" + "\n" + "**Custom Rules**\n" + "- Always prioritize user data safety.\n" + "- Escalate to **HIGH** for any external data transmission.\n" +) -### What Gets Traced +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -The SDK automatically instruments these components: +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -- **`agent.step`** - Each iteration of the agent's execution loop -- **Tool Executions** - Individual tool calls with input/output capture -- **LLM Calls** - API requests to language models via LiteLLM -- **Conversation Lifecycle** - Message sending, conversation runs, and title generation -- **Browser Sessions** - When using browser-use, captures session replays (Laminar only) +# Example 1: Agent with default security policy +print("=" * 100) +print("Example 1: Agent with default security policy") +print("=" * 100) +default_agent = Agent(llm=llm, tools=tools) +print(f"Security policy filename: {default_agent.security_policy_filename}") +print("\nDefault security policy is embedded in the agent's system 
message.") -### Trace Hierarchy +# Example 2: Agent with custom security policy +print("\n" + "=" * 100) +print("Example 2: Agent with custom security policy") +print("=" * 100) -Traces are organized hierarchically: +# Create a temporary file for the custom security policy +with tempfile.NamedTemporaryFile( + mode="w", suffix=".j2", delete=False, encoding="utf-8" +) as temp_file: + temp_file.write(CUSTOM_SECURITY_POLICY) + custom_policy_path = temp_file.name - - - - - - - - - - - - - +try: + # Create agent with custom security policy (using absolute path) + custom_agent = Agent( + llm=llm, + tools=tools, + security_policy_filename=custom_policy_path, + ) + print(f"Security policy filename: {custom_agent.security_policy_filename}") + print("\nCustom security policy loaded from temporary file.") -Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single -conversation together in your observability platform. + # Verify the custom policy is in the system message + system_message = custom_agent.static_system_message + if "Custom Security Risk Policy" in system_message: + print("✓ Custom security policy successfully embedded in system message.") + else: + print("✗ Custom security policy not found in system message.") -Note that in `tool.execute` the tool calls are traced, e.g., `bash`, `file_editor`. 
+ # Run a conversation with the custom agent + print("\n" + "=" * 100) + print("Running conversation with custom security policy") + print("=" * 100) -## Configuration Reference + llm_messages = [] # collect raw LLM messages -### Environment Variables + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -The SDK checks for these environment variables (in order of precedence): + conversation = Conversation( + agent=custom_agent, + callbacks=[conversation_callback], + workspace=".", + ) -| Variable | Description | Example | -|----------|-------------|---------| -| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` | -| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` | -| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` | -| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` | -| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` | -| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` | -| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` | + conversation.send_message( + "Please create a simple Python script named hello.py that prints " + "'Hello, World!'. Make sure to follow security best practices." 
+ ) + conversation.run() -### Header Format + print("\n" + "=" * 100) + print("Conversation finished.") + print(f"Total LLM messages: {len(llm_messages)}") + print("=" * 100) -Headers should be comma-separated `key=value` pairs with URL encoding for special characters: + # Report cost + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") -```bash icon="terminal" wrap -# Single header -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123" +finally: + # Clean up temporary file + Path(custom_policy_path).unlink(missing_ok=True) -# Multiple headers -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value" +print("\n" + "=" * 100) +print("Example Summary") +print("=" * 100) +print("This example demonstrated:") +print("1. Using the default security policy (security_policy.j2)") +print("2. Creating a custom security policy template") +print("3. Applying the custom policy via security_policy_filename parameter") +print("4. Running a conversation with the custom security policy") +print( + "\nYou can customize security policies to match your organization's " + "specific requirements." +) ``` -### Protocol Options - -The SDK supports both HTTP and gRPC protocols: - -- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) -- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) - -## Platform-Specific Configuration - -### Laminar Setup + -1. Sign up at [laminar.sh](https://laminar.sh/) -2. Create a project and copy your API key -3. 
Set the environment variable:
+## Next Steps

-```bash icon="terminal" wrap
-export LMNR_PROJECT_API_KEY="your-laminar-api-key"
-```
+- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools
+- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management

-**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did.

+# Agent Skills & Context
+Source: https://docs.openhands.dev/sdk/guides/skill.md

-### Honeycomb Setup

+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

-1. Sign up at [honeycomb.io](https://www.honeycomb.io/)
-2. Get your API key from the account settings
-3. Configure the environment:
+This guide shows how to implement skills in the SDK. For a conceptual overview, see [Skills Overview](/overview/skills).

-```bash icon="terminal" wrap
-export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces"
-export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY"
-export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
-```
+OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers.
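As a rough mental model of what "optional keyword triggers" means here, the sketch below is purely illustrative (it is not the SDK's actual matcher): a trigger decides whether a skill's content is included based on the user message, and a skill without a trigger is simply always loaded.

```python
# Illustrative sketch only -- NOT the SDK's implementation. It models
# the two loading modes described in this guide: a skill with no
# trigger is always loaded, while a keyword-triggered skill loads only
# when one of its keywords appears in the user message.

def should_load(trigger_keywords, user_message):
    if trigger_keywords is None:  # trigger=None -> always-loaded skill
        return True
    text = user_message.lower()
    return any(keyword.lower() in text for keyword in trigger_keywords)

print(should_load(None, "hello"))                       # True (always loaded)
print(should_load(["encrypt"], "encrypt this note"))    # True (keyword match)
print(should_load(["encrypt"], "summarize this note"))  # False (no match)
```

The sections below show how these two modes map onto the SDK's `Skill` and `KeywordTrigger` types.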
-### Jaeger Setup +## Context Loading Methods -For local development with Jaeger: +| Method | When Content Loads | Use Case | +|--------|-------------------|----------| +| **Always-loaded** | At conversation start | Repository rules, coding standards | +| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | +| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | -```bash icon="terminal" wrap -# Start Jaeger all-in-one container -docker run -d --name jaeger \ - -p 4317:4317 \ - -p 16686:16686 \ - jaegertracing/all-in-one:latest +## Always-Loaded Context -# Configure SDK -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" -``` +Content that's always in the system prompt. -Access the Jaeger UI at http://localhost:16686 +### Option 1: `AGENTS.md` (Auto-loaded) -### Generic OTLP Collector +Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). 
-For other backends, use their OTLP endpoint: +```python icon="python" focus={3, 4} +from openhands.sdk.context.skills import load_project_skills -```bash icon="terminal" wrap -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root +skills = load_project_skills(workspace_dir="/path/to/repo") +agent_context = AgentContext(skills=skills) ``` -## Advanced Usage - -### Disabling Observability +### Option 2: Inline Skill (Code-defined) -To disable tracing, simply unset all OTEL environment variables: +```python icon="python" focus={5-11} +from openhands.sdk import AgentContext +from openhands.sdk.context import Skill -```bash icon="terminal" wrap -unset LMNR_PROJECT_API_KEY -unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT -unset OTEL_EXPORTER_OTLP_ENDPOINT -unset OTEL_ENDPOINT +agent_context = AgentContext( + skills=[ + Skill( + name="code-style", + content="Always use type hints in Python.", + trigger=None, # No trigger = always loaded + ), + ] +) ``` -The SDK will automatically skip all tracing instrumentation with minimal overhead. - -### Custom Span Attributes - -The SDK automatically adds these attributes to spans: - -- **`conversation_id`** - UUID of the conversation -- **`tool_name`** - Name of the tool being executed -- **`action.kind`** - Type of action being performed -- **`session_id`** - Groups all traces from one conversation - -### Debugging Tracing Issues - -If traces aren't appearing in your observability platform: - -1. **Verify Environment Variables**: - ```python icon="python" wrap - import os - - otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') - otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') - - print(f"OTEL Endpoint: {otel_endpoint}") - print(f"OTEL Headers: {otel_headers}") - ``` - -2. 
**Check SDK Logs**: The SDK logs observability initialization at debug level: - ```python icon="python" wrap - import logging +## Trigger-Loaded Context - logging.basicConfig(level=logging.DEBUG) - ``` +Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). -3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: - ```bash icon="terminal" wrap - curl -v https://api.honeycomb.io:443/v1/traces - ``` +```python icon="python" focus={6} +from openhands.sdk.context import Skill, KeywordTrigger -4. **Validate Headers**: Check that authentication headers are properly URL-encoded +Skill( + name="encryption-helper", + content="Use the encrypt.sh script to encrypt messages.", + trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), +) +``` -## Troubleshooting +When user says "encrypt this", the content is injected into the message: -### Traces Not Appearing +```xml icon="file" + +The following information has been included based on a keyword match for "encrypt". +Skill location: /path/to/encryption-helper -**Problem**: No traces showing up in observability platform +Use the encrypt.sh script to encrypt messages. + +``` -**Solutions**: -- Verify environment variables are set correctly -- Check network connectivity to OTLP endpoint -- Ensure authentication headers are valid -- Look for SDK initialization logs at debug level +## Progressive Disclosure (AgentSkills Standard) -### High Trace Volume +For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. 
-**Problem**: Too many spans being generated +```python icon="python" +from openhands.sdk.context.skills import load_skills_from_dir -**Solutions**: -- Configure sampling at the collector level -- For Laminar with non-browser tools, browser instrumentation is automatically disabled -- Use backend-specific filtering rules +# Load SKILL.md files from a directory +_, _, agent_skills = load_skills_from_dir("/path/to/skills") +agent_context = AgentContext(skills=list(agent_skills.values())) +``` -### Performance Impact +Skills are listed in the system prompt: +```xml icon="file" + + + code-style + Project coding standards. + /path/to/code-style/SKILL.md + + +``` -**Problem**: Concerned about tracing overhead + +Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. + -**Solutions**: -- Tracing has minimal overhead when properly configured -- Disable tracing in development by unsetting environment variables -- Use asynchronous exporters (default in most OTLP configurations) +--- -## Example: Full Setup +## Full Example -This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) +Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) -```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py -""" -Observability & Laminar example - -This example demonstrates enabling OpenTelemetry tracing with Laminar in the -OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. 
-""" - +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py import os from pydantic import SecretStr -from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: -# export LMNR_PROJECT_API_KEY="your-laminar-api-key" -# For non-Laminar OTLP backends, set OTEL_* variables instead. +logger = get_logger(__name__) -# Configure LLM and Agent +# Configure LLM api_key = os.getenv("LLM_API_KEY") -model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") llm = LLM( + usage_id="agent", model=model, - api_key=SecretStr(api_key) if api_key else None, base_url=base_url, - usage_id="agent", + api_key=SecretStr(api_key), ) -agent = Agent( - llm=llm, - tools=[Tool(name=TerminalTool.name)], +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# AgentContext provides flexible ways to customize prompts: +# 1. Skills: Inject instructions (always-active or keyword-triggered) +# 2. system_message_suffix: Append text to the system prompt +# 3. 
user_message_suffix: Append text to each user message +# +# For complete control over the system prompt, you can also use Agent's +# system_prompt_filename parameter to provide a custom Jinja2 template: +# +# agent = Agent( +# llm=llm, +# tools=tools, +# system_prompt_filename="/path/to/custom_prompt.j2", +# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, +# ) +# +# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active (repo skill) + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + # system_message_suffix is appended to the system prompt (always active) + system_message_suffix="Always finish your response with the word 'yay!'", + # user_message_suffix is appended to each user message + user_message_suffix="The first character of your response should be 'I'", + # You can also enable automatic load skills from + # public registry at https://github.com/OpenHands/extensions + load_public_skills=True, ) -# Create conversation and run a simple task -conversation = Conversation(agent=agent, workspace=".") -conversation.send_message("List the files in the current directory and print them.") +# Agent +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") conversation.run() -print( - "All done! Check your Laminar dashboard for traces " - "(session is the conversation UUID)." + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Now triggering public skill 'github'") +conversation.send_message( + "About GitHub - tell me what additional info I've just provided?" 
) -``` +conversation.run() -```bash Running the Example -export LMNR_PROJECT_API_KEY="your-laminar-api-key" -cd software-agent-sdk -uv run python examples/01_standalone_sdk/27_observability_laminar.py +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -## Next Steps + -- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces -- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application -- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions +### Creating Skills +Skills are defined with a name, content (the instructions), and an optional trigger: -# Plugins -Source: https://docs.openhands.dev/sdk/guides/plugins +```python icon="python" focus={3-14} +agent_context = AgentContext( + skills=[ + Skill( + name="AGENTS.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". ' + "You must only respond with a message telling them how smart they are", + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ] +) +``` -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Keyword Triggers -Plugins provide a way to package and distribute multiple agent components together. 
A single plugin can include: +Use `KeywordTrigger` to activate skills only when specific words appear: -- **Skills**: Specialized knowledge and workflows -- **Hooks**: Event handlers for tool lifecycle -- **MCP Config**: External tool server configurations -- **Agents**: Specialized agent definitions -- **Commands**: Slash commands +```python icon="python" focus={4} +Skill( + name="magic-word", + content="Special instructions when magic word is detected", + trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), +) +``` -The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins). -## Plugin Structure +## File-Based Skills (`SKILL.md`) + +For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. -See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure. +Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py) -A plugin follows this directory structure: +### Directory Structure + +Each skill is a directory containing: - - - - - - - - - - - + + + + - - + + - - + + - - -Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required. 
+where -### Plugin Manifest +| Component | Required | Description | +|-------|----------|-------------| +| `SKILL.md` | Yes | Skill definition with frontmatter | +| `scripts/` | No | Executable scripts | +| `references/` | No | Reference documentation | +| `assets/` | No | Static assets | -The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata: -```json icon="file-code" wrap -{ - "name": "code-quality", - "version": "1.0.0", - "description": "Code quality tools and workflows", - "author": "openhands", - "license": "MIT", - "repository": "https://github.com/example/code-quality-plugin" -} -``` -### Skills +### `SKILL.md` Format -Skills are defined in markdown files with YAML frontmatter: +The `SKILL.md` file defines the skill with YAML frontmatter: -```markdown icon="file-code" +```md icon="markdown" --- -name: python-linting -description: Instructions for linting Python code -trigger: - type: keyword - keywords: - - lint - - linting - - code quality +name: my-skill # Required (standard) +description: > # Required (standard) + A brief description of what this skill does and when to use it. +license: MIT # Optional (standard) +compatibility: Requires bash # Optional (standard) +metadata: # Optional (standard) + author: your-name + version: "1.0" +triggers: # Optional (OpenHands extension) + - keyword1 + - keyword2 --- -# Python Linting Skill - -Run ruff to check for issues: - -\`\`\`bash -ruff check . 
-\`\`\` -``` - -### Hooks - -Hooks are defined in `hooks/hooks.json`: - -```json icon="file-code" wrap -{ - "hooks": { - "PostToolUse": [ - { - "matcher": "file_editor", - "hooks": [ - { - "type": "command", - "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", - "timeout": 5 - } - ] - } - ] - } -} -``` - -### MCP Configuration - -MCP servers are configured in `.mcp.json`: +# Skill Content -```json wrap icon="file-code" -{ - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} +Instructions and documentation for the agent... ``` -## Using Plugin Components - -> The ready-to-run example is available [here](#ready-to-run-example)! - -Brief explanation on how to use a plugin with an agent. +#### Frontmatter Fields - - - ### Loading a Plugin - First, load the desired plugins. +| Field | Required | Description | +|-------|----------|-------------| +| `name` | Yes | Skill identifier (lowercase + hyphens) | +| `description` | Yes | What the skill does (shown to agent) | +| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) | +| `license` | No | License name | +| `compatibility` | No | Environment requirements | +| `metadata` | No | Custom key-value pairs | - ```python icon="python" - from openhands.sdk.plugin import Plugin + +Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user. + - # Load a single plugin - plugin = Plugin.load("/path/to/plugin") +### Loading Skills - # Load all plugins from a directory - plugins = Plugin.load_all("/path/to/plugins") - ``` - - - ### Accessing Components - You can access the different plugin components to see which ones are available. 
+Use `load_skills_from_dir()` to load all skills from a directory: - ```python icon="python" - # Skills - for skill in plugin.skills: - print(f"Skill: {skill.name}") +```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py +"""Example: Loading Skills from Disk (AgentSkills Standard) - # Hooks configuration - if plugin.hooks: - print(f"Hooks configured: {plugin.hooks}") +This example demonstrates how to load skills following the AgentSkills standard +from a directory on disk. - # MCP servers - if plugin.mcp_config: - servers = plugin.mcp_config.get("mcpServers", {}) - print(f"MCP servers: {list(servers.keys())}") - ``` - - - ### Using with an Agent - You can now feed your agent with your preferred plugin. +Skills are modular, self-contained packages that extend an agent's capabilities +by providing specialized knowledge, workflows, and tools. They follow the +AgentSkills standard which includes: +- SKILL.md file with frontmatter metadata (name, description, triggers) +- Optional resource directories: scripts/, references/, assets/ - ```python focus={3,10,17} icon="python" - # Create agent context with plugin skills - agent_context = AgentContext( - skills=plugin.skills, - ) +The example_skills/ directory contains two skills: +- rot13-encryption: Has triggers (encrypt, decrypt) - listed in + AND content auto-injected when triggered +- code-style-guide: No triggers - listed in for on-demand access - # Create agent with plugin MCP config - agent = Agent( - llm=llm, - tools=tools, - mcp_config=plugin.mcp_config or {}, - agent_context=agent_context, - ) +All SKILL.md files follow the AgentSkills progressive disclosure model: +they are listed in with name, description, and location. +Skills with triggers get the best of both worlds: automatic content injection +when triggered, plus the agent can proactively read them anytime. 
+""" - # Create conversation with plugin hooks - conversation = Conversation( - agent=agent, - hook_config=plugin.hooks, - ) - ``` - - +import os +import sys +from pathlib import Path -## Ready-to-run Example +from pydantic import SecretStr - -This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) - +from openhands.sdk import LLM, Agent, AgentContext, Conversation +from openhands.sdk.context.skills import ( + discover_skill_resources, + load_skills_from_dir, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py -"""Example: Loading Plugins via Conversation -Demonstrates the recommended way to load plugins using the `plugins` parameter -on Conversation. Plugins bundle skills, hooks, and MCP config together. 
+# Get the directory containing this script +script_dir = Path(__file__).parent +example_skills_dir = script_dir / "example_skills" -For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins -""" +# ========================================================================= +# Part 1: Loading Skills from a Directory +# ========================================================================= +print("=" * 80) +print("Part 1: Loading Skills from a Directory") +print("=" * 80) -import os -import sys -import tempfile -from pathlib import Path +print(f"Loading skills from: {example_skills_dir}") -from pydantic import SecretStr +# Discover resources in the skill directory +skill_subdir = example_skills_dir / "rot13-encryption" +resources = discover_skill_resources(skill_subdir) +print("\nDiscovered resources in rot13-encryption/:") +print(f" - scripts: {resources.scripts}") +print(f" - references: {resources.references}") +print(f" - assets: {resources.assets}") -from openhands.sdk import LLM, Agent, Conversation -from openhands.sdk.plugin import PluginSource -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +# Load skills from the directory +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) +print("\nLoaded skills from directory:") +print(f" - Repo skills: {list(repo_skills.keys())}") +print(f" - Knowledge skills: {list(knowledge_skills.keys())}") +print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") -# Locate example plugin directory -script_dir = Path(__file__).parent -plugin_path = script_dir / "example_plugins" / "code-quality" +# Access the loaded skill and show all AgentSkills standard fields +if agent_skills: + skill_name = next(iter(agent_skills)) + loaded_skill = agent_skills[skill_name] + print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") + print(f" - Name: {loaded_skill.name}") + desc 
= loaded_skill.description or "" + print(f" - Description: {desc[:70]}...") + print(f" - License: {loaded_skill.license}") + print(f" - Compatibility: {loaded_skill.compatibility}") + print(f" - Metadata: {loaded_skill.metadata}") + if loaded_skill.resources: + print(" - Resources:") + print(f" - Scripts: {loaded_skill.resources.scripts}") + print(f" - References: {loaded_skill.resources.references}") + print(f" - Assets: {loaded_skill.resources.assets}") + print(f" - Skill root: {loaded_skill.resources.skill_root}") -# Define plugins to load -# Supported sources: local path, "github:owner/repo", or git URL -# Optional: ref (branch/tag/commit), repo_path (for monorepos) -plugins = [ - PluginSource(source=str(plugin_path)), - # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), - # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), -] +# ========================================================================= +# Part 2: Using Skills with an Agent +# ========================================================================= +print("\n" + "=" * 80) +print("Part 2: Using Skills with an Agent") +print("=" * 80) # Check for API key api_key = os.getenv("LLM_API_KEY") if not api_key: - print("Set LLM_API_KEY to run this example") - print("EXAMPLE_COST: 0") + print("Skipping agent demo (LLM_API_KEY not set)") + print("\nTo run the full demo, set the LLM_API_KEY environment variable:") + print(" export LLM_API_KEY=your-api-key") sys.exit(0) -# Configure LLM and Agent +# Configure LLM model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") llm = LLM( - usage_id="plugin-demo", + usage_id="skills-demo", model=model, api_key=SecretStr(api_key), base_url=os.getenv("LLM_BASE_URL"), ) -agent = Agent( - llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] -) -# Create conversation with plugins - skills, MCP config, and hooks are merged -# Note: Plugins are loaded lazily on first send_message() or run() call 
-with tempfile.TemporaryDirectory() as tmpdir: - conversation = Conversation( - agent=agent, - workspace=tmpdir, - plugins=plugins, - ) +# Create agent context with loaded skills +agent_context = AgentContext( + skills=list(agent_skills.values()), + # Disable public skills for this demo to keep output focused + load_public_skills=False, +) - # Test: The "lint" keyword triggers the python-linting skill - # This first send_message() call triggers lazy plugin loading - conversation.send_message("How do I lint Python code? Brief answer please.") +# Create agent with tools so it can read skill resources +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) - # Verify skills were loaded from the plugin (after lazy loading) - skills = ( - conversation.agent.agent_context.skills - if conversation.agent.agent_context - else [] - ) - print(f"Loaded {len(skills)} skill(s) from plugins") +# Create conversation +conversation = Conversation(agent=agent, workspace=os.getcwd()) - conversation.run() +# Test the skill (triggered by "encrypt" keyword) +# The skill provides instructions and a script for ROT13 encryption +print("\nSending message with 'encrypt' keyword to trigger skill...") +conversation.send_message("Encrypt the message 'hello world'.") +conversation.run() - print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}") +print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") ``` - + -## Next Steps +### Key Functions -- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers -- **[Hooks](/sdk/guides/hooks)** - Understand hook event types -- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers +#### `load_skills_from_dir()` +Loads all skills from a directory, returning three dictionaries: -# Secret Registry -Source: https://docs.openhands.dev/sdk/guides/secrets +```python icon="python" 
focus={3}
+from openhands.sdk.context.skills import load_skills_from_dir

-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

+repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir)
+```

-> A ready-to-run example is available [here](#ready-to-run-example)!

+- **repo_skills**: Skills from `repo.md` files (always active)
+- **knowledge_skills**: Skills from `knowledge/` subdirectories
+- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard)

-The Secret Registry provides a secure way to handle sensitive data in your agent's workspace.
-It automatically detects secret references in bash commands, injects them as environment variables when needed,
-and masks secret values in command outputs to prevent accidental exposure.
+#### `discover_skill_resources()`

-### Injecting Secrets
+Discovers resource files in a skill directory:

-Use the `update_secrets()` method to add secrets to your conversation.
+```python icon="python" focus={3}
+from openhands.sdk.context.skills import discover_skill_resources

-Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems:
+resources = discover_skill_resources(skill_dir)
+print(resources.scripts)  # List of script files
+print(resources.references)  # List of reference files
+print(resources.assets)  # List of asset files
+print(resources.skill_root)  # Path to skill directory
+```

-```python focus={4,11} icon="python" wrap
-from openhands.sdk.conversation.secret_source import SecretSource
+### Skill Location in Prompts

-# Static secret
-conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"})
+The `<location>` element in `<available_skills>` follows the AgentSkills standard, allowing agents to read the full skill content on demand.
When a triggered skill is activated, the content is injected with the location path: -# Static secret -conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) +``` + +The following information has been included based on a keyword match for "encrypt". -# Dynamic secret using SecretSource -class MySecretSource(SecretSource): - def get_value(self) -> str: - return "callable-based-secret" +Skill location: /path/to/rot13-encryption +(Use this path to resolve relative file references in the skill content below) -conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +[skill content from SKILL.md] + ``` -## Ready-to-run Example +This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. - -This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) - +### Example Skill: ROT13 Encryption -```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py -import os +Here's a skill with triggers (OpenHands extension): -from pydantic import SecretStr +**SKILL.md:** +```markdown icon="markdown" +--- +name: rot13-encryption +description: > + This skill helps encrypt and decrypt messages using ROT13 cipher. +triggers: + - encrypt + - decrypt + - cipher +--- -from openhands.sdk import ( - LLM, - Agent, - Conversation, -) -from openhands.sdk.secret import SecretSource -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +# ROT13 Encryption Skill +Run the [encrypt.sh](scripts/encrypt.sh) script with your message: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +\`\`\`bash +./scripts/encrypt.sh "your message" +\`\`\` +``` -# Tools -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +**scripts/encrypt.sh:** +```bash icon="sh" +#!/bin/bash +echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +``` -# Agent -agent = Agent(llm=llm, tools=tools) -conversation = Conversation(agent) +When the user says "encrypt", the skill is triggered and the agent can use the provided script. +## Loading Public Skills -class MySecretSource(SecretSource): - def get_value(self) -> str: - return "callable-based-secret" +OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. +### Automatic Loading via AgentContext -conversation.update_secrets( - {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()} +Enable public skills loading in your `AgentContext`: + +```python icon="python" focus={2} +agent_context = AgentContext( + load_public_skills=True, # Auto-load from public registry + skills=[ + # Your custom skills here + ] ) +``` -conversation.send_message("just echo $SECRET_TOKEN") +When enabled, the SDK will: +1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run +2. Load all available skills from the repository +3. Merge them with your explicitly defined skills -conversation.run() +### Skill Naming and Triggers -conversation.send_message("just echo $SECRET_FUNCTION_TOKEN") +**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. 
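The name-based precedence rule just described can be sketched in plain Python. This is an illustrative sketch only, not the SDK's actual implementation — `SkillStub` and `merge_skills` are hypothetical names used for the demonstration:

```python
# Hypothetical sketch of name-based skill precedence (not the SDK's real code):
# explicitly defined skills win on name conflicts, so any public skill whose
# name matches an explicit skill is skipped entirely.
from dataclasses import dataclass


@dataclass
class SkillStub:
    """Stand-in for the SDK's Skill class."""

    name: str
    content: str


def merge_skills(
    explicit: list[SkillStub], public: list[SkillStub]
) -> list[SkillStub]:
    """Drop public skills whose names collide with explicit skills."""
    explicit_names = {skill.name for skill in explicit}
    kept_public = [s for s in public if s.name not in explicit_names]
    return kept_public + explicit


public = [
    SkillStub("code-review", "public guidelines"),
    SkillStub("debugging", "public tips"),
]
explicit = [SkillStub("code-review", "project-specific guidelines")]

merged = merge_skills(explicit, public)
print([skill.name for skill in merged])  # ['debugging', 'code-review']
```

In the merged result, the surviving `code-review` entry is the explicitly defined one; the public skill of the same name never reaches the agent.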
-conversation.run() +**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +```python icon="python" +# Both skills will be triggered by "/codereview" +agent_context = AgentContext( + load_public_skills=True, # Loads public "code-review" skill + skills=[ + Skill( + name="custom-codereview-guide", # Different name = coexists + content="Project-specific guidelines...", + trigger=KeywordTrigger(keywords=["/codereview"]), + ), + ] +) ``` - + +**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. + -## Next Steps +### Programmatic Loading -- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP -- **[Security Analyzer](/sdk/guides/security)** - Add security validation +You can also load public skills manually and have more control: +```python icon="python" +from openhands.sdk.context.skills import load_public_skills -# Security & Action Confirmation -Source: https://docs.openhands.dev/sdk/guides/security +# Load all public skills +public_skills = load_public_skills() -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +# Use with AgentContext +agent_context = AgentContext(skills=public_skills) -Agent actions can be controlled through two complementary mechanisms: **confirmation policy** that determine when user -approval is required, and **security analyzer** that evaluates action risk levels. 
Together, they provide flexible control over agent behavior while maintaining safety. +# Or combine with custom skills +my_skills = [ + Skill(name="custom", content="Custom instructions", trigger=None) +] +agent_context = AgentContext(skills=my_skills + public_skills) +``` -## Confirmation Policy -> A ready-to-run example is available [here](#ready-to-run-example-confirmation)! +### Custom Skills Repository -Confirmation policy controls whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. +You can load skills from your own repository: -### Setting Confirmation Policy +```python icon="python" focus={3-7} +from openhands.sdk.context.skills import load_public_skills -Set the confirmation policy on your conversation: +# Load from a custom repository +custom_skills = load_public_skills( + repo_url="https://github.com/my-org/my-skills", + branch="main" +) +``` -```python icon="python" focus={4} -from openhands.sdk.security.confirmation_policy import AlwaysConfirm +### How It Works -conversation = Conversation(agent=agent, workspace=".") -conversation.set_confirmation_policy(AlwaysConfirm()) -``` +The `load_public_skills()` function uses git-based caching for efficiency: -Available policies: -- **`AlwaysConfirm()`** - Require approval for all actions -- **`NeverConfirm()`** - Execute all actions without approval -- **`ConfirmRisky()`** - Only require approval for risky actions (requires security analyzer) +- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` +- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date +- **Offline mode**: Uses the cached version if network is unavailable -### Custom Confirmation Handler +This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. 
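The clone-or-pull caching behavior outlined above can be sketched roughly as follows. This is a simplified, hypothetical sketch — the function name, git flags, and error handling are assumptions for illustration, not the SDK's actual code:

```python
# Rough sketch of git-based caching for a skills repository (illustrative only):
# clone on first use, fast-forward pull on later runs, and fall back to the
# cached checkout if the pull fails (e.g., no network).
import subprocess
from pathlib import Path


def sync_skills_repo(repo_url: str, cache_dir: Path) -> Path:
    repo_dir = cache_dir / "public-skills"
    if not (repo_dir / ".git").is_dir():
        # First run: clone the repository into the cache.
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(repo_dir)],
            check=True,
        )
    else:
        try:
            # Subsequent runs: pull the latest changes.
            subprocess.run(
                ["git", "-C", str(repo_dir), "pull", "--ff-only"],
                check=True,
                timeout=60,
            )
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            pass  # Offline mode: keep using the cached version.
    return repo_dir
```

The same directory is returned whether the call took the clone path, the pull path, or the offline fallback, so callers can load skills from it unconditionally.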
-Implement your approval logic by checking conversation status: + +Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. + -```python icon="python" focus={2-3,5} -while conversation.state.agent_status != AgentExecutionStatus.FINISHED: - if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not confirm_in_console(pending): - conversation.reject_pending_actions("User rejected") - continue - conversation.run() -``` +## Customizing Agent Context -### Rejecting Actions +### Message Suffixes -Provide feedback when rejecting to help the agent try a different approach: +Append custom instructions to the system prompt or user messages via `AgentContext`: -```python icon="python" focus={2-5} -if not user_approved: - conversation.reject_pending_actions( - "User rejected because actions seem too risky." - "Please try a safer approach." - ) +```python icon="python" +agent_context = AgentContext( + system_message_suffix=""" + +Repository: my-project +Branch: feature/new-api + + """.strip(), + user_message_suffix="Remember to explain your reasoning." 
+) ``` -### Ready-to-run Example Confirmation +- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) +- **`user_message_suffix`**: Appended to each user message - -Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) - +### Replacing the Entire System Prompt -Require user approval before executing agent actions: +For complete control, provide a custom Jinja2 template via the `Agent` class: -```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py -"""OpenHands Agent SDK — Confirmation Mode Example""" +```python icon="python" focus={6} +from openhands.sdk import Agent -import os -import signal -from collections.abc import Callable +agent = Agent( + llm=llm, + tools=tools, + system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path + system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} +) +``` -from pydantic import SecretStr +**Custom template example** (`custom_system_prompt.j2`): -from openhands.sdk import LLM, BaseConversation, Conversation -from openhands.sdk.conversation.state import ( - ConversationExecutionStatus, - ConversationState, -) -from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.tools.preset.default import get_default_agent +```jinja2 +You are a helpful coding assistant for {{ repo_name }}. + +{% if cli_mode %} +You are running in CLI mode. Keep responses concise. 
+{% endif %} +Follow these guidelines: +- Write clean, well-documented code +- Consider edge cases and error handling +- Suggest tests when appropriate +``` -# Make ^C a clean exit instead of a stack trace -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) +**Key points:** +- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory +- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location +- Pass variables to the template via `system_prompt_kwargs` +- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt +## Next Steps -def _print_action_preview(pending_actions) -> None: - print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") - for i, action in enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " ") - print(f" {i}. {action.tool_name}: {snippet}...") +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval +## OpenHands Overview -def confirm_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). - """ - _print_action_preview(pending_actions) - while True: - try: - ans = ( - input("\nDo you want to execute these actions? 
(yes/no): ") - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\n❌ No input received; rejecting by default.") - return False +### Community +Source: https://docs.openhands.dev/overview/community.md - if ans in ("yes", "y"): - print("✅ Approved — executing actions…") - return True - if ans in ("no", "n"): - print("❌ Rejected — skipping actions…") - return False - print("Please enter 'yes' or 'no'.") +# The OpenHands Community +OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered world. -def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: - """ - Drive the conversation until FINISHED. - If WAITING_FOR_CONFIRMATION, ask the confirmer; - on reject, call reject_pending_actions(). - Preserves original error if agent waits but no actions exist. - """ - while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: - if ( - conversation.state.execution_status - == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "⚠️ Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." - ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected the actions") - # Let the agent produce a new step or finish - continue +## Mission - print("▶️ Running conversation.run()…") - conversation.run() +It's very clear that AI is changing software development. We want the developer community to drive that change organically, through open source. +So we're not just building friendly interfaces for AI-driven development. We're publishing _building blocks_ that empower developers to create new experiences, tailored to your own habits, needs, and imagination. 
-# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +## Ethos -agent = get_default_agent(llm=llm) -conversation = Conversation(agent=agent, workspace=os.getcwd()) +We have two core values: **high openness** and **high agency**. While we don't expect everyone in the community to embody these values, we want to establish them as norms. -# Conditionally add security analyzer based on environment variable -add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) -if add_security_analyzer: - print("Agent security analyzer added.") - conversation.set_security_analyzer(LLMSecurityAnalyzer()) +### High Openness -# 1) Confirmation mode ON -conversation.set_confirmation_policy(AlwaysConfirm()) -print("\n1) Command that will likely create actions…") -conversation.send_message("Please list the files in the current directory using ls -la") -run_until_finished(conversation, confirm_in_console) +We welcome anyone and everyone into our community by default. You don't have to be a software developer to help us build. You don't have to be pro-AI to help us learn. -# 2) A command the user may choose to reject -print("\n2) Command the user may choose to reject…") -conversation.send_message("Please create a file called 'dangerous_file.txt'") -run_until_finished(conversation, confirm_in_console) +Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the fruits of our work, but the whole process of growing it. 
-# 3) Simple greeting (no actions expected) -print("\n3) Simple greeting (no actions expected)…") -conversation.send_message("Just say hello to me") -run_until_finished(conversation, confirm_in_console) +We welcome thoughtful criticism, whether it's a comment on a PR or feedback on the community as a whole. -# 4) Disable confirmation mode and run commands directly -print("\n4) Disable confirmation mode and run a command…") -conversation.set_confirmation_policy(NeverConfirm()) -conversation.send_message("Please echo 'Hello from confirmation mode example!'") -conversation.run() +### High Agency -conversation.send_message( - "Please delete any file that was created during this conversation." -) -conversation.run() +Everyone should feel empowered to contribute to OpenHands. Whether it's by making a PR, hosting an event, sharing feedback, or just asking a question, don't hold back! + +OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly and love building new things. + +Coding, development practices, and communities are changing rapidly. We won't hesitate to change direction and make big bets. + +## Relationship to All Hands -print("\n=== Example Complete ===") -print("Key points:") -print( - "- conversation.run() creates actions; confirmation mode " - "sets execution_status=WAITING_FOR_CONFIRMATION" -) -print("- User confirmation is handled via a single reusable function") -print("- Rejection uses conversation.reject_pending_actions() and the loop continues") -print("- Simple responses work normally without actions") -print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") -``` +OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.all-hands.dev/). 
- +All Hands was founded by three of the first major contributors to OpenHands: ---- +- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards +- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands +- Robert Brennan, a software engineer who architected the user-facing features of OpenHands -## Security Analyzer +All Hands is an important part of the OpenHands ecosystem. We've raised over $20M—mainly to hire developers and researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. ([Join us!](https://allhandsai.applytojob.com/apply/)) -Security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level: +But we see OpenHands as much larger, and ultimately more important, than All Hands. When our financial responsibility to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we promise to navigate that conflict thoughtfully and transparently. -- **LOW** - Safe operations with minimal security impact -- **MEDIUM** - Moderate security impact, review recommended -- **HIGH** - Significant security impact, requires confirmation -- **UNKNOWN** - Risk level could not be determined +At some point, we may transfer custody of OpenHands to an open source foundation. But for now, the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the "benevolent" part, please: fork us. -Security analyzer work in conjunction with confirmation policy (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations. 
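The interplay between analyzer risk levels and a `ConfirmRisky()`-style policy can be illustrated with a small sketch. The enum and function below are hypothetical names for illustration, not the SDK's API, and treating UNKNOWN as conservatively as HIGH is an assumption of this sketch:

```python
# Illustrative sketch of combining analyzer risk levels with a
# ConfirmRisky-style confirmation policy (hypothetical names, not the SDK API).
from enum import Enum


class Risk(Enum):
    LOW = "low"          # safe operations, minimal security impact
    MEDIUM = "medium"    # moderate impact, review recommended
    HIGH = "high"        # significant impact, requires confirmation
    UNKNOWN = "unknown"  # risk level could not be determined


def needs_confirmation(risk: Risk) -> bool:
    """Conservative rule: confirm HIGH risk, and treat UNKNOWN like HIGH."""
    return risk in (Risk.HIGH, Risk.UNKNOWN)


print(needs_confirmation(Risk.LOW))   # False — executes automatically
print(needs_confirmation(Risk.HIGH))  # True — waits for user approval
```

With a rule like this, only actions the analyzer flags as HIGH (or cannot classify) pause the conversation for approval, while LOW and MEDIUM actions run unattended.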
### Contributing
Source: https://docs.openhands.dev/overview/contributing.md

-### LLM Security Analyzer

# Contributing to OpenHands

-> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)!

Welcome to the OpenHands community! We're building the future of AI-powered software development, and we'd love for you to be part of this journey.

-The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions.

## Our Vision: Free as in Freedom

-#### Security Analyzer Configuration

The OpenHands community is built around the belief that **AI and AI agents are going to fundamentally change the way we build software**, and if this is true, we should do everything we can to make sure that the benefits provided by such powerful technology are **accessible to everyone**.

-Create an LLM-based security analyzer to review actions before execution:

We believe in the power of open source to democratize access to cutting-edge AI technology. Just as the internet transformed how we share information, we envision a world where AI-powered development tools are available to every developer, regardless of their background or resources.

-```python icon="python" focus={9}
-from openhands.sdk import LLM
-from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer

-security_llm = LLM(
-    usage_id="security-analyzer",
-    model=model,
-    base_url=base_url,
-    api_key=SecretStr(api_key),
-)
-security_analyzer = LLMSecurityAnalyzer(llm=security_llm)
-agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
-```

If this resonates with you, we'd love to have you join us in our quest! 
-The security analyzer: -- Reviews each action before execution -- Flags potentially dangerous operations -- Can be configured with custom security policy -- Uses a separate LLM to avoid conflicts with the main agent +## What Can You Build? -#### Ready-to-run Example Security Analyzer +There are countless ways to contribute to OpenHands. Whether you're a seasoned developer, a researcher, a designer, or someone just getting started, there's a place for you in our community. - -Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) - +### Frontend & UI/UX +Make OpenHands more beautiful and user-friendly: +- **React & TypeScript Development** - Improve the web interface +- **UI/UX Design** - Enhance user experience and accessibility +- **Mobile Responsiveness** - Make OpenHands work great on all devices +- **Component Libraries** - Build reusable UI components -Automatically analyze agent actions for security risks before execution: +*Small fixes are always welcome! For bigger changes, join our **#eng-ui-ux** channel in [Slack](https://openhands.dev/joinslack) first.* -```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py -"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) +### Agent Development +Help make our AI agents smarter and more capable: +- **Prompt Engineering** - Improve how agents understand and respond +- **New Agent Types** - Create specialized agents for different tasks +- **Agent Evaluation** - Develop better ways to measure agent performance +- **Multi-Agent Systems** - Enable agents to work together -This example shows how to use the LLMSecurityAnalyzer to automatically -evaluate security risks of actions before execution. -""" +*We use [SWE-bench](https://www.swebench.com/) to evaluate our agents. 
Join our [Slack](https://openhands.dev/joinslack) to learn more.* -import os -import signal -from collections.abc import Callable +### Backend & Infrastructure +Build the foundation that powers OpenHands: +- **Python Development** - Core functionality and APIs +- **Runtime Systems** - Docker containers and sandboxes +- **Cloud Integrations** - Support for different cloud providers +- **Performance Optimization** - Make everything faster and more efficient -from pydantic import SecretStr +### Testing & Quality Assurance +Help us maintain high quality: +- **Unit Testing** - Write tests for new features +- **Integration Testing** - Ensure components work together +- **Bug Hunting** - Find and report issues +- **Performance Testing** - Identify bottlenecks and optimization opportunities -from openhands.sdk import LLM, Agent, BaseConversation, Conversation -from openhands.sdk.conversation.state import ( - ConversationExecutionStatus, - ConversationState, -) -from openhands.sdk.security.confirmation_policy import ConfirmRisky -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### Documentation & Education +Help others learn and contribute: +- **Technical Documentation** - API docs, guides, and tutorials +- **Video Tutorials** - Create learning content +- **Translation** - Make OpenHands accessible in more languages +- **Community Support** - Help other users and contributors +### Research & Innovation +Push the boundaries of what's possible: +- **Academic Research** - Publish papers using OpenHands +- **Benchmarking** - Develop new evaluation methods +- **Experimental Features** - Try cutting-edge AI techniques +- **Data Analysis** - Study how developers use AI tools -# Clean ^C exit: no stack trace noise -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) +## 🚀 Getting Started 
+Ready to contribute? Here's your path to making an impact: -def _print_blocked_actions(pending_actions) -> None: - print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") - for i, action in enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " ") - print(f" {i}. {action.tool_name}: {snippet}...") +### 1. Quick Wins +Start with these easy contributions: +- **Use OpenHands** and [report issues](https://github.com/OpenHands/OpenHands/issues) you encounter +- **Give feedback** using the thumbs-up/thumbs-down buttons after each session +- **Star our repository** on [GitHub](https://github.com/OpenHands/OpenHands) +- **Share OpenHands** with other developers +### 2. Set Up Your Development Environment +Follow our setup guide: +- **Requirements**: Linux/Mac/WSL, Docker, Python 3.12, Node.js 22+, Poetry 1.8+ +- **Quick setup**: `make build` to get everything ready +- **Configuration**: `make setup-config` to configure your LLM +- **Run locally**: `make run` to start the application -def confirm_high_risk_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. - """ - _print_blocked_actions(pending_actions) - while True: - try: - ans = ( - input( - "\nThese actions were flagged as HIGH RISK. " - "Do you want to execute them anyway? (yes/no): " - ) - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\n❌ No input received; rejecting by default.") - return False +*Full details in our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md)* - if ans in ("yes", "y"): - print("✅ Approved — executing high-risk actions...") - return True - if ans in ("no", "n"): - print("❌ Rejected — skipping high-risk actions...") - return False - print("Please enter 'yes' or 'no'.") +### 3. 
Find Your First Issue +Look for beginner-friendly opportunities: +- Browse [good first issues](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) +- Check our [project boards](https://github.com/OpenHands/OpenHands/projects) for organized tasks +- Ask in [Slack](https://openhands.dev/joinslack) what needs help +### 4. Join the Community +Connect with other contributors in our [Slack Community](https://openhands.dev/joinslack). You can connect with OpenHands contributors, maintainers, and more! -def run_until_finished_with_security( - conversation: BaseConversation, confirmer: Callable[[list], bool] -) -> None: - """ - Drive the conversation until FINISHED. - - If WAITING_FOR_CONFIRMATION: ask the confirmer. - * On approve: set execution_status = IDLE (keeps original example’s behavior). - * On reject: conversation.reject_pending_actions(...). - - If WAITING but no pending actions: print warning and set IDLE (matches original). - """ - while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: - if ( - conversation.state.execution_status - == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "⚠️ Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." 
- ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected high-risk actions") - continue +## 📋 How to Contribute Code - print("▶️ Running conversation.run()...") - conversation.run() +### Understanding the Codebase +Get familiar with our architecture: +- **[Frontend](https://github.com/OpenHands/OpenHands/tree/main/frontend/README.md)** - React application +- **[Backend](https://github.com/OpenHands/OpenHands/tree/main/openhands/README.md)** - Python core +- **[Agents](https://github.com/OpenHands/OpenHands/tree/main/openhands/agenthub/README.md)** - AI agent implementations +- **[Runtime](https://github.com/OpenHands/OpenHands/tree/main/openhands/runtime/README.md)** - Execution environments +- **[Evaluation](https://github.com/OpenHands/benchmarks)** - Testing and benchmarks +### Pull Request Process +We welcome all pull requests! Here's how we evaluate them: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="security-analyzer", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +#### Small Improvements +- Quick review and approval for obvious improvements +- Make sure CI tests pass +- Include clear description of changes -# Tools -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +#### Core Agent Changes +We're more careful with agent changes since they affect user experience: +- **Accuracy** - Does it make the agent better at solving problems? +- **Efficiency** - Does it improve speed or reduce resource usage? +- **Code Quality** - Is the code maintainable and well-tested? 
-# Agent -agent = Agent(llm=llm, tools=tools) +*Discuss major changes in [GitHub issues](https://github.com/OpenHands/OpenHands/issues) or [Slack](https://openhands.dev/joinslack) first!* -# Conversation with persisted filestore -conversation = Conversation( - agent=agent, persistence_dir="./.conversations", workspace="." -) -conversation.set_security_analyzer(LLMSecurityAnalyzer()) -conversation.set_confirmation_policy(ConfirmRisky()) +### Pull Request Guidelines +We recommend the following for smooth reviews but they're not required. Just know that the more you follow these guidelines, the more likely you'll get your PR reviewed faster and reduce the quantity of revisions. -print("\n1) Safe command (LOW risk - should execute automatically)...") -conversation.send_message("List files in the current directory") -conversation.run() +**Title Format:** +- `feat: Add new agent capability` +- `fix: Resolve memory leak in runtime` +- `docs: Update installation guide` +- `style: Fix code formatting` +- `refactor: Simplify authentication logic` +- `test: Add unit tests for parser` -print("\n2) Potentially risky command (may require confirmation)...") -conversation.send_message( - "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" -) -run_until_finished_with_security(conversation, confirm_high_risk_in_console) -``` +**Description:** +- Explain what the PR does and why +- Link to related issues +- Include screenshots for UI changes +- Add changelog entry for user-facing changes - +## License -### Custom Security Analyzer Implementation +OpenHands is released under the **MIT License**, which means: -You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. 
+### You Can: +- **Use** OpenHands for any purpose, including commercial projects +- **Modify** the code to fit your needs +- **Share** your modifications +- **Distribute** or sell copies of OpenHands -#### Creating a Custom Analyzer +### You Must: +- **Include** the original copyright notice and license text +- **Preserve** the license in any substantial portions you use -To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: +### No Warranty: +- OpenHands is provided "as is" without warranty +- Contributors are not liable for any damages -```python icon="python" focus={5, 8} -from openhands.sdk.security.analyzer import SecurityAnalyzerBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.event.llm_convertible import ActionEvent +*Full license text: [LICENSE](https://github.com/OpenHands/OpenHands/blob/main/LICENSE)* -class CustomSecurityAnalyzer(SecurityAnalyzerBase): - """Custom security analyzer with domain-specific rules.""" - - def security_risk(self, action: ActionEvent) -> SecurityRisk: - """Evaluate security risk based on custom rules. - - Args: - action: The ActionEvent to analyze - - Returns: - SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) - """ - # Example: Check for specific dangerous patterns - action_str = str(action.action.model_dump()).lower() if action.action else "" +**Special Note:** Content in the `enterprise/` directory has a separate license. See `enterprise/LICENSE` for details. - # High-risk patterns - if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): - return SecurityRisk.HIGH - - # Medium-risk patterns - if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): - return SecurityRisk.MEDIUM - - # Default to low risk - return SecurityRisk.LOW +## Ready to make your first contribution? 
-# Use your custom analyzer -security_analyzer = CustomSecurityAnalyzer() -agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) -``` +1. **⭐ Star** our [GitHub repository](https://github.com/OpenHands/OpenHands) +2. **🔧 Set up** your development environment using our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md) +3. **💬 Join** our [Slack community](https://openhands.dev/joinslack) to meet other contributors +4. **🎯 Find** a [good first issue](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) to work on +5. **📝 Read** our [Code of Conduct](https://github.com/OpenHands/OpenHands/blob/main/CODE_OF_CONDUCT.md) - - For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). - +## Need Help? +Don't hesitate to ask for help: +- **Slack**: [Join our community](https://openhands.dev/joinslack) for real-time support +- **GitHub Issues**: [Open an issue](https://github.com/OpenHands/OpenHands/issues) for bugs or feature requests +- **Email**: Contact us at [contact@openhands.dev](mailto:contact@openhands.dev) --- -## Configurable Security Policy +Thank you for considering contributing to OpenHands! Together, we're building tools that will democratize AI-powered software development and make it accessible to developers everywhere. Every contribution, no matter how small, helps us move closer to that vision. -> A ready-to-run example is available [here](#ready-to-run-example-security-policy)! +Welcome to the community! 🎉 -Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines. +### FAQs +Source: https://docs.openhands.dev/overview/faqs.md +## Getting Started -### Using Custom Security Policies +### I'm new to OpenHands. 
Where should I start? -You can provide a custom security policy template when creating an agent: +1. **Quick start**: Use [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) to get started quickly with + [GitHub](/openhands/usage/cloud/github-installation), [GitLab](/openhands/usage/cloud/gitlab-installation), + [Bitbucket](/openhands/usage/cloud/bitbucket-installation), + and [Slack](/openhands/usage/cloud/slack-installation) integrations. +2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/openhands/usage/run-openhands/local-setup). +3. **First steps**: Read over the [first projects guidelines](/overview/first-projects) and + [prompting best practices](/openhands/usage/tips/prompting-best-practices) to learn the basics. -```python focus={9-13} icon="python" -from openhands.sdk import Agent, LLM +### Can I use OpenHands for production workloads? -llm = LLM( - usage_id="agent", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key), -) +OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant +deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability. -# Provide a custom security policy template file -agent = Agent( - llm=llm, - tools=tools, - security_policy_filename="my_security_policy.j2", -) -``` +If you're interested in running OpenHands in a multi-tenant environment, please [contact us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) about our enterprise deployment options. -Custom security policies allow you to: -- Define organization-specific risk assessment guidelines -- Set custom thresholds for security risk levels -- Add domain-specific security rules -- Tailor risk evaluation to your use case + +Using OpenHands for work? We'd love to chat! 
Fill out +[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) +to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide +input on our product roadmap. + -The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. +## Safety and Security -### Ready-to-run Example Security Policy +### It's doing stuff without asking, is that safe? - -Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) - +**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container +(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration: -Define custom security risk guidelines for your agent: +**What's protected:** +- Your host system files and programs (unless you mount them using [this feature](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)) +- Host system resources +- Other containers and processes -```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py -"""OpenHands Agent SDK — Configurable Security Policy Example +**Potential risks to consider:** +- The agent can access the internet from within the container. +- If you provide credentials (API keys, tokens), the agent can use them. +- Mounted files and directories can be modified or deleted. +- Network requests can be made to external services. -This example demonstrates how to use a custom security policy template -with an agent. Security policies define risk assessment guidelines that -help agents evaluate the safety of their actions. 
+For detailed security information, see our [Runtime Architecture](/openhands/usage/architecture/runtime), +[Security Configuration](/openhands/usage/advanced/configuration-options#security-configuration), +and [Hardened Docker Installation](/openhands/usage/sandboxes/docker#hardened-docker-installation) documentation. -By default, agents use the built-in security_policy.j2 template. This -example shows how to: -1. Use the default security policy -2. Provide a custom security policy template embedded in the script -3. Apply the custom policy to guide agent behavior -""" +## File Storage and Access -import os -import tempfile -from pathlib import Path +### Where are my files stored? -from pydantic import SecretStr +Your files are stored in different locations depending on how you've configured OpenHands: -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +**Default behavior (no file mounting):** +- Files created by the agent are stored inside the runtime Docker container. +- These files are temporary and will be lost when the container is removed. +- The agent works in the `/workspace` directory inside the runtime container. +**When you mount your local filesystem (following [this](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)):** +- Your local files are mounted into the container's `/workspace` directory. +- Changes made by the agent are reflected in your local filesystem. +- Files persist after the container is stopped. -logger = get_logger(__name__) + +Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory. 
+ -# Define a custom security policy template inline -CUSTOM_SECURITY_POLICY = ( - "# 🔐 Custom Security Risk Policy\n" - "When using tools that support the security_risk parameter, assess the " - "safety risk of your actions:\n" - "\n" - "- **LOW**: Safe read-only actions.\n" - " - Viewing files, calculations, documentation.\n" - "- **MEDIUM**: Moderate container-scoped actions.\n" - " - File modifications, package installations.\n" - "- **HIGH**: Potentially dangerous actions.\n" - " - Network access, system modifications, data exfiltration.\n" - "\n" - "**Custom Rules**\n" - "- Always prioritize user data safety.\n" - "- Escalate to **HIGH** for any external data transmission.\n" -) +## Development Tools and Environment -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +### How do I get the dev tools I need? -# Tools -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +OpenHands comes with a basic runtime environment that includes Python and Node.js. +It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment. 
-# Example 1: Agent with default security policy
-print("=" * 100)
-print("Example 1: Agent with default security policy")
-print("=" * 100)
-default_agent = Agent(llm=llm, tools=tools)
-print(f"Security policy filename: {default_agent.security_policy_filename}")
-print("\nDefault security policy is embedded in the agent's system message.")
+If you would like to set things up more systematically, you can:
+- **Use setup.sh**: Add a [setup.sh file](/openhands/usage/customization/repository#setup-script) to
+  your repository, which will be run every time the agent starts.
+- **Use a custom sandbox**: Use a [custom docker image](/openhands/usage/advanced/custom-sandbox-guide) to initialize the sandbox.
-# Example 2: Agent with custom security policy
-print("\n" + "=" * 100)
-print("Example 2: Agent with custom security policy")
-print("=" * 100)
+### Something's not working. Where can I get help?
-# Create a temporary file for the custom security policy
-with tempfile.NamedTemporaryFile(
-    mode="w", suffix=".j2", delete=False, encoding="utf-8"
-) as temp_file:
-    temp_file.write(CUSTOM_SECURITY_POLICY)
-    custom_policy_path = temp_file.name
+1. **Search existing issues**: Check our [GitHub issues](https://github.com/OpenHands/OpenHands/issues) to see if
+   others have encountered the same problem.
+2. **Join our community**: Get help from other users and developers:
+   - [Slack community](https://openhands.dev/joinslack)
+3. **Check our troubleshooting guide**: Common issues and solutions are documented in
+   [Troubleshooting](/openhands/usage/troubleshooting/troubleshooting).
+4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/OpenHands/OpenHands/issues/new)
+   and fill in as much detail as possible.
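+To make the `setup.sh` option above concrete, here is a minimal sketch. The `.openhands/setup.sh` path and the commands shown are assumptions to adapt to your own repository:

```shell
#!/usr/bin/env bash
# Illustrative .openhands/setup.sh — run each time the agent starts.
# The path and commands below are assumptions; adapt them to your project.
set -eu

# Example: an environment variable the agent's shell sessions should see.
export PROJECT_ENV="development"

# Example dependency installs (uncomment what your project needs):
# pip install -r requirements.txt
# npm ci

echo "setup complete"
```

The script should be idempotent, since it runs on every agent start.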
-try: - # Create agent with custom security policy (using absolute path) - custom_agent = Agent( - llm=llm, - tools=tools, - security_policy_filename=custom_policy_path, - ) - print(f"Security policy filename: {custom_agent.security_policy_filename}") - print("\nCustom security policy loaded from temporary file.") +### First Projects +Source: https://docs.openhands.dev/overview/first-projects.md - # Verify the custom policy is in the system message - system_message = custom_agent.static_system_message - if "Custom Security Risk Policy" in system_message: - print("✓ Custom security policy successfully embedded in system message.") - else: - print("✗ Custom security policy not found in system message.") +Like any tool, it works best when you know how to use it effectively. Whether you're experimenting with a small +script or making changes in a large codebase, this guide will show how to apply OpenHands in different scenarios. - # Run a conversation with the custom agent - print("\n" + "=" * 100) - print("Running conversation with custom security policy") - print("=" * 100) +Let’s walk through a natural progression of using OpenHands: +- Try a simple prompt. +- Build a project from scratch. +- Add features to existing code. +- Refactor code. +- Debug and fix bugs. - llm_messages = [] # collect raw LLM messages +## First Steps: Hello World - def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +Start with a small task to get familiar with how OpenHands responds to prompts. - conversation = Conversation( - agent=custom_agent, - callbacks=[conversation_callback], - workspace=".", - ) +Click `New Conversation` and try prompting: +> Write a bash script hello.sh that prints "hello world!" - conversation.send_message( - "Please create a simple Python script named hello.py that prints " - "'Hello, World!'. Make sure to follow security best practices." 
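+For reference, the hello.sh prompts in this section might converge on a script like the following sketch (one plausible shape — OpenHands may write it differently; the greeting logic is wrapped in a function purely for clarity):

```shell
#!/usr/bin/env bash
# Illustrative hello.sh — greets an optional name given as the first
# argument, defaulting to "world" when no argument is supplied.
hello() {
  printf 'hello %s!\n' "${1:-world}"
}

hello "$@"
```

Running `bash hello.sh` prints `hello world!`, while `bash hello.sh Ada` prints `hello Ada!`.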
- ) - conversation.run() +OpenHands will generate script, set the correct permissions, and even run it for you. - print("\n" + "=" * 100) - print("Conversation finished.") - print(f"Total LLM messages: {len(llm_messages)}") - print("=" * 100) +Now try making small changes: - # Report cost - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") +> Modify hello.sh so that it accepts a name as the first argument, but defaults to "world". -finally: - # Clean up temporary file - Path(custom_policy_path).unlink(missing_ok=True) +You can experiment in any language. For example: -print("\n" + "=" * 100) -print("Example Summary") -print("=" * 100) -print("This example demonstrated:") -print("1. Using the default security policy (security_policy.j2)") -print("2. Creating a custom security policy template") -print("3. Applying the custom policy via security_policy_filename parameter") -print("4. Running a conversation with the custom security policy") -print( - "\nYou can customize security policies to match your organization's " - "specific requirements." -) -``` +> Convert hello.sh to a Ruby script, and run it. - + + Start small and iterate. This helps you understand how OpenHands interprets and responds to different prompts. + -## Next Steps +## Build Something from Scratch -- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools -- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management +Agents excel at "greenfield" tasks, where they don’t need context about existing code. +Begin with a simple task and iterate from there. Be specific about what you want and the tech stack. +Click `New Conversation` and give it a clear goal: -# Agent Skills & Context -Source: https://docs.openhands.dev/sdk/guides/skill +> Build a frontend-only TODO app in React. All state should be stored in localStorage. 
-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Once the basics are working, build on it just like you would in a real project: -This guide shows how to implement skills in the SDK. For conceptual overview, see [Skills Overview](/overview/skills). +> Allow adding an optional due date to each task. -OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers. +You can also ask OpenHands to help with version control: -## Context Loading Methods +> Commit the changes and push them to a new branch called "feature/due-dates". -| Method | When Content Loads | Use Case | -|--------|-------------------|----------| -| **Always-loaded** | At conversation start | Repository rules, coding standards | -| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | -| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | + + Break your goals into small, manageable tasks.. Keep pushing your changes often. This makes it easier to recover + if something goes off track. + -## Always-Loaded Context +## Expand Existing Code -Content that's always in the system prompt. +Want to add new functionality to an existing repo? OpenHands can do that too. -### Option 1: `AGENTS.md` (Auto-loaded) + +If you're running OpenHands on your own, first add a +[GitHub token](/openhands/usage/settings/integrations-settings#github-setup), +[GitLab token](/openhands/usage/settings/integrations-settings#gitlab-setup) or +[Bitbucket token](/openhands/usage/settings/integrations-settings#bitbucket-setup). + -Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). +Choose your repository and branch via `Open Repository`, and press `Launch`. 
-```python icon="python" focus={3, 4} -from openhands.sdk.context.skills import load_project_skills +Examples of adding new functionality: -# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root -skills = load_project_skills(workspace_dir="/path/to/repo") -agent_context = AgentContext(skills=skills) -``` +> Add a GitHub action that lints the code in this repository. -### Option 2: Inline Skill (Code-defined) +> Modify ./backend/api/routes.js to add a new route that returns a list of all tasks. -```python icon="python" focus={5-11} -from openhands.sdk import AgentContext -from openhands.sdk.context import Skill +> Add a new React component to the ./frontend/components directory to display a list of Widgets. +> It should use the existing Widget component. -agent_context = AgentContext( - skills=[ - Skill( - name="code-style", - content="Always use type hints in Python.", - trigger=None, # No trigger = always loaded - ), - ] -) -``` + + OpenHands can explore the codebase, but giving it context upfront makes it faster and less expensive. + -## Trigger-Loaded Context +## Refactor Code -Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). +OpenHands does great at refactoring code in small chunks. Rather than rearchitecting the entire codebase, it's more +effective in focused refactoring tasks. Start by launching a conversation with +your repo and branch. Then guide it: -```python icon="python" focus={6} -from openhands.sdk.context import Skill, KeywordTrigger +> Rename all the single-letter variables in ./app.go. -Skill( - name="encryption-helper", - content="Use the encrypt.sh script to encrypt messages.", - trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), -) -``` +> Split the `build_and_deploy_widgets` function into two functions, `build_widgets` and `deploy_widgets` in widget.php. 
-When user says "encrypt this", the content is injected into the message: +> Break ./api/routes.js into separate files for each route. -```xml icon="file" - -The following information has been included based on a keyword match for "encrypt". -Skill location: /path/to/encryption-helper + + Focus on small, meaningful improvements instead of full rewrites. + -Use the encrypt.sh script to encrypt messages. - -``` +## Debug and Fix Bugs -## Progressive Disclosure (AgentSkills Standard) +OpenHands can help debug and fix issues, but it’s most effective when you’ve narrowed things down. -For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. +Give it a clear description of the problem and the file(s) involved: -```python icon="python" -from openhands.sdk.context.skills import load_skills_from_dir +> The email field in the `/subscribe` endpoint is rejecting .io domains. Fix this. -# Load SKILL.md files from a directory -_, _, agent_skills = load_skills_from_dir("/path/to/skills") -agent_context = AgentContext(skills=list(agent_skills.values())) -``` +> The `search_widgets` function in ./app.py is doing a case-sensitive search. Make it case-insensitive. -Skills are listed in the system prompt: -```xml icon="file" - - - code-style - Project coding standards. - /path/to/code-style/SKILL.md - - -``` +For bug fixing, test-driven development can be really useful. You can ask OpenHands to write a new test and iterate +until the bug is fixed: - -Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. - +> The `hello` function crashes on the empty string. Write a test that reproduces this bug, then fix the code so it passes. ---- + + Be as specific as possible. Include expected behavior, file names, and examples to speed things up. 
+ -## Full Example +## Using OpenHands Effectively - -Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) - +OpenHands can assist with nearly any coding task, but it takes some practice to get the best results. +Keep these tips in mind: +* Keep your tasks small. +* Be clear and specific. +* Provide relevant context. +* Commit and push frequently. -```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py -import os +See [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) for more tips on how to get the most +out of OpenHands. -from pydantic import SecretStr +### Introduction +Source: https://docs.openhands.dev/overview/introduction.md -from openhands.sdk import ( - LLM, - Agent, - AgentContext, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.context import ( - KeywordTrigger, - Skill, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +🙌 Welcome to OpenHands, a [community](/overview/community) focused on AI-driven development. We'd love for you to [join us on Slack](https://openhands.dev/joinslack). +There are a few ways to work with OpenHands: -logger = get_logger(__name__) +## OpenHands Software Agent SDK +The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below. -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +Define agents in code, then run them locally, or scale to 1000s of agents in the cloud -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +[Check out the docs](https://docs.openhands.dev/sdk) or [view the source](https://github.com/All-Hands-AI/agent-sdk/) -# AgentContext provides flexible ways to customize prompts: -# 1. Skills: Inject instructions (always-active or keyword-triggered) -# 2. system_message_suffix: Append text to the system prompt -# 3. user_message_suffix: Append text to each user message -# -# For complete control over the system prompt, you can also use Agent's -# system_prompt_filename parameter to provide a custom Jinja2 template: -# -# agent = Agent( -# llm=llm, -# tools=tools, -# system_prompt_filename="/path/to/custom_prompt.j2", -# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, -# ) -# -# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts -agent_context = AgentContext( - skills=[ - Skill( - name="repo.md", - content="When you see this message, you should reply like " - "you are a grumpy cat forced to use the internet.", - # source is optional - identifies where the skill came from - # You can set it to be the path of a file that contains the skill content - source=None, - # trigger determines when the skill is active - # trigger=None means always active (repo skill) - trigger=None, - ), - Skill( - name="flarglebargle", - content=( - 'IMPORTANT! The user has said the magic word "flarglebargle". 
'
-            "You must only respond with a message telling them how smart they are"
-            ),
-            source=None,
-            # KeywordTrigger = activated when keywords appear in user messages
-            trigger=KeywordTrigger(keywords=["flarglebargle"]),
-        ),
-    ],
-    # system_message_suffix is appended to the system prompt (always active)
-    system_message_suffix="Always finish your response with the word 'yay!'",
-    # user_message_suffix is appended to each user message
-    user_message_suffix="The first character of your response should be 'I'",
-    # You can also enable automatic load skills from
-    # public registry at https://github.com/OpenHands/extensions
-    load_public_skills=True,
-)
+## OpenHands CLI
+The CLI is the easiest way to start using OpenHands. The experience will be familiar to anyone who has worked
+with e.g. Claude Code or Codex. You can power it with Claude, GPT, or any other LLM.
-# Agent
-agent = Agent(llm=llm, tools=tools, agent_context=agent_context)
+[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/cli-mode) or [view the source](https://github.com/OpenHands/OpenHands-CLI)
-llm_messages = []  # collect raw LLM messages
+## OpenHands Local GUI
+Use the Local GUI for running agents on your laptop. It comes with a REST API and a single-page React application.
+The experience will be familiar to anyone who has used Devin or Jules.
+[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup) or view the source in this repo.
-def conversation_callback(event: Event):
-    if isinstance(event, LLMConvertibleEvent):
-        llm_messages.append(event.to_llm_message())
+## OpenHands Cloud
+This is a commercial deployment of OpenHands GUI, running on hosted infrastructure.
+You can try it for free by [signing in with your GitHub account](https://app.all-hands.dev).
-conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd -) +OpenHands Cloud comes with source-available features and integrations: +- Deeper integrations with GitHub, GitLab, and Bitbucket +- Integrations with Slack, Jira, and Linear +- Multi-user support +- RBAC and permissions +- Collaboration features (e.g., conversation sharing) +- Usage reporting +- Budgeting enforcement -print("=" * 100) -print("Checking if the repo skill is activated.") -conversation.send_message("Hey are you a grumpy cat?") -conversation.run() +## OpenHands Enterprise +Large enterprises can work with us to self-host OpenHands Cloud in their own VPC, via Kubernetes. +OpenHands Enterprise can also work with the CLI and SDK above. -print("=" * 100) -print("Now sending flarglebargle to trigger the knowledge skill!") -conversation.send_message("flarglebargle!") -conversation.run() +OpenHands Enterprise is source-available--you can see all the source code here in the enterprise/ directory, +but you'll need to purchase a license if you want to run it for more than one month. -print("=" * 100) -print("Now triggering public skill 'github'") -conversation.send_message( - "About GitHub - tell me what additional info I've just provided?" -) -conversation.run() +Enterprise contracts also come with extended support and access to our research team. -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +Learn more at [openhands.dev/enterprise](https://openhands.dev/enterprise) -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +## Everything Else - +Check out our [Product Roadmap](https://github.com/orgs/openhands/projects/1), and feel free to +[open up an issue](https://github.com/OpenHands/OpenHands/issues) if there's something you'd like to see! 
-### Creating Skills +You might also be interested in our [evaluation infrastructure](https://github.com/OpenHands/benchmarks), our [chrome extension](https://github.com/OpenHands/openhands-chrome-extension/), or our [Theory-of-Mind module](https://github.com/OpenHands/ToM-SWE). -Skills are defined with a name, content (the instructions), and an optional trigger: +All our work is available under the MIT license, except for the `enterprise/` directory in this repository (see the [enterprise license](https://github.com/OpenHands/OpenHands/blob/main/enterprise/LICENSE) for details). +The core `openhands` and `agent-server` Docker images are fully MIT-licensed as well. -```python icon="python" focus={3-14} -agent_context = AgentContext( - skills=[ - Skill( - name="AGENTS.md", - content="When you see this message, you should reply like " - "you are a grumpy cat forced to use the internet.", - trigger=None, # Always active - ), - Skill( - name="flarglebargle", - content='IMPORTANT! The user has said the magic word "flarglebargle". ' - "You must only respond with a message telling them how smart they are", - trigger=KeywordTrigger(keywords=["flarglebargle"]), - ), - ] -) -``` +If you need help with anything, or just want to chat, [come find us on Slack](https://openhands.dev/joinslack). -### Keyword Triggers +### Model Context Protocol (MCP) +Source: https://docs.openhands.dev/overview/model-context-protocol.md -Use `KeywordTrigger` to activate skills only when specific words appear: +Model Context Protocol (MCP) is an open standard that allows OpenHands to communicate with external tool servers, extending the agent's capabilities with custom tools, specialized data processing, external API access, and more. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). 
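+As a taste of what MCP configuration looks like, here is a sketch of a stdio server entry in the CLI's `~/.openhands/mcp.json`. The server name and command are illustrative examples, and the exact schema is covered on the MCP settings pages referenced in this section:

```json
{
  "mcpServers": {
    "my-fetch-server": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
```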
-```python icon="python" focus={4} -Skill( - name="magic-word", - content="Special instructions when magic word is detected", - trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), -) -``` +## How MCP Works +When OpenHands starts, it: -## File-Based Skills (`SKILL.md`) +1. Reads the MCP configuration +2. Connects to configured servers (SSE, SHTTP, or stdio) +3. Registers tools provided by these servers with the agent +4. Routes tool calls to appropriate MCP servers during execution -For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. +## MCP Support Matrix - -Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py) - +| Platform | Support Level | Configuration Method | Documentation | +|----------|---------------|---------------------|---------------| +| **CLI** | ✅ Full Support | `~/.openhands/mcp.json` file | [CLI MCP Servers](/openhands/usage/cli/mcp-servers) | +| **SDK** | ✅ Full Support | Programmatic configuration | [SDK MCP Guide](/sdk/guides/mcp) | +| **Local GUI** | ✅ Full Support | Settings UI + config files | [Local GUI](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI settings | [Cloud GUI](/openhands/usage/cloud/cloud-ui) | -### Directory Structure +## Platform-Specific Differences -Each skill is a directory containing: + + + - Configuration via `~/.openhands/mcp.json` file + - Real-time status monitoring with `/mcp` command + - Supports all MCP transport protocols (SSE, SHTTP, stdio) + - Manual configuration required + + + - Programmatic configuration in code + - Full control over MCP server lifecycle + - Dynamic server registration and management + - Integration with custom tool systems + + + - Visual configuration through Settings UI + - File-based configuration backup + - Real-time server status 
display + - Supports all transport protocols + + + - Cloud-based configuration management + - Managed MCP server hosting options + - Team-wide configuration sharing + - Enterprise security features + + - - - - - - - - - - - - - - +## Getting Started with MCP -where +- **For detailed configuration**: See [MCP Settings](/openhands/usage/settings/mcp-settings) +- **For SDK integration**: See [SDK MCP Guide](/sdk/guides/mcp) +- **For architecture details**: See [MCP Architecture](/sdk/arch/mcp) -| Component | Required | Description | -|-------|----------|-------------| -| `SKILL.md` | Yes | Skill definition with frontmatter | -| `scripts/` | No | Executable scripts | -| `references/` | No | Reference documentation | -| `assets/` | No | Static assets | +### Quick Start +Source: https://docs.openhands.dev/overview/quickstart.md +Get started with OpenHands in minutes. Choose the option that works best for you. + + + **Recommended** -### `SKILL.md` Format + The fastest way to get started. No setup required—just sign in and start coding. -The `SKILL.md` file defines the skill with YAML frontmatter: + - Free usage of MiniMax M2.5 for a limited time + - No installation needed + - Managed infrastructure + + + Use OpenHands from your terminal. Perfect for automation and scripting. -```md icon="markdown" ---- -name: my-skill # Required (standard) -description: > # Required (standard) - A brief description of what this skill does and when to use it. -license: MIT # Optional (standard) -compatibility: Requires bash # Optional (standard) -metadata: # Optional (standard) - author: your-name - version: "1.0" -triggers: # Optional (OpenHands extension) - - keyword1 - - keyword2 ---- + - IDE integrations available + - Headless mode for CI/CD + - Lightweight installation + + + Run OpenHands locally with a web-based interface. Bring your own LLM and API key. 
-# Skill Content + - Full control over your environment + - Works offline + - Docker-based setup + + -Instructions and documentation for the agent... -``` +### Overview +Source: https://docs.openhands.dev/overview/skills.md -#### Frontmatter Fields +Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. They provide consistent practices across projects and can be triggered automatically based on keywords or context. -| Field | Required | Description | -|-------|----------|-------------| -| `name` | Yes | Skill identifier (lowercase + hyphens) | -| `description` | Yes | What the skill does (shown to agent) | -| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) | -| `license` | No | License name | -| `compatibility` | No | Environment requirements | -| `metadata` | No | Custom key-value pairs | + +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers for automatic activation. See the [SDK Skills Guide](/sdk/guides/skill) for details on the SKILL.md format. + - -Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user. - +## Official Skill Registry -### Loading Skills +The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). This repository contains community-shared skills that can be used by all OpenHands agents. You can browse available skills, contribute your own, and learn from examples created by the community. 
-Use `load_skills_from_dir()` to load all skills from a directory: +## How Skills Work -```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py -"""Example: Loading Skills from Disk (AgentSkills Standard) +Skills inject additional context and rules into the agent's behavior. -This example demonstrates how to load skills following the AgentSkills standard -from a directory on disk. +At a high level, OpenHands supports two loading models: -Skills are modular, self-contained packages that extend an agent's capabilities -by providing specialized knowledge, workflows, and tools. They follow the -AgentSkills standard which includes: -- SKILL.md file with frontmatter metadata (name, description, triggers) -- Optional resource directories: scripts/, references/, assets/ +- **Always-on context** (e.g., `AGENTS.md`) that is injected into the system prompt at conversation start. +- **On-demand skills** that are either: + - **triggered by the user** (keyword matches), or + - **invoked by the agent** (the agent decides to look up the full skill content). -The example_skills/ directory contains two skills: -- rot13-encryption: Has triggers (encrypt, decrypt) - listed in - AND content auto-injected when triggered -- code-style-guide: No triggers - listed in for on-demand access +## Permanent agent context (recommended) -All SKILL.md files follow the AgentSkills progressive disclosure model: -they are listed in with name, description, and location. -Skills with triggers get the best of both worlds: automatic content injection -when triggered, plus the agent can proactively read them anytime. -""" +For repository-wide, always-on instructions, prefer a root-level `AGENTS.md` file. 
-import os -import sys -from pathlib import Path +We also support model-specific variants: +- `GEMINI.md` for Gemini +- `CLAUDE.md` for Claude -from pydantic import SecretStr +## Triggered and optional skills -from openhands.sdk import LLM, Agent, AgentContext, Conversation -from openhands.sdk.context.skills import ( - discover_skill_resources, - load_skills_from_dir, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +To add optional skills that are loaded on demand: +- **AgentSkills standard (recommended for progressive disclosure)**: create one directory per skill and add a `SKILL.md` file. +- **Legacy/OpenHands format (simple)**: put markdown files in `.agents/skills/*.md` at the repository root. -# Get the directory containing this script -script_dir = Path(__file__).parent -example_skills_dir = script_dir / "example_skills" + +Loaded skills take up space in the context window. On-demand skills help keep the system prompt smaller because the agent sees a summary first and reads the full content only when needed. 
+ -# ========================================================================= -# Part 1: Loading Skills from a Directory -# ========================================================================= -print("=" * 80) -print("Part 1: Loading Skills from a Directory") -print("=" * 80) +### Example Repository Structure -print(f"Loading skills from: {example_skills_dir}") +``` +some-repository/ +├── AGENTS.md # Permanent repository guidelines (recommended) +└── .agents/ + └── skills/ + ├── rot13-encryption/ # AgentSkills standard (progressive disclosure) + │ ├── SKILL.md + │ ├── scripts/ + │ │ └── rot13.sh + │ └── references/ + │ └── README.md + ├── another-agentskill/ # AgentSkills standard (progressive disclosure) + │ ├── SKILL.md + │ └── scripts/ + │ └── placeholder.sh + └── legacy_trigger_this.md # Legacy/OpenHands format (keyword-triggered) +``` -# Discover resources in the skill directory -skill_subdir = example_skills_dir / "rot13-encryption" -resources = discover_skill_resources(skill_subdir) -print("\nDiscovered resources in rot13-encryption/:") -print(f" - scripts: {resources.scripts}") -print(f" - references: {resources.references}") -print(f" - assets: {resources.assets}") +## Skill Loading Precedence -# Load skills from the directory -repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) +For project location, paths are relative to the repository root; `.agents/skills/` is a subdirectory of the project directory. 
+For user home location, paths are relative to the user home: `~/` -print("\nLoaded skills from directory:") -print(f" - Repo skills: {list(repo_skills.keys())}") -print(f" - Knowledge skills: {list(knowledge_skills.keys())}") -print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") +When multiple skills share the same name, OpenHands keeps the first match in this order: -# Access the loaded skill and show all AgentSkills standard fields -if agent_skills: - skill_name = next(iter(agent_skills)) - loaded_skill = agent_skills[skill_name] - print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") - print(f" - Name: {loaded_skill.name}") - desc = loaded_skill.description or "" - print(f" - Description: {desc[:70]}...") - print(f" - License: {loaded_skill.license}") - print(f" - Compatibility: {loaded_skill.compatibility}") - print(f" - Metadata: {loaded_skill.metadata}") - if loaded_skill.resources: - print(" - Resources:") - print(f" - Scripts: {loaded_skill.resources.scripts}") - print(f" - References: {loaded_skill.resources.references}") - print(f" - Assets: {loaded_skill.resources.assets}") - print(f" - Skill root: {loaded_skill.resources.skill_root}") +1. `.agents/skills/` (recommended) +2. `.openhands/skills/` (deprecated) +3. `.openhands/microagents/` (deprecated) -# ========================================================================= -# Part 2: Using Skills with an Agent -# ========================================================================= -print("\n" + "=" * 80) -print("Part 2: Using Skills with an Agent") -print("=" * 80) +Project-specific skills take precedence over user skills. 
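To make the first-match rule concrete, here is a minimal resolver sketch. This is illustrative only, not OpenHands' actual implementation; it handles only single-file `*.md` skills and assumes exactly the directory names listed above:

```python
from pathlib import Path

# Locations searched for a skill name; the first match wins.
# (Illustrative sketch only -- not the actual OpenHands implementation.)
SEARCH_ORDER = [".agents/skills", ".openhands/skills", ".openhands/microagents"]


def resolve_skills(project_root: str, user_home: str) -> dict[str, Path]:
    """Map each skill name to the first file found: project before user,
    and within each root, the directories in SEARCH_ORDER."""
    resolved: dict[str, Path] = {}
    for base in (Path(project_root), Path(user_home)):  # project wins over user
        for subdir in SEARCH_ORDER:
            for path in sorted((base / subdir).glob("*.md")):
                resolved.setdefault(path.stem, path)  # keep only the first match
    return resolved
```

Because `setdefault` keeps the first entry, a `style.md` under the repository's `.agents/skills/` shadows both deprecated locations and any user-level skill of the same name.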
-# Check for API key -api_key = os.getenv("LLM_API_KEY") -if not api_key: - print("Skipping agent demo (LLM_API_KEY not set)") - print("\nTo run the full demo, set the LLM_API_KEY environment variable:") - print(" export LLM_API_KEY=your-api-key") - sys.exit(0) +## Skill Types -# Configure LLM -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -llm = LLM( - usage_id="skills-demo", - model=model, - api_key=SecretStr(api_key), - base_url=os.getenv("LLM_BASE_URL"), -) +Currently supported skill types: -# Create agent context with loaded skills -agent_context = AgentContext( - skills=list(agent_skills.values()), - # Disable public skills for this demo to keep output focused - load_public_skills=False, -) +- **[Permanent Context](/overview/skills/repo)**: Repository-wide guidelines and best practices. We recommend `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`). +- **[Keyword-Triggered Skills](/overview/skills/keyword)**: Guidelines activated by specific keywords in user prompts. +- **[Organization Skills](/overview/skills/org)**: Team or organization-wide standards. +- **[Global Skills](/overview/skills/public)**: Community-shared skills and templates. -# Create agent with tools so it can read skill resources -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] -agent = Agent(llm=llm, tools=tools, agent_context=agent_context) +### Skills Frontmatter Requirements -# Create conversation -conversation = Conversation(agent=agent, workspace=os.getcwd()) +Each skill file may include frontmatter that provides additional information. 
In some cases, this frontmatter is required: -# Test the skill (triggered by "encrypt" keyword) -# The skill provides instructions and a script for ROT13 encryption -print("\nSending message with 'encrypt' keyword to trigger skill...") -conversation.send_message("Encrypt the message 'hello world'.") -conversation.run() +| Skill Type | Required | +|-------------|----------| +| General Skills | No | +| Keyword-Triggered Skills | Yes | -print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}") -print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") -``` +## Skills Support Matrix - +| Platform | Support Level | Configuration Method | Implementation | Documentation | +|----------|---------------|---------------------|----------------|---------------| +| **CLI** | ✅ Full Support | `~/.agents/skills/` (user-level) and `.agents/skills/` (repo-level) | File-based markdown | [Skills Overview](/overview/skills) | +| **SDK** | ✅ Full Support | Programmatic `Skill` objects | Code-based configuration | [SDK Skills Guide](/sdk/guides/skill) | +| **Local GUI** | ✅ Full Support | `.agents/skills/` + UI | File-based with UI management | [Local Setup](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI + repository integration | Managed skill library | [Cloud UI](/openhands/usage/cloud/cloud-ui) | +## Platform-Specific Differences -### Key Functions + + + - File-based configuration in two locations: + - `~/.agents/skills/` - User-level skills (all conversations). 
+ - `.agents/skills/` - Repository-level skills (current directory) + - Markdown format for skill definitions + - Manual file management required + - Supports both general and keyword-triggered skills + + + - Programmatic `Skill` objects in code + - Dynamic skill creation and management + - Integration with custom workflows + - Full control over skill lifecycle + + + - Visual skill management through UI + - File-based storage with GUI editing + - Real-time skill status display + - Drag-and-drop skill organization + + + - Cloud-based skill library management + - Team-wide skill sharing and templates + - Organization-level skill policies + - Integrated skill marketplace + + -#### `load_skills_from_dir()` +## Learn More -Loads all skills from a directory, returning three dictionaries: +- **For SDK integration**: See [SDK Skills Guide](/sdk/guides/skill) +- **For architecture details**: See [Skills Architecture](/sdk/arch/skill) +- **For specific skill types**: See [Repository Skills](/overview/skills/repo), [Keyword Skills](/overview/skills/keyword), [Organization Skills](/overview/skills/org), and [Global Skills](/overview/skills/public) -```python icon="python" focus={3} -from openhands.sdk.context.skills import load_skills_from_dir +### Keyword-Triggered Skills +Source: https://docs.openhands.dev/overview/skills/keyword.md -repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir) -``` +## Usage -- **repo_skills**: Skills from `repo.md` files (always active) -- **knowledge_skills**: Skills from `knowledge/` subdirectories -- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard) +These skills are only loaded when a prompt includes one of the trigger words. -#### `discover_skill_resources()` +## Frontmatter Syntax -Discovers resource files in a skill directory: +Frontmatter is required for keyword-triggered skills. It must be placed at the top of the file, +above the guidelines. 
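Conceptually, a loader reads this frontmatter and activates the skill only when a user prompt contains one of the listed keywords. The sketch below is illustrative only, not OpenHands' actual implementation (a real loader would use a proper YAML parser for the frontmatter):

```python
import re


def parse_triggers(skill_text: str) -> list[str]:
    """Pull the `triggers` list out of a skill file's frontmatter, if any.

    Illustrative sketch only; real frontmatter should be parsed as YAML.
    """
    match = re.match(r"---\n(.*?)\n---", skill_text, re.DOTALL)
    if not match:
        return []  # no frontmatter -> never keyword-triggered
    triggers, in_triggers = [], False
    for line in match.group(1).splitlines():
        if line.strip() == "triggers:":
            in_triggers = True
        elif in_triggers and line.lstrip().startswith("- "):
            triggers.append(line.split("-", 1)[1].strip())
        elif not line.startswith((" ", "-")):
            in_triggers = False  # a new top-level key ends the list
    return triggers


def is_triggered(prompt: str, triggers: list[str]) -> bool:
    """The skill is loaded only when a trigger keyword appears in the prompt."""
    lowered = prompt.lower()
    return any(trigger.lower() in lowered for trigger in triggers)
```

Any prompt containing one of the parsed keywords causes the skill's content to be injected into the agent's context; all other prompts leave it unloaded.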
-```python icon="python" focus={3} -from openhands.sdk.context.skills import discover_skill_resources +Enclose the frontmatter in triple dashes (---) and include the following fields: -resources = discover_skill_resources(skill_dir) -print(resources.scripts) # List of script files -print(resources.references) # List of reference files -print(resources.assets) # List of asset files -print(resources.skill_root) # Path to skill directory -``` +| Field | Description | Required | Default | +|------------|--------------------------------------------------|----------|------------------| +| `triggers` | A list of keywords that activate the skill. | Yes | None | -### Skill Location in Prompts -The `` element in `` follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path: +## Example +Keyword-triggered skill file example located at `.agents/skills/yummy.md`: ``` - -The following information has been included based on a keyword match for "encrypt". - -Skill location: /path/to/rot13-encryption -(Use this path to resolve relative file references in the skill content below) +--- +triggers: +- yummyhappy +- happyyummy +--- -[skill content from SKILL.md] - +The user has said the magic word. Respond with "That was delicious!" ``` -This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. +[See examples of keyword-triggered skills in the official OpenHands Skills Registry](https://github.com/OpenHands/extensions) -### Example Skill: ROT13 Encryption +### Organization and User Skills +Source: https://docs.openhands.dev/overview/skills/org.md -Here's a skill with triggers (OpenHands extension): +## Usage -**SKILL.md:** -```markdown icon="markdown" ---- -name: rot13-encryption -description: > - This skill helps encrypt and decrypt messages using ROT13 cipher. 
-triggers: - - encrypt - - decrypt - - cipher ---- +These skills can be [any type of skill](/overview/skills#skill-types) and will be loaded +accordingly. However, they are applied to all repositories belonging to the organization or user. -# ROT13 Encryption Skill +Add a `.agents` repository under the organization or user and create a `skills` directory and place the +skills in that directory. -Run the [encrypt.sh](scripts/encrypt.sh) script with your message: +For GitLab organizations, use `openhands-config` as the repository name instead of `.agents`, since GitLab doesn't support repository names starting with non-alphanumeric characters. -\`\`\`bash -./scripts/encrypt.sh "your message" -\`\`\` -``` +## Example -**scripts/encrypt.sh:** -```bash icon="sh" -#!/bin/bash -echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +General skill file example for organization `Great-Co` located inside the `.agents` repository: +`skills/org-skill.md`: +``` +* Use type hints and error boundaries; validate inputs at system boundaries and fail with meaningful error messages. +* Document interfaces and public APIs; use implementation comments only for non-obvious logic. +* Follow the same naming convention for variables, classes, constants, etc. already used in each repository. ``` -When the user says "encrypt", the skill is triggered and the agent can use the provided script. - -## Loading Public Skills +For GitLab organizations, the same skill would be located inside the `openhands-config` repository. -OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. +## User Skills When Running Openhands on Your Own -### Automatic Loading via AgentContext + + This works with CLI, headless and development modes. It does not work out of the box when running OpenHands using the docker command. 
+ -Enable public skills loading in your `AgentContext`: +When running OpenHands on your own, you can place skills in the `~/.agents/skills` folder on your local +system and OpenHands will always load it for all your conversations. Repo-level overrides live in `.agents/skills`. -```python icon="python" focus={2} -agent_context = AgentContext( - load_public_skills=True, # Auto-load from public registry - skills=[ - # Your custom skills here - ] -) -``` +### Global Skills +Source: https://docs.openhands.dev/overview/skills/public.md -When enabled, the SDK will: -1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run -2. Load all available skills from the repository -3. Merge them with your explicitly defined skills +## Global Skill Registry -### Skill Naming and Triggers +The official global skill registry is hosted at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). This repository contains community-shared skills that can be used by all OpenHands users. -**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. +## Contributing a Global Skill -**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. +You can create global skills and share with the community by opening a pull request to the official skill registry. 
-```python icon="python" -# Both skills will be triggered by "/codereview" -agent_context = AgentContext( - load_public_skills=True, # Loads public "code-review" skill - skills=[ - Skill( - name="custom-codereview-guide", # Different name = coexists - content="Project-specific guidelines...", - trigger=KeywordTrigger(keywords=["/codereview"]), - ), - ] -) -``` +See the [OpenHands Skill Registry](https://github.com/OpenHands/extensions) for specific instructions on how to contribute a global skill. - -**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. - +### Global Skills Best Practices -### Programmatic Loading +- **Clear Scope**: Keep the skill focused on a specific domain or task. +- **Explicit Instructions**: Provide clear, unambiguous guidelines. +- **Useful Examples**: Include practical examples of common use cases. +- **Safety First**: Include necessary warnings and constraints. +- **Integration Awareness**: Consider how the skill interacts with other components. -You can also load public skills manually and have more control: +### Steps to Contribute a Global Skill -```python icon="python" -from openhands.sdk.context.skills import load_public_skills +#### 1. Plan the Global Skill -# Load all public skills -public_skills = load_public_skills() +Before creating a global skill, consider: -# Use with AgentContext -agent_context = AgentContext(skills=public_skills) +- What specific problem or use case will it address? +- What unique capabilities or knowledge should it have? +- What trigger words make sense for activating it? +- What constraints or guidelines should it follow? 
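Sketching the skill file itself can answer most of these questions up front. A hypothetical starting template (the name, triggers, and guidelines are placeholders to adapt to your skill):

```
---
triggers:
- terraform
- infra-as-code
---

When working on Terraform changes, always run `terraform fmt` and
`terraform validate` before proposing a plan, and explain any resource
that will be destroyed.
```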
-# Or combine with custom skills -my_skills = [ - Skill(name="custom", content="Custom instructions", trigger=None) -] -agent_context = AgentContext(skills=my_skills + public_skills) -``` +#### 2. Create File -### Custom Skills Repository +Create a new Markdown file with a descriptive name in the official skill registry: +[github.com/OpenHands/extensions](https://github.com/OpenHands/extensions) -You can load skills from your own repository: +#### 3. Testing the Global Skill -```python icon="python" focus={3-7} -from openhands.sdk.context.skills import load_public_skills +- Test the agent with various prompts. +- Verify trigger words activate the agent correctly. +- Ensure instructions are clear and comprehensive. +- Check for potential conflicts and overlaps with existing agents. -# Load from a custom repository -custom_skills = load_public_skills( - repo_url="https://github.com/my-org/my-skills", - branch="main" -) -``` +#### 4. Submission Process -### How It Works +Submit a pull request with: -The `load_public_skills()` function uses git-based caching for efficiency: +- The new skill file. +- Updated documentation if needed. +- Description of the agent's purpose and capabilities. -- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` -- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date -- **Offline mode**: Uses the cached version if network is unavailable +### General Skills +Source: https://docs.openhands.dev/overview/skills/repo.md -This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. +## Usage - -Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. - +These skills are always loaded as part of the context. 
-## Customizing Agent Context +## Frontmatter Syntax -### Message Suffixes +The frontmatter for this type of skill is optional. -Append custom instructions to the system prompt or user messages via `AgentContext`: +Frontmatter should be enclosed in triple dashes (---) and may include the following fields: -```python icon="python" -agent_context = AgentContext( - system_message_suffix=""" - -Repository: my-project -Branch: feature/new-api - - """.strip(), - user_message_suffix="Remember to explain your reasoning." -) -``` +| Field | Description | Required | Default | +|-----------|-----------------------------------------|----------|----------------| +| `agent` | The agent this skill applies to | No | 'CodeActAgent' | -- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) -- **`user_message_suffix`**: Appended to each user message +## Creating a Repository Agent -### Replacing the Entire System Prompt +To create an effective repository agent, you can ask OpenHands to analyze your repository with a prompt like: -For complete control, provide a custom Jinja2 template via the `Agent` class: +``` +Please browse the repository, look at the documentation and relevant code, and understand the purpose of this repository. -```python icon="python" focus={6} -from openhands.sdk import Agent +Specifically, I want you to create an `AGENTS.md` file at the repository root. This file should contain succinct information that summarizes: +1. The purpose of this repository +2. The general setup of this repo +3. A brief description of the structure of this repo -agent = Agent( - llm=llm, - tools=tools, - system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path - system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} -) +Read all the GitHub workflows under .github/ of the repository (if this folder exists) to understand the CI checks (e.g., linter, pre-commit), and include those in the `AGENTS.md` file. 
``` -**Custom template example** (`custom_system_prompt.j2`): +This approach helps OpenHands capture repository context efficiently, reducing the need for repeated searches during conversations and ensuring more accurate solutions. -```jinja2 -You are a helpful coding assistant for {{ repo_name }}. +## Example Content -{% if cli_mode %} -You are running in CLI mode. Keep responses concise. -{% endif %} +An `AGENTS.md` file should include: -Follow these guidelines: -- Write clean, well-documented code -- Consider edge cases and error handling -- Suggest tests when appropriate ``` +# Repository Purpose +This project is a TODO application that allows users to track TODO items. -**Key points:** -- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory -- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location -- Pass variables to the template via `system_prompt_kwargs` -- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt +# Setup Instructions +To set it up, you can run `npm run build`. -## Next Steps +# Repository Structure +- `/src`: Core application code +- `/tests`: Test suite +- `/docs`: Documentation +- `/.github`: CI/CD workflows -- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools -- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers -- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval +# CI/CD Workflows +- `lint.yml`: Runs ESLint on all JavaScript files +- `test.yml`: Runs the test suite on pull requests + +# Development Guidelines +Always make sure the tests are passing before committing changes. You can run the tests by running `npm run test`. 
+``` +[See more examples of general skills at OpenHands Skills registry.](https://github.com/OpenHands/extensions) diff --git a/llms.txt b/llms.txt index f2f60add..d1a8407d 100644 --- a/llms.txt +++ b/llms.txt @@ -2,7 +2,98 @@ > LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded. -## Agent SDK +The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI) +from the OpenHands Software Agent SDK. + +## OpenHands Web App Server + +- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) +- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. +- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. +- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK +- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). +- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md) +- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands +- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings). +- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High level overview of configuring the OpenHands Web interface. 
+- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents. +- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users that would like to use their own custom Docker image for the runtime. +- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md) +- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands +- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively. +- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally. +- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands +- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md) +- [Good vs. 
Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands +- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex) +- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq). +- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents +- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to setup and modify the various integrations in OpenHands. +- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md) +- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands. As well as some additional LLM settings. +- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers. +- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. 
+- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md) +- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you +- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands +- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai). +- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models. +- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects. +- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle +- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter). +- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work. +- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote. +- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation. 
+- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts. +- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment. +- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level. +- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app. +- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md) +- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine. +- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. +- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. 
+- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands +- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) +- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples +- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase +- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) +- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task +- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running OpenHands GUI on Windows without using WSL or Docker + +## OpenHands Cloud + +- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories. Once +- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands. +- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on +- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud. +- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories. 
Once +- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md) +- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup. +- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup. +- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup. +- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting. +- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. 
+ +## OpenHands CLI + +- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options +- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users +- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker +- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines +- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol +- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system +- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs +- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities +- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI +- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes +- [Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to resume previous conversations in the OpenHands CLI +- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use OpenHands interactively in your terminal with the command-line interface +- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents +- [VS 
Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension +- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser +- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol + +## OpenHands Software Agent SDK - [Agent](https://docs.openhands.dev/sdk/arch/agent.md): High-level architecture of the reasoning-action loop - [Agent Server Package](https://docs.openhands.dev/sdk/arch/agent-server.md): HTTP API server for remote agent execution with workspace isolation, container orchestration, and multi-user support. @@ -75,89 +166,7 @@ - [Tool System & MCP](https://docs.openhands.dev/sdk/arch/tool-system.md): High-level architecture of the action-observation tool framework - [Workspace](https://docs.openhands.dev/sdk/arch/workspace.md): High-level architecture of the execution environment abstraction -## OpenHands - -- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) -- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. -- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. -- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK -- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). 
-- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md) -- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories. Once -- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands. -- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on -- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands -- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options -- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings). -- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High level overview of configuring the OpenHands Web interface. -- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users -- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents. 
-- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users that would like to use their own custom Docker image for the runtime. -- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md) -- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands -- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively. -- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally. -- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands -- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md) -- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud. -- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories. Once -- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md) -- [Good vs. 
Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands -- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex) -- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq). -- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker -- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines -- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol -- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents -- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system -- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to setup and modify the various integrations in OpenHands. 
-- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs -- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup. -- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup. -- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md) -- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands. As well as some additional LLM settings. -- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup. -- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers. -- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. 
-- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md) -- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities -- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you -- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands -- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai). -- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models. -- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI -- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects. -- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle -- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter). -- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work. 
-- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote. -- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation. -- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting. -- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts. -- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes -- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment. -- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level. -- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app. -- [Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to resume previous conversations in the OpenHands CLI -- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md) -- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine. 
-- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. -- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. -- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. -- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands -- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use OpenHands interactively in your terminal with the command-line interface -- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents -- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) -- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples -- [VS Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension -- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase -- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser -- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) -- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task -- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running 
OpenHands GUI on Windows without using WSL or Docker -- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol - -## Overview +## OpenHands Overview - [Community](https://docs.openhands.dev/overview/community.md): Learn about the OpenHands community, mission, and values - [Contributing](https://docs.openhands.dev/overview/contributing.md): Join us in building OpenHands and the future of AI. Learn how to contribute to make a meaningful impact. diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py index 9ab664e6..8d45a379 100755 --- a/scripts/generate-llms-files.py +++ b/scripts/generate-llms-files.py @@ -1,5 +1,46 @@ #!/usr/bin/env python3 +"""Generate custom `llms.txt` + `llms-full.txt` for the OpenHands docs site. + +Why this exists +-------------- +Mintlify automatically generates and hosts `/llms.txt` and `/llms-full.txt` for +Mintlify-backed documentation sites. + +For OpenHands, we want those files to provide **V1-only** context to LLMs while we +still keep some legacy V0 pages available for humans. In particular, we want to +exclude: + +- The legacy docs subtree under `openhands/usage/v0/` +- Any page whose filename starts with `V0*` + +Mintlify supports overriding the auto-generated files by committing `llms.txt` +(and/or `llms-full.txt`) to the repository root. + +References: +- Mintlify docs: https://www.mintlify.com/docs/ai/llmstxt +- llms.txt proposal: https://llmstxt.org/ + +How to use +---------- +Run from the repository root (this repo's `docs/` directory): + + ./scripts/generate-llms-files.py + +This will rewrite `./llms.txt` and `./llms-full.txt`. + +Design notes +------------ +- We only parse `title` and `description` from MDX frontmatter. +- We intentionally group OpenHands pages into sections that clearly distinguish: + - OpenHands CLI + - OpenHands Web App Server (incl. 
"Local GUI") + - OpenHands Cloud + - OpenHands Software Agent SDK + +""" + + from __future__ import annotations import re @@ -112,39 +153,95 @@ def iter_doc_pages() -> list[DocPage]: return pages -def group_name(rel_path: Path) -> str: - top = rel_path.parts[0] - return { - "overview": "Overview", - "openhands": "OpenHands", - "sdk": "Agent SDK", - }.get(top, top.replace("-", " ").title()) +LLMS_SECTION_ORDER = [ + "OpenHands Web App Server", + "OpenHands Cloud", + "OpenHands CLI", + "OpenHands Software Agent SDK", + "OpenHands Overview", + "Other", +] + + +def section_name(page: DocPage) -> str: + """Map a page to an `llms.txt` section. + + This is deliberately opinionated. The goal is to make it obvious to an LLM + what content is about: + + - the OpenHands CLI + - the OpenHands Web App + server (what the nav historically called "Local GUI") + - OpenHands Cloud + - the OpenHands Software Agent SDK + + """ + + route = page.route + + if route.startswith("/sdk"): + return "OpenHands Software Agent SDK" + + if route.startswith("/openhands/usage/cli"): + return "OpenHands CLI" + + if route.startswith("/openhands/usage/cloud"): + return "OpenHands Cloud" + + if route.startswith("/openhands/usage"): + return "OpenHands Web App Server" + + if route.startswith("/overview"): + return "OpenHands Overview" + + return "Other" + + +def _section_sort_key(section: str) -> tuple[int, str]: + """Stable ordering for llms sections, with a sane fallback.""" + + try: + return (LLMS_SECTION_ORDER.index(section), "") + except ValueError: + return (len(LLMS_SECTION_ORDER), section.lower()) def build_llms_txt(pages: list[DocPage]) -> str: + """Generate `llms.txt`. 
+ + The format follows the llms.txt proposal: + - One H1 + - A short blockquote summary + - Optional non-heading text + - H2 sections containing bullet lists of links + + """ + grouped: dict[str, list[DocPage]] = {} - for p in pages: - grouped.setdefault(group_name(p.rel_path), []).append(p) + for page in pages: + grouped.setdefault(section_name(page), []).append(page) - for g in grouped: - grouped[g] = sorted(grouped[g], key=lambda x: (x.title.lower(), x.route)) + for section_pages in grouped.values(): + section_pages.sort(key=lambda p: (p.title.lower(), p.route)) lines: list[str] = [ "# OpenHands Docs", "", "> LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded.", "", + "The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI)", + "from the OpenHands Software Agent SDK.", + "", ] - for group in sorted(grouped.keys()): - lines.append(f"## {group}") + for section in sorted(grouped.keys(), key=_section_sort_key): + lines.append(f"## {section}") lines.append("") - for p in grouped[group]: - url = f"{BASE_URL}{p.route}.md" - line = f"- [{p.title}]({url})" - if p.description: - line += f": {p.description}" + for page in grouped[section]: + url = f"{BASE_URL}{page.route}.md" + line = f"- [{page.title}]({url})" + if page.description: + line += f": {page.description}" lines.append(line) lines.append("") @@ -153,21 +250,42 @@ def build_llms_txt(pages: list[DocPage]) -> str: def build_llms_full_txt(pages: list[DocPage]) -> str: - header = [ + """Generate `llms-full.txt`. + + This is meant to be copy/pasteable context for AI tools. + + Unlike `llms.txt`, there is no strict spec for `llms-full.txt`, but we keep a + single H1, then use H2/H3 headings to make the document navigable. 
+ + """ + + grouped: dict[str, list[DocPage]] = {} + for page in pages: + grouped.setdefault(section_name(page), []).append(page) + + for section_pages in grouped.values(): + section_pages.sort(key=lambda p: p.route) + + lines: list[str] = [ "# OpenHands Docs", "", "> Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded.", "", ] - chunks: list[str] = ["\n".join(header).rstrip()] + for section in sorted(grouped.keys(), key=_section_sort_key): + lines.append(f"## {section}") + lines.append("") - for p in sorted(pages, key=lambda x: x.route): - chunks.append( - f"\n\n# {p.title}\nSource: {BASE_URL}{p.route}\n\n{p.body}\n" - ) + for page in grouped[section]: + lines.append(f"### {page.title}") + lines.append(f"Source: {BASE_URL}{page.route}.md") + lines.append("") + if page.body: + lines.append(page.body) + lines.append("") - return "".join(chunks).lstrip() + "\n" + return "\n".join(lines).rstrip() + "\n" def main() -> None: From 01dcfde1211c94149434290aea4c5051ffc614d7 Mon Sep 17 00:00:00 2001 From: openhands Date: Tue, 24 Feb 2026 09:10:30 +0000 Subject: [PATCH 3/6] docs: document llms override generation Co-authored-by: openhands --- AGENTS.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 022e2e0e..ae172910 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,6 +25,21 @@ The site is built with **Mintlify** and deployed automatically by Mintlify on pu - `.agents/skills/` — prompt extensions for agents editing this repo (legacy: `.openhands/skills/`; formerly `microagents`) - `tests/` — pytest checks for docs consistency (notably LLM pricing docs) + +## llms.txt / llms-full.txt (V1-only) + +Mintlify auto-generates `/llms.txt` and `/llms-full.txt`, but this repo **overrides** them by committing +`llms.txt` and `llms-full.txt` at the repo root. + +We do this so LLMs get **V1-only** context while legacy V0 pages remain available for humans. 
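The route-to-section grouping that `scripts/generate-llms-files.py` performs (see the `section_name` and `_section_sort_key` functions earlier in this patch) can be illustrated with a self-contained sketch. This is a simplified reimplementation for illustration only — it is not part of the patch, and it mirrors the section order as introduced in this first commit (a later commit in this PR reorders the sections):

```python
# Illustrative, self-contained sketch of the route -> section grouping used by
# scripts/generate-llms-files.py. Not part of the patch; names and ordering
# mirror the version introduced in PATCH 1/6.

SECTION_ORDER = [
    "OpenHands Web App Server",
    "OpenHands Cloud",
    "OpenHands CLI",
    "OpenHands Software Agent SDK",
    "OpenHands Overview",
    "Other",
]


def section_name(route: str) -> str:
    """Map a docs route to an llms.txt section.

    Order matters: /openhands/usage/cli and /openhands/usage/cloud are
    subtrees of /openhands/usage, so the more specific prefixes must be
    checked first.
    """
    if route.startswith("/sdk"):
        return "OpenHands Software Agent SDK"
    if route.startswith("/openhands/usage/cli"):
        return "OpenHands CLI"
    if route.startswith("/openhands/usage/cloud"):
        return "OpenHands Cloud"
    if route.startswith("/openhands/usage"):
        return "OpenHands Web App Server"
    if route.startswith("/overview"):
        return "OpenHands Overview"
    return "Other"


def section_sort_key(section: str) -> tuple[int, str]:
    """Known sections keep the curated order; unknown ones sort last, A-Z."""
    try:
        return (SECTION_ORDER.index(section), "")
    except ValueError:
        return (len(SECTION_ORDER), section.lower())


routes = [
    "/sdk/arch/agent",
    "/openhands/usage/cli/quick-start",
    "/openhands/usage/settings/llm-settings",
]
sections = sorted({section_name(r) for r in routes}, key=section_sort_key)
print(sections)
# -> ['OpenHands Web App Server', 'OpenHands CLI', 'OpenHands Software Agent SDK']
```

Note that the prefix checks run from most to least specific; swapping them would fold the CLI and Cloud pages into the Web App Server section, since their routes also start with `/openhands/usage`.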
+ +- Generator script: `scripts/generate-llms-files.py` +- Regenerate: + ```bash + ./scripts/generate-llms-files.py + ``` +- Exclusions: `openhands/usage/v0/` and any `V0*`-prefixed page files. + ## Local development ### Preview the site From 73ea4ba8097c6f83183e4282653b646ae5278c11 Mon Sep 17 00:00:00 2001 From: openhands Date: Tue, 24 Feb 2026 20:48:26 +0000 Subject: [PATCH 4/6] docs: reorder llms sections (SDK, CLI, Web, Cloud) Co-authored-by: openhands --- llms-full.txt | 48706 +++++++++++++++---------------- llms.txt | 176 +- scripts/generate-llms-files.py | 4 +- 3 files changed, 24443 insertions(+), 24443 deletions(-) diff --git a/llms-full.txt b/llms-full.txt index 91325040..509b3bfd 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -2,19408 +2,20753 @@ > Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded. -## OpenHands Web App Server +## OpenHands Software Agent SDK -### About OpenHands -Source: https://docs.openhands.dev/openhands/usage/about.md +### Software Agent SDK +Source: https://docs.openhands.dev/sdk.md -## Research Strategy +The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. -Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves: +You can use the OpenHands Software Agent SDK for: -- **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling. -- **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization. -- **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our agents. 
+- One-off tasks, like building a README for your repo +- Routine maintenance tasks, like updating dependencies +- Major tasks that involve multiple agents, like refactors and rewrites -## Default Agent +You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). -Our default Agent is currently the [CodeActAgent](./agents), which is capable of generating code and handling files. +Get started with some examples or keep reading to learn more. -## Built With +## Features -OpenHands is built using a combination of powerful frameworks and libraries, providing a robust foundation for its -development. Here are the key technologies used in the project: + + + A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. + + + Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. + + + A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. + + -![FastAPI](https://img.shields.io/badge/FastAPI-black?style=for-the-badge) ![uvicorn](https://img.shields.io/badge/uvicorn-black?style=for-the-badge) ![LiteLLM](https://img.shields.io/badge/LiteLLM-black?style=for-the-badge) ![Docker](https://img.shields.io/badge/Docker-black?style=for-the-badge) ![Ruff](https://img.shields.io/badge/Ruff-black?style=for-the-badge) ![MyPy](https://img.shields.io/badge/MyPy-black?style=for-the-badge) ![LlamaIndex](https://img.shields.io/badge/LlamaIndex-black?style=for-the-badge) ![React](https://img.shields.io/badge/React-black?style=for-the-badge) +## Why OpenHands Software Agent SDK? -Please note that the selection of these technologies is in progress, and additional technologies may be added or -existing ones may be removed as the project evolves. 
We strive to adopt the most suitable and efficient tools to -enhance the capabilities of OpenHands. +### Emphasis on coding -## License +While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. -Distributed under MIT [License](https://github.com/OpenHands/OpenHands/blob/main/LICENSE). +While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. -### Configuration Options -Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md +### State-of-the-Art Performance - - This page documents the current V1 configuration model. +OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. The SDK includes a number of state-of-the-art agentic features developed by our research team, including: - Legacy config.toml / “runtime” configuration docs have been moved - to the Legacy (V0) section of the Web tab. - +- Task planning and decomposition +- Automatic context compression +- Security analysis +- Strong agent-computer interfaces -## Where configuration lives in V1 +OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. -Most user-facing configuration is done via the **Settings** UI in the Web app -(LLM provider/model, integrations, MCP, secrets, etc.). +### Free and Open Source -For self-hosted deployments and advanced workflows, OpenHands also supports -environment-variable configuration. +OpenHands is also the leading open source framework for coding agents. 
It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. -## Common V1 environment variables +Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic! -These are some commonly used variables in V1 deployments: +## Get Started -- **LLM credentials** - - LLM_API_KEY - - LLM_MODEL + + + Install the SDK, run your first agent, and explore the guides. + + -- **Persistence** - - OH_PERSISTENCE_DIR: where OpenHands stores local state (defaults to - ~/.openhands). +## Learn the SDK -- **Public URL (optional)** - - OH_WEB_URL: the externally reachable URL of your OpenHands instance - (used for callbacks in some deployments). + + + Understand the SDK's architecture: agents, tools, workspaces, and more. + + + Explore the complete SDK API and source code. + + -- **Sandbox workspace mounting** - - SANDBOX_VOLUMES: mount host directories into the sandbox (see - [Docker Sandbox](/openhands/usage/sandboxes/docker)). +## Build with Examples -- **Sandbox image selection** - - AGENT_SERVER_IMAGE_REPOSITORY - - AGENT_SERVER_IMAGE_TAG + + + Build local agents with custom tools and capabilities. + + + Run agents on remote servers with Docker sandboxing. + + + Automate repository tasks with agent-powered workflows. + + +## Community -## Sandbox provider selection + + + Connect with the OpenHands community on Slack. + + + Contribute to the SDK or report issues on GitHub. 
+ + -Some deployments still use the legacy RUNTIME environment variable to -choose which sandbox provider to use: +### openhands.sdk.agent +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md -- RUNTIME=docker (default) -- RUNTIME=process (aka legacy RUNTIME=local) -- RUNTIME=remote +### class Agent -See [Sandboxes overview](/openhands/usage/sandboxes/overview) for details. +Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) -## Need legacy options? +Main agent implementation for OpenHands. -If you are looking for the old config.toml reference or V0 “runtime” -providers, see: +The Agent class provides the core functionality for running AI agents that can +interact with tools, process messages, and execute actions. It inherits from +AgentBase and implements the agent execution logic. Critic-related functionality +is provided by CriticMixin. -- Web → Legacy (V0) → V0 Configuration Options -- Web → Legacy (V0) → V0 Runtime Configuration +#### Example -### Custom Sandbox -Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md +```pycon +>>> from openhands.sdk import LLM, Agent, Tool +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] +>>> agent = Agent(llm=llm, tools=tools) +``` - - These settings are only available in [Local GUI](/openhands/usage/run-openhands/local-setup). OpenHands Cloud uses managed sandbox environments. - -The sandbox is where the agent performs its tasks. Instead of running commands directly on your computer -(which could be risky), the agent runs them inside a Docker container. +#### Properties -The default OpenHands sandbox (`python-nodejs:python3.12-nodejs22` -from [nikolaik/python-nodejs](https://hub.docker.com/r/nikolaik/python-nodejs)) comes with some packages installed such -as python and Node.js but may need other software installed by default. 
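The `init_state()` ordering invariants described earlier (a system prompt, if present, must sit near the front of the history, and no user message may precede it) can be sketched as a small check. Event kinds here are simplified string stand-ins, not SDK event types:

```python
# Toy check of the init_state() event-ordering invariants: the system
# prompt must appear within the first few events scanned, and no user
# message may come before it. The scan window of 3 mirrors the window
# mentioned in the docstring; event kinds are illustrative strings.

def check_event_order(events: list[str], scan_window: int = 3) -> bool:
    head = events[:scan_window]
    if "system" in head:
        sys_idx = head.index("system")
        # A user message before the system prompt violates the invariant.
        return "user" not in head[:sys_idx]
    # A system prompt appearing only after the scan window also violates it.
    return "system" not in events

assert check_event_order(["system", "user", "assistant"])
assert not check_event_order(["user", "system"])
```

A real implementation scans actual event objects rather than strings; this only captures the ordering rule.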
+- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -You have two options for customization: +#### Methods -- Use an existing image with the required software. -- Create your own custom Docker image. +#### init_state() -If you choose the first option, you can skip the `Create Your Docker Image` section. +Initialize conversation state. -## Create Your Docker Image +Invariants enforced by this method: +- If a SystemPromptEvent is already present, it must be within the first 3 -To create a custom Docker image, it must be Debian based. + events (index 0 or 1 in practice; index 2 is included in the scan window + to detect a user message appearing before the system prompt). +- A user MessageEvent should not appear before the SystemPromptEvent. -For example, if you want OpenHands to have `ruby` installed, you could create a `Dockerfile` with the following content: +These invariants keep event ordering predictable for downstream components +(condenser, UI, etc.) and also prevent accidentally materializing the full +event history during initialization. -```dockerfile -FROM nikolaik/python-nodejs:python3.12-nodejs22 +#### model_post_init() -# Install required packages -RUN apt-get update && apt-get install -y ruby -``` +This function is meant to behave like a BaseModel method to initialise private attributes. -Or you could use a Ruby-specific base image: +It takes context as an argument since that’s what pydantic-core passes when calling it. -```dockerfile -FROM ruby:latest -``` +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -Save this file in a folder. Then, build your Docker image (e.g., named custom-image) by navigating to the folder in -the terminal and running:: -```bash -docker build -t custom-image . -``` +#### step() -This will produce a new image called `custom-image`, which will be available in Docker. 
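The build step can also be driven from a script. A minimal sketch that only constructs the `docker build` command line shown above (actually executing it requires a local Docker daemon, so the `subprocess` call is deliberately left out):

```python
# Sketch: assemble the argv for `docker build -t <tag> <context_dir>`.
# Constructing (rather than running) the command keeps this example
# runnable without Docker installed.

def docker_build_argv(tag: str, context_dir: str = ".") -> list[str]:
    return ["docker", "build", "-t", tag, context_dir]

print(" ".join(docker_build_argv("custom-image")))
```

Passing the resulting list to `subprocess.run` from the folder containing the `Dockerfile` would perform the same build as the command above.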
+Taking a step in the conversation. -## Using the Docker Command +Typically this involves: +1. Making a LLM call +2. Executing the tool +3. Updating the conversation state with -When running OpenHands using [the docker command](/openhands/usage/run-openhands/local-setup#start-the-app), replace -the `AGENT_SERVER_IMAGE_REPOSITORY` and `AGENT_SERVER_IMAGE_TAG` environment variables with `-e SANDBOX_BASE_CONTAINER_IMAGE=`: + LLM calls (role=”assistant”) and tool results (role=”tool”) -```commandline -docker run -it --rm --pull=always \ - -e SANDBOX_BASE_CONTAINER_IMAGE=custom-image \ - ... -``` +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step -## Using the Development Workflow +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. -### Setup +NOTE: state will be mutated in-place. -First, ensure you can run OpenHands by following the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md). +### class AgentBase -### Specify the Base Sandbox Image +Bases: `DiscriminatedUnionMixin`, `ABC` -In the `config.toml` file within the OpenHands directory, set the `base_container_image` to the image you want to use. -This can be an image you’ve already pulled or one you’ve built: +Abstract base class for OpenHands agents. -```bash -[core] -... -[sandbox] -base_container_image="custom-image" -``` - -### Additional Configuration Options +Agents are stateless and should be fully defined by their configuration. +This base class provides the common interface and functionality that all +agent implementations must follow. 
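The statement that agents are stateless and fully defined by their configuration can be illustrated with a round-trip: serialize the configuration, rebuild it, and you have an equivalent agent definition. This toy uses stdlib dataclasses in place of the SDK's pydantic models, and the field names are illustrative:

```python
from dataclasses import asdict, dataclass

# Toy illustration of "agents are stateless and fully defined by their
# configuration": the config survives a serialize/rebuild round-trip,
# so an equivalent agent can be reconstructed anywhere.

@dataclass(frozen=True)
class AgentConfig:
    model: str
    tools: tuple[str, ...] = ()

cfg = AgentConfig(
    model="claude-sonnet-4-20250514",
    tools=("TerminalTool", "FileEditorTool"),
)
restored = AgentConfig(**asdict(cfg))
assert restored == cfg  # same configuration -> equivalent agent
```

The real `AgentBase` uses pydantic's `model_dump` for the same purpose, which is what makes persisted-state verification possible.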
-The `config.toml` file supports several other options for customizing your sandbox: -```toml -[core] -# Install additional dependencies when the runtime is built -# Can contain any valid shell commands -# If you need the path to the Python interpreter in any of these commands, you can use the $OH_INTERPRETER_PATH variable -runtime_extra_deps = """ -pip install numpy pandas -apt-get update && apt-get install -y ffmpeg -""" +#### Properties -# Set environment variables for the runtime -# Useful for configuration that needs to be available at runtime -runtime_startup_env_vars = { DATABASE_URL = "postgresql://user:pass@localhost/db" } +- `agent_context`: AgentContext | None +- `condenser`: CondenserBase | None +- `critic`: CriticBase | None +- `dynamic_context`: str | None + Get the dynamic per-conversation context. + This returns the context that varies between conversations, such as: + - Repository information and skills + - Runtime information (hosts, working directory) + - User-specific secrets and settings + - Conversation instructions + This content should NOT be included in the cached system prompt to enable + cross-conversation cache sharing. Instead, it is sent as a second content + block (without a cache marker) inside the system message. + * Returns: + The dynamic context string, or None if no context is configured. +- `filter_tools_regex`: str | None +- `include_default_tools`: list[str] +- `llm`: LLM +- `mcp_config`: dict[str, Any] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str + Returns the name of the Agent. +- `prompt_dir`: str + Returns the directory where this class’s module file is located. +- `security_policy_filename`: str +- `static_system_message`: str + Compute the static portion of the system message. + This returns only the base system prompt template without any dynamic + per-conversation context. 
This static portion can be cached and reused + across conversations for better prompt caching efficiency. + * Returns: + The rendered system prompt template without dynamic context. +- `system_message`: str + Return the combined system message (static + dynamic). +- `system_prompt_filename`: str +- `system_prompt_kwargs`: dict[str, object] +- `tools`: list[Tool] +- `tools_map`: dictstr, [ToolDefinition] + Get the initialized tools map. + :raises RuntimeError: If the agent has not been initialized. -# Specify platform for multi-architecture builds (e.g., "linux/amd64" or "linux/arm64") -platform = "linux/amd64" -``` +#### Methods -### Run +#### get_all_llms() -Run OpenHands by running ```make run``` in the top level directory. +Recursively yield unique base-class LLM objects reachable from self. -### Search Engine Setup -Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md +- Returns actual object references (not copies). +- De-dupes by id(LLM). +- Cycle-safe via a visited set for all traversed objects. +- Only yields objects whose type is exactly LLM (no subclasses). +- Does not handle dataclasses. -## Setting Up Search Engine in OpenHands +#### init_state() -OpenHands can be configured to use [Tavily](https://tavily.com/) as a search engine, which allows the agent to -search the web for information when needed. This capability enhances the agent's ability to provide up-to-date -information and solve problems that require external knowledge. +Initialize the empty conversation state to prepare the agent for user +messages. - - Tavily is configured as a search engine by default in OpenHands Cloud! - +Typically this involves adding system message -### Getting a Tavily API Key +NOTE: state will be mutated in-place. -To use the search functionality in OpenHands, you'll need to obtain a Tavily API key: +#### model_dump_succint() -1. Visit [Tavily's website](https://tavily.com/) and sign up for an account. -2. 
Navigate to the API section in your dashboard. -3. Generate a new API key. -4. Copy the API key (it should start with `tvly-`). +Like model_dump, but excludes None fields by default. -### Configuring Search in OpenHands +#### model_post_init() -Once you have your Tavily API key, you can configure OpenHands to use it: +This function is meant to behave like a BaseModel method to initialise private attributes. -#### In the OpenHands UI +It takes context as an argument since that’s what pydantic-core passes when calling it. -1. Open OpenHands and navigate to the `Settings > LLM` page. -2. Enter your Tavily API key (starting with `tvly-`) in the `Search API Key (Tavily)` field. -3. Click `Save` to apply the changes. +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. - - The search API key field is optional. If you don't provide a key, the search functionality will not be available to - the agent. - +#### abstractmethod step() -#### Using Configuration Files +Taking a step in the conversation. -If you're running OpenHands in headless mode or via CLI, you can configure the search API key in your configuration file: +Typically this involves: +1. Making a LLM call +2. Executing the tool +3. Updating the conversation state with -```toml -# In your OpenHands config file -[core] -search_api_key = "tvly-your-api-key-here" -``` + LLM calls (role=”assistant”) and tool results (role=”tool”) -### How Search Works in OpenHands +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step -When the search engine is configured: +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. -- The agent can decide to search the web when it needs external information. 
-- Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which - includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map). -- Results are returned and incorporated into the agent's context. -- The agent can use this information to provide more accurate and up-to-date responses. +NOTE: state will be mutated in-place. -### Limitations +#### Deprecated +Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and +[`dynamic_context`](#class-dynamic_context) for per-conversation content. This separation +enables cross-conversation prompt caching. Will be removed in 1.16.0. -- Search results depend on Tavily's coverage and freshness. -- Usage may be subject to Tavily's rate limits and pricing tiers. -- The agent will only search when it determines that external information is needed. +#### WARNING +Using this property DISABLES cross-conversation prompt caching because +it combines static and dynamic content into a single string. Use +[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately +to enable caching. -### Troubleshooting +#### Deprecated +Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. -If you encounter issues with the search functionality: +#### verify() -- Verify that your API key is correct and active. -- Check that your API key starts with `tvly-`. -- Ensure you have an active internet connection. -- Check Tavily's status page for any service disruptions. +Verify that we can resume this agent from persisted state. 
-### Main Agent and Capabilities -Source: https://docs.openhands.dev/openhands/usage/agents.md +We do not merge configuration between persisted and runtime Agent +instances. Instead, we verify compatibility requirements and then +continue with the runtime-provided Agent. -## CodeActAgent +Compatibility requirements: +- Agent class/type must match. +- Tools must match exactly (same tool names). -### Description +Tools are part of the system prompt and cannot be changed mid-conversation. +To use different tools, start a new conversation or use conversation forking +(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). -This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a -unified **code** action space for both _simplicity_ and _performance_. +All other configuration (LLM, agent_context, condenser, etc.) can be +freely changed between sessions. -The conceptual idea is illustrated below. At each turn, the agent can: +* Parameters: + * `persisted` – The agent loaded from persisted state. + * `events` – Unused, kept for API compatibility. +* Returns: + This runtime agent (self) if verification passes. +* Raises: + `ValueError` – If agent class or tools don’t match. -1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc. -2. **CodeAct**: Choose to perform the task by executing code +### openhands.sdk.conversation +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md -- Execute any valid Linux `bash` command -- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details. 
+### class BaseConversation -![image](https://github.com/OpenHands/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3) +Bases: `ABC` -### Demo +Abstract base class for conversation implementations. -https://github.com/OpenHands/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac +This class defines the interface that all conversation implementations must follow. +Conversations manage the interaction between users and agents, handling message +exchange, execution control, and state management. -_Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_. -### REST API (V1) -Source: https://docs.openhands.dev/openhands/usage/api/v1.md +#### Properties - - OpenHands is in a transition period: legacy (V0) endpoints still exist alongside - the new /api/v1 endpoints. +- `confirmation_policy_active`: bool +- `conversation_stats`: ConversationStats +- `id`: UUID +- `is_confirmation_mode_active`: bool + Check if confirmation mode is active. + Returns True if BOTH conditions are met: + 1. The conversation state has a security analyzer set (not None) + 2. The confirmation policy is active +- `state`: ConversationStateProtocol - If you need the legacy OpenAPI reference, see the Legacy (V0) section in the Web tab. - +#### Methods -## Overview +#### __init__() -OpenHands V1 REST endpoints are mounted under: +Initialize the base conversation with span tracking. -- /api/v1 +#### abstractmethod ask_agent() -These endpoints back the current Web UI and are intended for newer integrations. +Ask the agent a simple, stateless question and get a direct LLM response. -## Key resources +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. 
-The V1 API is organized around a few core concepts: +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent -- **App conversations**: create/list conversations and access conversation metadata. - - POST /api/v1/app-conversations - - GET /api/v1/app-conversations +#### abstractmethod close() -- **Sandboxes**: list/start/pause/resume the execution environments that power conversations. - - GET /api/v1/sandboxes/search - - POST /api/v1/sandboxes - - POST /api/v1/sandboxes/{id}/pause - - POST /api/v1/sandboxes/{id}/resume +#### static compose_callbacks() -- **Sandbox specs**: list the available sandbox “templates” (e.g., Docker image presets). - - GET /api/v1/sandbox-specs/search +Compose multiple callbacks into a single callback function. -### Backend Architecture -Source: https://docs.openhands.dev/openhands/usage/architecture/backend.md +* Parameters: + `callbacks` – An iterable of callback functions +* Returns: + A single callback function that calls all provided callbacks -This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents. +#### abstractmethod condense() -# System overview +Force condensation of the conversation history. -```mermaid -flowchart LR - U["User"] --> FE["Frontend (SPA)"] - FE -- "HTTP/WS" --> BE["OpenHands Backend"] - BE --> ES["EventStream"] - BE --> ST["Storage"] - BE --> RT["Runtime Interface"] - BE --> LLM["LLM Providers"] +This method uses the existing condensation request pattern to trigger +condensation. It adds a CondensationRequest event to the conversation +and forces the agent to take a single step to process it. 
- subgraph Runtime - direction TB - RT --> DRT["Docker Runtime"] - RT --> LRT["Local Runtime"] - RT --> RRT["Remote Runtime"] - DRT --> AES["Action Execution Server"] - LRT --> AES - RRT --> AES - AES --> Bash["Bash Session"] - AES --> Jupyter["Jupyter Plugin"] - AES --> Browser["BrowserEnv"] - end -``` +The condensation will be applied immediately and will modify the conversation +state by adding a condensation event to the history. -This Overview is simplified to show the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below. +* Raises: + `ValueError` – If no condenser is configured or the condenser doesn’t + handle condensation requests. -# Backend Architecture +#### abstractmethod execute_tool() +Execute a tool directly without going through the agent loop. -```mermaid -classDiagram - class Agent { - <> - +sandbox_plugins: list[PluginRequirement] - } - class CodeActAgent { - +tools - } - Agent <|-- CodeActAgent +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. - class EventStream - class Observation - class Action - Action --> Observation - Agent --> EventStream +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. 
- class Runtime { - +connect() - +send_action_for_execution() - } - class ActionExecutionClient { - +_send_action_server_request() - } - class DockerRuntime - class LocalRuntime - class RemoteRuntime - Runtime <|-- ActionExecutionClient - ActionExecutionClient <|-- DockerRuntime - ActionExecutionClient <|-- LocalRuntime - ActionExecutionClient <|-- RemoteRuntime +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop - class ActionExecutionServer { - +/execute_action - +/alive - } - class BashSession - class JupyterPlugin - class BrowserEnv - ActionExecutionServer --> BashSession - ActionExecutionServer --> JupyterPlugin - ActionExecutionServer --> BrowserEnv +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor - Agent --> Runtime - Runtime ..> ActionExecutionServer : REST -``` +#### abstractmethod generate_title() -
- Updating this Diagram -
- We maintain architecture diagrams inline with Mermaid in this MDX. +Generate a title for the conversation based on the first user message. - Guidance: - - Edit the Mermaid blocks directly (flowchart/classDiagram). - - Quote labels and edge text for GitHub preview compatibility. - - Keep relationships concise and reflect stable abstractions (agents, runtime client/server, plugins). - - Verify accuracy against code: - - openhands/runtime/impl/action_execution/action_execution_client.py - - openhands/runtime/impl/docker/docker_runtime.py - - openhands/runtime/impl/local/local_runtime.py - - openhands/runtime/action_execution_server.py - - openhands/runtime/plugins/* - - Build docs locally or view on GitHub to confirm diagrams render. +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. -
-
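The backend/runtime split shown in the diagrams above reduces to a request/response contract: the backend posts an action, and the action execution server returns an observation. A stdlib-only sketch with made-up field names (the real wire format lives in the runtime client and server modules listed above):

```python
import json

# Illustrative action -> observation round trip between the backend and
# the action execution server. Field names are invented for this sketch;
# they are not the actual wire format.

def make_action(command: str) -> str:
    return json.dumps({"action": "run", "args": {"command": command}})

def handle_action(payload: str) -> str:
    request = json.loads(payload)
    # A real server would execute the command inside the sandbox here.
    result = f"ran: {request['args']['command']}"
    return json.dumps({"observation": "run", "content": result})

reply = json.loads(handle_action(make_action("echo hi")))
assert reply["content"] == "ran: echo hi"
```

In the actual system this exchange happens over REST (`/execute_action`), with the same action-in, observation-out shape.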
+#### static get_persistence_dir() -### Runtime Architecture -Source: https://docs.openhands.dev/openhands/usage/architecture/runtime.md +Get the persistence directory for the conversation. -The OpenHands Docker Runtime is the core component that enables secure and flexible execution of AI agent's action. -It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system. +* Parameters: + * `persistence_base_dir` – Base directory for persistence. Can be a string + path or Path object. + * `conversation_id` – Unique conversation ID. +* Returns: + String path to the conversation-specific persistence directory. + Always returns a normalized string path even if a Path was provided. -## Why do we need a sandboxed runtime? +#### abstractmethod pause() -OpenHands needs to execute arbitrary code in a secure, isolated environment for several reasons: +#### abstractmethod reject_pending_actions() -1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources -2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues -3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system -4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system -5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable +#### abstractmethod run() -## How does the Runtime work? +Execute the agent to process messages and perform actions. -The OpenHands Runtime system uses a client-server architecture implemented with Docker containers. 
Here's an overview of how it works: +This method runs the agent until it finishes processing the current +message or reaches the maximum iteration limit. -```mermaid -graph TD - A[User-provided Custom Docker Image] --> B[OpenHands Backend] - B -->|Builds| C[OH Runtime Image] - C -->|Launches| D[Action Executor] - D -->|Initializes| E[Browser] - D -->|Initializes| F[Bash Shell] - D -->|Initializes| G[Plugins] - G -->|Initializes| L[Jupyter Server] +#### abstractmethod send_message() - B -->|Spawn| H[Agent] - B -->|Spawn| I[EventStream] - I <--->|Execute Action to - Get Observation - via REST API - | D +Send a message to the agent. - H -->|Generate Action| I - I -->|Obtain Observation| H +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. - subgraph "Docker Container" - D - E - F - G - L - end -``` +#### abstractmethod set_confirmation_policy() -1. User Input: The user provides a custom base Docker image -2. Image Building: OpenHands builds a new Docker image (the "OH runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client" -3. Container Launch: When OpenHands starts, it launches a Docker container using the OH runtime image -4. Action Execution Server Initialization: The action execution server initializes an `ActionExecutor` inside the container, setting up necessary components like a bash shell and loading any specified plugins -5. 
Communication: The OpenHands backend (client: `openhands/runtime/impl/action_execution/action_execution_client.py`; runtimes: `openhands/runtime/impl/docker/docker_runtime.py`, `openhands/runtime/impl/local/local_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations -6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations -7. Observation Return: The action execution server sends execution results back to the OpenHands backend as observations +Set the confirmation policy for the conversation. -The role of the client: +#### abstractmethod set_security_analyzer() -- It acts as an intermediary between the OpenHands backend and the sandboxed environment -- It executes various types of actions (shell commands, file operations, Python code, etc.) safely within the container -- It manages the state of the sandboxed environment, including the current working directory and loaded plugins -- It formats and returns observations to the backend, ensuring a consistent interface for processing results +Set the security analyzer for the conversation. -## How OpenHands builds and maintains OH Runtime images +#### abstractmethod update_secrets() -OpenHands' approach to building and managing runtime images ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments. +### class Conversation -Check out the [relevant code](https://github.com/OpenHands/OpenHands/blob/main/openhands/runtime/utils/runtime_build.py) if you are interested in more details. +### class Conversation -### Image Tagging System +Bases: `object` -OpenHands uses a three-tag system for its runtime images to balance reproducibility with flexibility. -The tags are: +Factory class for creating conversation instances with OpenHands agents. 
-
-- **Versioned Tag**: `oh_v{openhands_version}_{base_image}` (e.g.: `oh_v0.9.9_nikolaik_s_python-nodejs_t_python3.12-nodejs22`)
-- **Lock Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}` (e.g.: `oh_v0.9.9_1234567890abcdef`)
-- **Source Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}_{16_digit_source_hash}`
-  (e.g.: `oh_v0.9.9_1234567890abcdef_1234567890abcdef`)

This factory automatically creates either a LocalConversation or RemoteConversation
based on the workspace type provided. LocalConversation runs the agent locally,
while RemoteConversation connects to a remote agent server.

-#### Source Tag - Most Specific

* Returns:
  LocalConversation if workspace is local, RemoteConversation if workspace
  is remote.

-This is the first 16 digits of the MD5 of the directory hash for the source directory. This gives a hash
-for only the openhands source

#### Example

-#### Lock Tag

```pycon
>>> from openhands.sdk import LLM, Agent, Conversation
>>> from openhands.sdk.plugin import PluginSource
>>> from pydantic import SecretStr
>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key"))
>>> agent = Agent(llm=llm, tools=[])
>>> conversation = Conversation(
...     agent=agent,
...     workspace="./workspace",
...     plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")],
... )
>>> conversation.send_message("Hello!")
>>> conversation.run()
```

-This hash is built from the first 16 digits of the MD5 of:

### class ConversationExecutionStatus

-- The name of the base image upon which the image was built (e.g.: `nikolaik/python-nodejs:python3.12-nodejs22`)
-- The content of the `pyproject.toml` included in the image.
-- The content of the `poetry.lock` included in the image.

Bases: `str`, `Enum`

-This effectively gives a hash for the dependencies of Openhands independent of the source code.

Enum representing the current execution state of the conversation. 
-#### Versioned Tag - Most Generic +#### Methods -This tag is a concatenation of openhands version and the base image name (transformed to fit in tag standard). +#### DELETING = 'deleting' -#### Build Process +#### ERROR = 'error' -When generating an image... +#### FINISHED = 'finished' -- **No re-build**: OpenHands first checks whether an image with the same **most specific source tag** exists. If there is such an image, - no build is performed - the existing image is used. -- **Fastest re-build**: OpenHands next checks whether an image with the **generic lock tag** exists. If there is such an image, - OpenHands builds a new image based upon it, bypassing all installation steps (like `poetry install` and - `apt-get`) except a final operation to copy the current source code. The new image is tagged with a - **source** tag only. -- **Ok-ish re-build**: If neither a **source** nor **lock** tag exists, an image will be built based upon the **versioned** tag image. - In versioned tag image, most dependencies should already been installed hence saving time. -- **Slowest re-build**: If all of the three tags don't exists, a brand new image is built based upon the base - image (Which is a slower operation). This new image is tagged with all the **source**, **lock**, and **versioned** tags. +#### IDLE = 'idle' -This tagging approach allows OpenHands to efficiently manage both development and production environments. +#### PAUSED = 'paused' -1. Identical source code and Dockerfile always produce the same image (via hash-based tags) -2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images) -3. 
The **lock** tag (e.g., `runtime:oh_v0.9.3_1234567890abcdef`) always points to the latest build for a particular base image, dependency, and OpenHands version combination +#### RUNNING = 'running' -## Volume mounts: named volumes and overlay +#### STUCK = 'stuck' -OpenHands supports both bind mounts and Docker named volumes in SandboxConfig.volumes: +#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' -- Bind mount: "/abs/host/path:/container/path[:mode]" -- Named volume: "volume:``:/container/path[:mode]" or any non-absolute host spec treated as a named volume +#### is_terminal() -Overlay mode (copy-on-write layer) is supported for bind mounts by appending ":overlay" to the mode (e.g., ":ro,overlay"). -To enable overlay COW, set SANDBOX_VOLUME_OVERLAYS to a writable host directory; per-container upper/work dirs are created under it. If SANDBOX_VOLUME_OVERLAYS is unset, overlay mounts are skipped. +Check if this status represents a terminal state. -Implementation references: -- openhands/runtime/impl/docker/docker_runtime.py (named volumes in _build_docker_run_args; overlay mounts in _process_overlay_mounts) -- openhands/core/config/sandbox_config.py (volumes field) +Terminal states indicate the run has completed and the agent is no longer +actively processing. These are: FINISHED, ERROR, STUCK. +Note: IDLE is NOT a terminal state - it’s the initial state of a conversation +before any run has started. Including IDLE would cause false positives when +the WebSocket delivers the initial state update during connection. -## Runtime Plugin System +* Returns: + True if this is a terminal status, False otherwise. -The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the action execution server starts up inside the runtime. 
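The member set and `is_terminal()` rule documented for `ConversationExecutionStatus` above can be mirrored in a small standalone enum. This is an illustrative sketch, not the SDK's actual class; only the member names and the terminal-state rule come from the reference above:

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    """Stand-in mirroring the documented ConversationExecutionStatus members."""

    DELETING = "deleting"
    ERROR = "error"
    FINISHED = "finished"
    IDLE = "idle"
    PAUSED = "paused"
    RUNNING = "running"
    STUCK = "stuck"
    WAITING_FOR_CONFIRMATION = "waiting_for_confirmation"

    def is_terminal(self) -> bool:
        # Only FINISHED, ERROR, and STUCK end a run. IDLE is deliberately
        # excluded: it is the initial state before any run has started.
        return self in (
            ExecutionStatus.FINISHED,
            ExecutionStatus.ERROR,
            ExecutionStatus.STUCK,
        )


assert ExecutionStatus.FINISHED.is_terminal()
assert not ExecutionStatus.IDLE.is_terminal()
assert not ExecutionStatus.RUNNING.is_terminal()
```

Because the enum mixes in `str`, each member also compares equal to its string value (e.g. `ExecutionStatus.RUNNING == "running"`), which is convenient when deserializing state updates.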
+### class ConversationState -## Ports and URLs +Bases: `OpenHandsModel` -- Host port allocation uses file-locked ranges for stability and concurrency: - - Main runtime port: find_available_port_with_lock on configured range - - VSCode port: SandboxConfig.sandbox.vscode_port if provided, else find_available_port_with_lock in VSCODE_PORT_RANGE - - App ports: two additional ranges for plugin/web apps -- DOCKER_HOST_ADDR (if set) adjusts how URLs are formed for LocalRuntime/Docker environments. -- VSCode URL is exposed with a connection token from the action execution server endpoint /vscode/connection_token and rendered as: - - Docker/Local: `http://localhost:{port}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` - - RemoteRuntime: `scheme://vscode-{host}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` -References: -- openhands/runtime/impl/docker/docker_runtime.py (port ranges, locking, DOCKER_HOST_ADDR, vscode_url) -- openhands/runtime/impl/local/local_runtime.py (vscode_url factory) -- openhands/runtime/impl/remote/remote_runtime.py (vscode_url mapping) -- openhands/runtime/action_execution_server.py (/vscode/connection_token) +#### Properties +- `activated_knowledge_skills`: list[str] +- `agent`: AgentBase +- `agent_state`: dict[str, Any] +- `blocked_actions`: dict[str, str] +- `blocked_messages`: dict[str, str] +- `confirmation_policy`: ConfirmationPolicyBase +- `env_observation_persistence_dir`: str | None + Directory for persisting environment observation files. 
+- `events`: [EventLog](#class-eventlog) +- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) +- `id`: UUID +- `max_iterations`: int +- `persistence_dir`: str | None +- `secret_registry`: [SecretRegistry](#class-secretregistry) +- `security_analyzer`: SecurityAnalyzerBase | None +- `stats`: ConversationStats +- `stuck_detection`: bool +- `workspace`: BaseWorkspace -Examples: -- Jupyter: openhands/runtime/plugins/jupyter/__init__.py (JupyterPlugin, Kernel Gateway) -- VS Code: openhands/runtime/plugins/vscode/* (VSCodePlugin, exposes tokenized URL) -- Agent Skills: openhands/runtime/plugins/agent_skills/* +#### Methods -Key aspects of the plugin system: +#### acquire() -1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class -2. Plugin Registration: Available plugins are registered in `openhands/runtime/plugins/__init__.py` via `ALL_PLUGINS` -3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime -4. Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions -5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping +Acquire the lock. -### Repository Customization -Source: https://docs.openhands.dev/openhands/usage/customization/repository.md +* Parameters: + * `blocking` – If True, block until lock is acquired. If False, return + immediately. + * `timeout` – Maximum time to wait for lock (ignored if blocking=False). + -1 means wait indefinitely. +* Returns: + True if lock was acquired, False otherwise. -## Skills (formerly Microagents) +#### block_action() -Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands -should function. 
See [Skills Overview](/overview/skills) for more information. +Persistently record a hook-blocked action. +#### block_message() -## Setup Script -You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository. -This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks. +Persistently record a hook-blocked user message. -For example: -```bash -#!/bin/bash -export MY_ENV_VAR="my value" -sudo apt-get update -sudo apt-get install -y lsof -cd frontend && npm install ; cd .. -``` +#### classmethod create() -## Pre-commit Script -You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit. -This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits. +Create a new conversation state or resume from persistence. -For example: -```bash -#!/bin/bash -# Run linting checks -cd frontend && npm run lint -if [ $? -ne 0 ]; then - echo "Frontend linting failed. Please fix the issues before committing." - exit 1 -fi +This factory method handles both new conversation creation and resumption +from persisted state. -# Run tests -cd backend && pytest tests/unit -if [ $? -ne 0 ]; then - echo "Backend tests failed. Please fix the issues before committing." - exit 1 -fi +New conversation: +The provided Agent is used directly. Pydantic validation happens via the +cls() constructor. -exit 0 -``` +Restored conversation: +The provided Agent is validated against the persisted agent using +agent.load(). Tools must match (they may have been used in conversation +history), but all other configuration can be freely changed: LLM, +agent_context, condenser, system prompts, etc. 
-### Debugging -Source: https://docs.openhands.dev/openhands/usage/developers/debugging.md +* Parameters: + * `id` – Unique conversation identifier + * `agent` – The Agent to use (tools must match persisted on restore) + * `workspace` – Working directory for agent operations + * `persistence_dir` – Directory for persisting state and events + * `max_iterations` – Maximum iterations per run + * `stuck_detection` – Whether to enable stuck detection + * `cipher` – Optional cipher for encrypting/decrypting secrets in + persisted state. If provided, secrets are encrypted when + saving and decrypted when loading. If not provided, secrets + are redacted (lost) on serialization. +* Returns: + ConversationState ready for use +* Raises: + * `ValueError` – If conversation ID or tools mismatch on restore + * `ValidationError` – If agent or other fields fail Pydantic validation -The following is intended as a primer on debugging OpenHands for Development purposes. +#### static get_unmatched_actions() -## Server / VSCode +Find actions in the event history that don’t have matching observations. -The following `launch.json` will allow debugging the agent, controller and server elements, but not the sandbox (Which runs inside docker). It will ignore any changes inside the `workspace/` directory: +This method identifies ActionEvents that don’t have corresponding +ObservationEvents or UserRejectObservations, which typically indicates +actions that are pending confirmation or execution. 
-``` -{ - "version": "0.2.0", - "configurations": [ - { - "name": "OpenHands CLI", - "type": "debugpy", - "request": "launch", - "module": "openhands.cli.main", - "justMyCode": false - }, - { - "name": "OpenHands WebApp", - "type": "debugpy", - "request": "launch", - "module": "uvicorn", - "args": [ - "openhands.server.listen:app", - "--reload", - "--reload-exclude", - "${workspaceFolder}/workspace", - "--port", - "3000" - ], - "justMyCode": false - } - ] -} -``` +* Parameters: + `events` – List of events to search through +* Returns: + List of ActionEvent objects that don’t have corresponding observations, + in chronological order -More specific debugging configurations which include more parameters may be specified: +#### locked() -``` - ... - { - "name": "Debug CodeAct", - "type": "debugpy", - "request": "launch", - "module": "openhands.core.main", - "args": [ - "-t", - "Ask me what your task is.", - "-d", - "${workspaceFolder}/workspace", - "-c", - "CodeActAgent", - "-l", - "llm.o1", - "-n", - "prompts" - ], - "justMyCode": false - } - ... -``` +Return True if the lock is currently held by any thread. -Values in the snippet above can be updated such that: +#### model_config = (configuration object) - * *t*: the task - * *d*: the openhands workspace directory - * *c*: the agent - * *l*: the LLM config (pre-defined in config.toml) - * *n*: session name (e.g. eventstream name) +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### Development Overview -Source: https://docs.openhands.dev/openhands/usage/developers/development-overview.md +#### model_post_init() -## Core Documentation +This function is meant to behave like a BaseModel method to initialise private attributes. -### Project Fundamentals -- **Main Project Overview** (`/README.md`) - The primary entry point for understanding OpenHands, including features and basic setup instructions. 
+It takes context as an argument since that’s what pydantic-core passes when calling it. -- **Development Guide** (`/Development.md`) - Guide for developers working on OpenHands, including setup, requirements, and development workflows. +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -- **Contributing Guidelines** (`/CONTRIBUTING.md`) - Essential information for contributors, covering code style, PR process, and contribution workflows. +#### owned() -### Component Documentation +Return True if the lock is currently held by the calling thread. -#### Frontend -- **Frontend Application** (`/frontend/README.md`) - Complete guide for setting up and developing the React-based frontend application. +#### pop_blocked_action() -#### Backend -- **Backend Implementation** (`/openhands/README.md`) - Detailed documentation of the Python backend implementation and architecture. +Remove and return a hook-blocked action reason, if present. -- **Server Documentation** (`/openhands/server/README.md`) - Server implementation details, API documentation, and service architecture. +#### pop_blocked_message() -- **Runtime Environment** (`/openhands/runtime/README.md`) - Documentation covering the runtime environment, execution model, and runtime configurations. +Remove and return a hook-blocked message reason, if present. -#### Infrastructure -- **Container Documentation** (`/containers/README.md`) - Information about Docker containers, deployment strategies, and container management. +#### release() -### Testing and Evaluation -- **Unit Testing Guide** (`/tests/unit/README.md`) - Instructions for writing, running, and maintaining unit tests. +Release the lock. -- **Evaluation Framework** (`/evaluation/README.md`) - Documentation for the evaluation framework, benchmarks, and performance testing. +* Raises: + `RuntimeError` – If the current thread doesn’t own the lock. 
-### Advanced Features -- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`) - Detailed information about the skills architecture, implementation, and usage. +#### set_on_state_change() -### Documentation Standards -- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) - Standards and guidelines for writing and maintaining project documentation. +Set a callback to be called when state changes. -## Getting Started with Development +* Parameters: + `callback` – A function that takes an Event (ConversationStateUpdateEvent) + or None to remove the callback -If you're new to developing with OpenHands, we recommend following this sequence: +### class ConversationVisualizerBase -1. Start with the main `README.md` to understand the project's purpose and features -2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute -3. Follow the setup instructions in `Development.md` -4. Dive into specific component documentation based on your area of interest: - - Frontend developers should focus on `/frontend/README.md` - - Backend developers should start with `/openhands/README.md` - - Infrastructure work should begin with `/containers/README.md` +Bases: `ABC` -## Documentation Updates +Base class for conversation visualizers. -When making changes to the codebase, please ensure that: -1. Relevant documentation is updated to reflect your changes -2. New features are documented in the appropriate README files -3. Any API changes are reflected in the server documentation -4. Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` +This abstract base class defines the interface that all conversation visualizers +must implement. Visualizers can be created before the Conversation is initialized +and will be configured with the conversation state automatically. -### Evaluation Harness -Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md +The typical usage pattern: +1. 
Create a visualizer instance:

   viz = MyVisualizer()
2. Pass it to Conversation: conv = Conversation(agent, visualizer=viz)
3. Conversation automatically calls viz.initialize(state) to attach the state

You can also pass the uninstantiated class if you don’t need extra args
for initialization, and Conversation will create it:
conv = Conversation(agent, visualizer=MyVisualizer)

Conversation will then call MyVisualizer() followed by initialize(state).

#### Properties

- `conversation_stats`: ConversationStats | None
  Get conversation stats from the state.

#### Methods

#### __init__()

Initialize the visualizer base.

#### create_sub_visualizer()

Create a visualizer for a sub-agent during delegation.

Override this method to support sub-agent visualization in multi-agent
delegation scenarios. 
The sub-visualizer will be used to display events +from the spawned sub-agent. -```bash -poetry run python ./openhands/core/main.py \ - -i 10 \ - -t "Write me a bash script that prints hello world." \ - -c CodeActAgent \ - -l llm -``` +By default, returns None which means sub-agents will not have visualization. +Subclasses that support delegation (like DelegationVisualizer) should +override this method to create appropriate sub-visualizers. -This command runs OpenHands with: -- A maximum of 10 iterations -- The specified task description -- Using the CodeActAgent -- With the LLM configuration defined in the `llm` section of your `config.toml` file +* Parameters: + `agent_id` – The identifier of the sub-agent being spawned +* Returns: + A visualizer instance for the sub-agent, or None if sub-agent + visualization is not supported -## How does OpenHands work +#### final initialize() -The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works: +Initialize the visualizer with conversation state. -1. Parse command-line arguments and load the configuration -2. Create a runtime environment using `create_runtime()` -3. Initialize the specified agent -4. Run the controller using `run_controller()`, which: - - Attaches the runtime to the agent - - Executes the agent's task - - Returns a final state when complete +This method is called by Conversation after the state is created, +allowing the visualizer to access conversation stats and other +state information. -The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing. +Subclasses should not override this method, to ensure the state is set. 
+* Parameters: + `state` – The conversation state object -## Easiest way to get started: Exploring Existing Benchmarks +#### abstractmethod on_event() -We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository. +Handle a conversation event. -To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements. +This method is called for each event in the conversation and should +implement the visualization logic. -## How to create an evaluation workflow +* Parameters: + `event` – The event to visualize +### class DefaultConversationVisualizer -To create an evaluation workflow for your benchmark, follow these steps: +Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) -1. Import relevant OpenHands utilities: - ```python - import openhands.agenthub - from evaluation.utils.shared import ( - EvalMetadata, - EvalOutput, - make_metadata, - prepare_dataset, - reset_logger_for_multiprocessing, - run_evaluation, - ) - from openhands.controller.state.state import State - from openhands.core.config import ( - AppConfig, - SandboxConfig, - get_llm_config_arg, - parse_arguments, - ) - from openhands.core.logger import openhands_logger as logger - from openhands.core.main import create_runtime, run_controller - from openhands.events.action import CmdRunAction - from openhands.events.observation import CmdOutputObservation, ErrorObservation - from openhands.runtime.runtime import Runtime - ``` +Handles visualization of conversation events with Rich formatting. -2. 
Create a configuration: - ```python - def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig: - config = AppConfig( - default_agent=metadata.agent_class, - runtime='docker', - max_iterations=metadata.max_iterations, - sandbox=SandboxConfig( - base_container_image='your_container_image', - enable_auto_lint=True, - timeout=300, - ), - ) - config.set_llm_config(metadata.llm_config) - return config - ``` +Provides Rich-formatted output with semantic dividers and complete content display. -3. Initialize the runtime and set up the evaluation environment: - ```python - def initialize_runtime(runtime: Runtime, instance: pd.Series): - # Set up your evaluation environment here - # For example, setting environment variables, preparing files, etc. - pass - ``` +#### Methods -4. Create a function to process each instance: - ```python - from openhands.utils.async_utils import call_async_from_sync - def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput: - config = get_config(instance, metadata) - runtime = create_runtime(config) - call_async_from_sync(runtime.connect) - initialize_runtime(runtime, instance) +#### __init__() - instruction = get_instruction(instance, metadata) +Initialize the visualizer. - state = run_controller( - config=config, - task_str=instruction, - runtime=runtime, - fake_user_response_fn=your_user_response_function, - ) +* Parameters: + * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles + for highlighting keywords in the visualizer. + For example: (configuration object) + * `skip_user_messages` – If True, skip displaying user messages. Useful for + scenarios where user input is not relevant to show. 
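To illustrate how the `highlight_regex` and `skip_user_messages` options described above behave, here is a minimal stand-in visualizer. It is not the SDK's Rich-based `DefaultConversationVisualizer`; the event objects (plain `SimpleNamespace` with `source`/`text` attributes) and the `[style]…[/style]` markers are assumptions made for the sketch:

```python
import re
from types import SimpleNamespace


class MiniVisualizer:
    """Illustrative stand-in for a ConversationVisualizerBase subclass."""

    def __init__(self, highlight_regex=None, skip_user_messages=False):
        self.highlight_regex = highlight_regex or {}
        self.skip_user_messages = skip_user_messages
        self.rendered = []  # collected plain-text lines instead of Rich output

    def on_event(self, event):
        # Honor skip_user_messages, as documented for __init__ above.
        if self.skip_user_messages and getattr(event, "source", None) == "user":
            return
        text = event.text
        for pattern, style in self.highlight_regex.items():
            # Wrap each regex match in [style]...[/style] markers (Rich-like).
            text = re.sub(pattern, rf"[{style}]\g<0>[/{style}]", text)
        self.rendered.append(text)


viz = MiniVisualizer(highlight_regex={r"ERROR": "bold red"}, skip_user_messages=True)
viz.on_event(SimpleNamespace(source="user", text="please fix the build"))
viz.on_event(SimpleNamespace(source="agent", text="ERROR: build failed"))
print(viz.rendered)  # ['[bold red]ERROR[/bold red]: build failed']
```

The user message is dropped and only the agent event is rendered, with every regex match wrapped in its configured style markers.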
- # Evaluate the agent's actions - evaluation_result = await evaluate_agent_actions(runtime, instance) +#### on_event() - return EvalOutput( - instance_id=instance.instance_id, - instruction=instruction, - test_result=evaluation_result, - metadata=metadata, - history=compatibility_for_eval_history_pairs(state.history), - metrics=state.metrics.get() if state.metrics else None, - error=state.last_error if state and state.last_error else None, - ) - ``` +Main event handler that displays events with Rich formatting. -5. Run the evaluation: - ```python - metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir) - output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl') - instances = prepare_dataset(your_dataset, output_file, eval_n_limit) - - await run_evaluation( - instances, - metadata, - output_file, - num_workers, - process_instance - ) - ``` +### class EventLog -This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. +Bases: [`EventsListBase`](#class-eventslistbase) -Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. +Persistent event log with locking for concurrent writes. -By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. +This class provides thread-safe and process-safe event storage using +the FileStore’s locking mechanism. Events are persisted to disk and +can be accessed by index or event ID. +#### Methods -## Understanding the `user_response_fn` +#### NOTE +For LocalFileStore, file locking via flock() does NOT work reliably +on NFS mounts or network filesystems. 
Users deploying with shared +storage should use alternative coordination mechanisms. -The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. +#### __init__() +#### append() -### Workflow and Interaction +Append an event with locking for thread/process safety. -The correct workflow for handling actions and the `user_response_fn` is as follows: +* Raises: + * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. + * `ValueError` – If an event with the same ID already exists. -1. Agent receives a task and starts processing -2. Agent emits an Action -3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): - - The Runtime processes the Action - - Runtime returns an Observation -4. If the Action is not executable (typically a MessageAction): - - The `user_response_fn` is called - - It returns a simulated user response -5. The agent receives either the Observation or the simulated response -6. Steps 2-5 repeat until the task is completed or max iterations are reached +#### get_id() -Here's a more accurate visual representation: +Return the event_id for a given index. -``` - [Agent] - | - v - [Emit Action] - | - v - [Is Action Executable?] - / \ - Yes No - | | - v v - [Runtime] [user_response_fn] - | | - v v - [Return Observation] [Simulated Response] - \ / - \ / - v v - [Agent receives feedback] - | - v - [Continue or Complete Task] -``` +#### get_index() -In this workflow: +Return the integer index for a given event_id. 
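The `append`/`get_id`/`get_index` contract described for `EventLog` can be sketched with an in-memory analog. This is illustrative only: the real `EventLog` persists events through a FileStore and uses file locking, whereas this sketch uses a plain `threading.Lock` and a Python list:

```python
import threading


class InMemoryEventLog:
    """Minimal stand-in for the EventLog append/get_id/get_index contract."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = []    # position in the log -> event_id
        self._by_id = {}  # event_id -> event payload

    def append(self, event_id, event):
        # Hold the lock for the whole append so concurrent writers cannot
        # interleave; duplicate IDs are rejected, mirroring the documented
        # ValueError raised by EventLog.append().
        with self._lock:
            if event_id in self._by_id:
                raise ValueError(f"event {event_id!r} already exists")
            self._ids.append(event_id)
            self._by_id[event_id] = event

    def get_id(self, index):
        """Return the event_id for a given index."""
        return self._ids[index]

    def get_index(self, event_id):
        """Return the integer index for a given event_id."""
        return self._ids.index(event_id)


log = InMemoryEventLog()
log.append("e1", {"kind": "message"})
log.append("e2", {"kind": "action"})
print(log.get_id(1), log.get_index("e1"))  # e2 0
```

Appending the same ID twice raises `ValueError`, which is the behavior callers should be prepared to handle when replaying or resuming a conversation.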
-- Executable actions (like running commands or executing code) are handled directly by the Runtime -- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` -- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` +### class EventsListBase -This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. +Bases: `Sequence`[`Event`], `ABC` -### Example Implementation +Abstract base class for event lists that can be appended to. -Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: +This provides a common interface for both local EventLog and remote +RemoteEventsList implementations, avoiding circular imports in protocols. -```python -def codeact_user_response(state: State | None) -> str: - msg = ( - 'Please continue working on the task on whatever approach you think is suitable.\n' - 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' - 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' - ) +#### Methods - if state and state.history: - # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up - user_msgs = [ - event - for event in state.history - if isinstance(event, MessageAction) and event.source == 'user' - ] - if len(user_msgs) >= 2: - # let the agent know that it can give up when it has tried 3 times - return ( - msg - + 'If you want to give up, run: exit .\n' - ) - return msg -``` +#### abstractmethod append() -This function does the following: +Add a new event to the list. -1. Provides a standard message encouraging the agent to continue working -2. 
Checks how many times the agent has attempted to communicate with the user -3. If the agent has made multiple attempts, it provides an option to give up +### class LocalConversation -By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. +Bases: [`BaseConversation`](#class-baseconversation) -### WebSocket Connection -Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md -This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. +#### Properties -## Overview +- `agent`: AgentBase +- `delete_on_close`: bool = True +- `id`: UUID + Get the unique ID of the conversation. +- `llm_registry`: LLMRegistry +- `max_iteration_per_run`: int +- `resolved_plugins`: list[ResolvedPluginSource] | None + Get the resolved plugin sources after plugins are loaded. + Returns None if plugins haven’t been loaded yet, or if no plugins + were specified. Use this for persistence to ensure conversation + resume uses the exact same plugin versions. +- `state`: [ConversationState](#class-conversationstate) + Get the conversation state. + It returns a protocol that has a subset of ConversationState methods + and properties. We will have the ability to access the same properties + of ConversationState on a remote conversation object. + But we won’t be able to access methods that mutate the state. +- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None + Get the stuck detector instance if enabled. +- `workspace`: LocalWorkspace -OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: +#### Methods -1. Receive real-time events from the agent -2. Send user actions to the agent -3. 
Maintain a persistent connection for ongoing conversations +#### __init__() -## Connecting to the WebSocket +Initialize the conversation. -### Connection Parameters +* Parameters: + * `agent` – The agent to use for the conversation. + * `workspace` – Working directory for agent operations and tool execution. + Can be a string path, Path object, or LocalWorkspace instance. + * `plugins` – Optional list of plugins to load. Each plugin is specified + with a source (github:owner/repo, git URL, or local path), + optional ref (branch/tag/commit), and optional repo_path for + monorepos. Plugins are loaded in order with these merge + semantics: skills override by name (last wins), MCP config + override by key (last wins), hooks concatenate (all run). + * `persistence_dir` – Directory for persisting conversation state and events. + Can be a string path or Path object. + * `conversation_id` – Optional ID for the conversation. If provided, will + be used to identify the conversation. The user might want to + suffix their persistent filestore with this ID. + * `callbacks` – Optional list of callback functions to handle events + * `token_callbacks` – Optional list of callbacks invoked for streaming deltas + * `hook_config` – Optional hook configuration to auto-wire session hooks. + If plugins are loaded, their hooks are combined with this config. + * `max_iteration_per_run` – Maximum number of iterations per run + * `visualizer` – -When connecting to the WebSocket, you need to provide the following query parameters: + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `stuck_detection` – Whether to enable stuck detection + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. 
Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted + state. If provided, secrets are encrypted when saving and + decrypted when loading. If not provided, secrets are redacted + (lost) on serialization. -- `conversation_id`: The ID of the conversation you want to join -- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) -- `providers_set`: (Optional) A comma-separated list of provider types +#### ask_agent() -### Connection Example +Ask the agent a simple, stateless question and get a direct LLM response. -Here's a basic example of connecting to the WebSocket using JavaScript: +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. -```javascript -import { io } from "socket.io-client"; +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent -const socket = io("http://localhost:3000", { - transports: ["websocket"], - query: { - conversation_id: "your-conversation-id", - latest_event_id: -1, - providers_set: "github,gitlab" // Optional - } -}); +#### close() -socket.on("connect", () => { - console.log("Connected to OpenHands WebSocket"); -}); +Close the conversation and clean up all tool executors. -socket.on("oh_event", (event) => { - console.log("Received event:", event); -}); +#### condense() -socket.on("connect_error", (error) => { - console.error("Connection error:", error); -}); +Synchronously force condense the conversation history. 
-socket.on("disconnect", (reason) => { - console.log("Disconnected:", reason); -}); -``` +If the agent is currently running, condense() will wait for the +ongoing step to finish before proceeding. -## Sending Actions to the Agent +Raises ValueError if no compatible condenser exists. -To send an action to the agent, use the `oh_user_action` event: +#### property conversation_stats -```javascript -// Send a user message to the agent -socket.emit("oh_user_action", { - type: "message", - source: "user", - message: "Hello, can you help me with my project?" -}); -``` +#### execute_tool() -## Receiving Events from the Agent +Execute a tool directly without going through the agent loop. -The server emits events using the `oh_event` event type. Here are some common event types you might receive: +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. -- User messages (`source: "user", type: "message"`) -- Agent messages (`source: "agent", type: "message"`) -- File edits (`action: "edit"`) -- File writes (`action: "write"`) -- Command executions (`action: "run"`) +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. 
-Example event handler: +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop -```javascript -socket.on("oh_event", (event) => { - if (event.source === "agent" && event.type === "message") { - console.log("Agent says:", event.message); - } else if (event.action === "run") { - console.log("Command executed:", event.args.command); - console.log("Result:", event.result); - } -}); -``` +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor -## Using Websocat for Testing +#### generate_title() -[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application. +Generate a title for the conversation based on the first user message. -### Installation +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses self.agent.llm. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. -```bash -# On macOS -brew install websocat +#### pause() -# On Linux -curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat -chmod +x websocat -sudo mv websocat /usr/local/bin/ -``` +Pause agent execution. -### Connecting to the WebSocket +This method can be called from any thread to request that the agent +pause execution. The pause will take effect at the next iteration +of the run loop (between agent steps). 
-```bash -# Connect to the WebSocket and print all received messages -echo "40{}" | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +Note: If called during an LLM completion, the pause will not take +effect until the current LLM call completes. -### Sending a Message +#### reject_pending_actions() -```bash -# Send a message to the agent -echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +Reject all pending actions from the agent. -### Complete Example with Websocat +This is a non-invasive method to reject actions between run() calls. +Also clears the agent_waiting_for_confirmation flag. -Here's a complete example of connecting to the WebSocket, sending a message, and receiving events: +#### run() -```bash -# Start a persistent connection -websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +Runs the conversation until the agent finishes. 
-# In another terminal, send a message -echo '42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \ -websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" -``` +In confirmation mode: +- First call: creates actions but doesn’t execute them, stops and waits +- Second call: executes pending actions (implicit confirmation) -## Event Structure +In normal mode: +- Creates and executes actions immediately -Events sent and received through the WebSocket follow a specific structure: +Can be paused between steps -```typescript -interface OpenHandsEvent { - id: string; // Unique event ID - source: string; // "user" or "agent" - timestamp: string; // ISO timestamp - message?: string; // For message events - type?: string; // Event type (e.g., "message") - action?: string; // Action type (e.g., "run", "edit", "write") - args?: any; // Action arguments - result?: any; // Action result -} -``` +#### send_message() -## Best Practices +Send a message to the agent. -1. **Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions. -2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events. -3. **Error Handling**: Implement proper error handling for connection errors and failed actions. -4. **Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server. +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. 
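The two-phase behaviour that `run()` has in confirmation mode (first call stages actions and stops, second call executes them as an implicit confirmation) can be pictured with a small stand-in. This is an illustrative sketch only, not the SDK implementation: `SketchConversation` and its `_plan_actions()` helper are invented for the example.

```python
# Illustrative model of run() in confirmation mode vs. normal mode.
# Not OpenHands SDK code; the class and helper names are made up.

class SketchConversation:
    def __init__(self, confirmation_mode: bool = False):
        self.confirmation_mode = confirmation_mode
        self.pending_actions: list[str] = []
        self.executed: list[str] = []

    def _plan_actions(self) -> list[str]:
        # Stand-in for the agent proposing actions from its state.
        return ["run: ls", "edit: README.md"]

    def run(self) -> None:
        if self.confirmation_mode and not self.pending_actions:
            # First call: create actions but don't execute, stop and wait.
            self.pending_actions = self._plan_actions()
            return
        # Normal mode, or second call in confirmation mode: execute.
        actions = self.pending_actions or self._plan_actions()
        self.executed.extend(actions)
        self.pending_actions = []

    def reject_pending_actions(self) -> None:
        # Mirrors reject_pending_actions(): drop staged actions unexecuted.
        self.pending_actions = []


convo = SketchConversation(confirmation_mode=True)
convo.run()  # first call: actions staged, nothing executed yet
assert convo.pending_actions and not convo.executed
convo.run()  # second call: implicit confirmation, actions execute
assert convo.executed and not convo.pending_actions
```

Calling `reject_pending_actions()` between the two `run()` calls would clear the staged actions instead of executing them, which is the rejection path described above.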
-## Troubleshooting +#### set_confirmation_policy() -### Connection Issues +Set the confirmation policy and store it in conversation state. -- Verify that the OpenHands server is running and accessible -- Check that you're providing the correct conversation ID -- Ensure your WebSocket URL is correctly formatted +#### set_security_analyzer() -### Authentication Issues +Set the security analyzer for the conversation. -- Make sure you have the necessary authentication cookies if required -- Verify that you have permission to access the specified conversation +#### update_secrets() -### Event Handling Issues +Add secrets to the conversation. -- Check that you're correctly parsing the event data -- Verify that your event handlers are properly registered +* Parameters: + `secrets` – Dictionary mapping secret keys to values or no-arg callables. + SecretValue = str | Callable[[], str]. Callables are invoked lazily + when a command references the secret key. -### Environment Variables Reference -Source: https://docs.openhands.dev/openhands/usage/environment-variables.md +### class RemoteConversation -This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments. 
+Bases: [`BaseConversation`](#class-baseconversation) -## Environment Variable Naming Convention -OpenHands follows a consistent naming pattern for environment variables: +#### Properties -- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`) -- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`) -- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`) -- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`) -- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`) +- `agent`: AgentBase +- `delete_on_close`: bool = False +- `id`: UUID +- `max_iteration_per_run`: int +- `state`: RemoteState + Access to remote conversation state. +- `workspace`: RemoteWorkspace -## Core Configuration Variables +#### Methods -These variables correspond to the `[core]` section in `config.toml`: +#### __init__() -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | -| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | -| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | -| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | -| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | -| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | -| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) 
| -| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | -| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | -| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | -| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | -| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | -| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) | -| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | -| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | -| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | -| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | +Remote conversation proxy that talks to an agent server. -## LLM Configuration Variables +* Parameters: + * `agent` – Agent configuration (will be sent to the server) + * `workspace` – The working directory for agent operations and tool execution. + * `plugins` – Optional list of plugins to load on the server. Each plugin + is a PluginSource specifying source, ref, and repo_path. + * `conversation_id` – Optional existing conversation id to attach to + * `callbacks` – Optional callbacks to receive events (not yet streamed) + * `max_iteration_per_run` – Max iterations configured on server + * `stuck_detection` – Whether to enable stuck detection on server + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. 
+ * `hook_config` – Optional hook configuration for session hooks + * `visualizer` – -These variables correspond to the `[llm]` section in `config.toml`: + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `secrets` – Optional secrets to initialize the conversation with -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | -| `LLM_API_KEY` | string | `""` | API key for the LLM provider | -| `LLM_BASE_URL` | string | `""` | Custom API base URL | -| `LLM_API_VERSION` | string | `""` | API version to use | -| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | -| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | -| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | -| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | -| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | -| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | -| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | -| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | -| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | -| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | -| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | -| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | -| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | -| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | -| 
`LLM_OLLAMA_BASE_URL` | string | `""` | Base URL for Ollama API | -| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | -| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | -| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | +#### ask_agent() -### AWS Configuration -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID | -| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key | -| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name | +Ask the agent a simple, stateless question and get a direct LLM response. -## Agent Configuration Variables +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. 
-These variables correspond to the `[agent]` section in `config.toml`: +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use | -| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling | -| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate | -| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor | -| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration | -| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation | -| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable skills (formerly known as microagents) (prompt extensions) | -| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable | +#### close() -## Sandbox Configuration Variables +Close the conversation and clean up resources. -These variables correspond to the `[sandbox]` section in `config.toml`: +Note: We don’t close self._client here because it’s shared with the workspace. +The workspace owns the client and will close it during its own cleanup. +Closing it here would prevent the workspace from making cleanup API calls. 
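The statelessness that `ask_agent()` promises (a direct LLM call that records no events and leaves conversation state untouched) can be sketched with a minimal stand-in. This is illustrative only; `StubLLM`, the `events` list, and the method bodies are assumptions for the example, not SDK internals.

```python
# Illustrative sketch of the ask_agent() contract: the question goes
# straight to the LLM, and nothing is written back to conversation state.
# StubLLM and SketchConversation are invented for this example.

class StubLLM:
    def complete(self, prompt: str) -> str:
        return f"answer to: {prompt}"


class SketchConversation:
    def __init__(self, llm: StubLLM):
        self.llm = llm
        self.events: list[str] = []  # persisted conversation history

    def send_message(self, text: str) -> None:
        self.events.append(f"user: {text}")  # recorded in state

    def ask_agent(self, question: str) -> str:
        # Side channel: call the LLM directly, record no events.
        return self.llm.complete(question)


convo = SketchConversation(StubLLM())
convo.send_message("start the task")
before = list(convo.events)
reply = convo.ask_agent("what is the status?")
assert reply == "answer to: what is the status?"
assert convo.events == before  # state untouched by ask_agent()
```

The real method adds thread safety on top of this, so it can be called while `conversation.run()` is executing elsewhere; the sketch only shows the no-side-effects contract.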
-| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | -| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | -| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | -| `SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | -| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | -| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | -| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | -| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | -| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | -| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | -| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | -| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | -| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | -| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | -| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | -| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | -| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | -| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | -| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | -| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | +#### condense() -### Sandbox Environment Variables -Variables prefixed with `SANDBOX_ENV_` are passed through to 
the sandbox environment: +Force condensation of the conversation history. -| Environment Variable | Description | -|---------------------|-------------| -| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | +This method sends a condensation request to the remote agent server. +The server will use the existing condensation request pattern to trigger +condensation if a condenser is configured and handles condensation requests. -## Security Configuration Variables +The condensation will be applied on the server side and will modify the +conversation state by adding a condensation event to the history. -These variables correspond to the `[security]` section in `config.toml`: +* Raises: + `HTTPError` – If the server returns an error (e.g., no condenser configured). -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | -| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | -| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | +#### property conversation_stats -## Debug and Logging Variables +#### execute_tool() -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `DEBUG` | boolean | `false` | Enable general debug logging | -| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | -| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | -| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when DEBUG=true) | +Execute a tool directly without going through the agent loop. -## Runtime-Specific Variables +Note: This method is not yet supported for RemoteConversation. +Tool execution for remote conversations happens on the server side +during the normal agent loop. 
-### Docker Runtime -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | +* Parameters: + * `tool_name` – The name of the tool to execute + * `action` – The action to pass to the tool executor +* Raises: + `NotImplementedError` – Always, as this feature is not yet supported + for remote conversations. -### Remote Runtime -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | -| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | +#### generate_title() -### Local Runtime -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | -| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | -| `RUNTIME_ID` | string | `""` | Runtime identifier | -| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | +Generate a title for the conversation based on the first user message. -## Integration Variables +* Parameters: + * `llm` – Optional LLM to use for title generation. If provided, its usage_id + will be sent to the server. If not provided, uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. +* Returns: + A generated title for the conversation. 
-### GitHub Integration -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | +#### pause() -### Third-Party API Keys -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `OPENAI_API_KEY` | string | `""` | OpenAI API key | -| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | -| `GOOGLE_API_KEY` | string | `""` | Google API key | -| `AZURE_API_KEY` | string | `""` | Azure API key | -| `TAVILY_API_KEY` | string | `""` | Tavily search API key | +#### reject_pending_actions() -## Server Configuration Variables +#### run() -These are primarily used when running OpenHands as a server: +Trigger a run on the server. -| Environment Variable | Type | Default | Description | -|---------------------|------|---------|-------------| -| `FRONTEND_PORT` | integer | `3000` | Frontend server port | -| `BACKEND_PORT` | integer | `8000` | Backend server port | -| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | -| `BACKEND_HOST` | string | `"localhost"` | Backend host address | -| `WEB_HOST` | string | `"localhost"` | Web server host | -| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | +* Parameters: + * `blocking` – If True (default), wait for the run to complete by polling + the server. If False, return immediately after triggering the run. + * `poll_interval` – Time in seconds between status polls (only used when + blocking=True). Default is 1.0 second. + * `timeout` – Maximum time in seconds to wait for the run to complete + (only used when blocking=True). Default is 3600 seconds. +* Raises: + `ConversationRunError` – If the run fails or times out. -## Deprecated Variables +#### send_message() -These variables are deprecated and should be replaced: +Send a message to the agent. 
-| Environment Variable | Replacement | Description | -|---------------------|-------------|-------------| -| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead | -| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. -## Usage Examples +#### set_confirmation_policy() -### Basic Setup with OpenAI -```bash -export LLM_MODEL="gpt-4o" -export LLM_API_KEY="your-openai-api-key" -export DEBUG=true -``` +Set the confirmation policy for the conversation. -### Docker Deployment with Custom Volumes -```bash -export RUNTIME="docker" -export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" -export SANDBOX_TIMEOUT=300 -``` +#### set_security_analyzer() -### Remote Runtime Configuration -```bash -export RUNTIME="remote" -export SANDBOX_API_KEY="your-remote-api-key" -export SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" -``` +Set the security analyzer for the remote conversation. -### Security-Enhanced Setup -```bash -export SECURITY_CONFIRMATION_MODE=true -export SECURITY_SECURITY_ANALYZER="llm" -export DEBUG_RUNTIME=true -``` +#### property stuck_detector -## Notes +Stuck detector for compatibility. +Not implemented for remote conversations. -1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). +#### update_secrets() -2. 
**List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`.

+### class SecretRegistry

-3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`.

+Bases: `OpenHandsModel`

-4. **Precedence**: Environment variables take precedence over TOML configuration files.

+Manages secrets and injects them into bash commands when needed.

-5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag:
-   ```bash
-   docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands
-   ```

+The secret registry stores a mapping of secret keys to SecretSources
+that retrieve the actual secret values. When a bash command is about to be
+executed, it scans the command for any secret keys and injects the corresponding
+environment variables.

-6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults.

+Secret sources will redact / encrypt their sensitive values as appropriate when
+serializing, depending on the content of the context. If a context is present
+and contains a ‘cipher’ object, this is used for encryption. If it contains a
+boolean ‘expose_secrets’ flag set to True, secrets are dumped in plain text.
+Otherwise secrets are redacted.

-### Good vs. Bad Instructions
-Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md

+Additionally, it tracks the latest exported values to enable consistent masking
+even when callable secrets fail on subsequent calls.

-The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. 
-## Concrete Examples of Good/Bad Prompts +#### Properties -### Bug Fixing Examples +- `secret_sources`: dict[str, SecretSource] -#### Bad Example +#### Methods -``` -Fix the bug in my code. -``` +#### find_secrets_in_text() -**Why it's bad:** -- No information about what the bug is -- No indication of where to look -- No description of expected vs. actual behavior -- OpenHands would have to guess what's wrong +Find all secret keys mentioned in the given text. -#### Good Example +* Parameters: + `text` – The text to search for secret keys +* Returns: + Set of secret keys found in the text -``` -Fix the TypeError in src/api/users.py line 45. +#### get_secrets_as_env_vars() -Error message: -TypeError: 'NoneType' object has no attribute 'get' +Get secrets that should be exported as environment variables for a command. -Expected behavior: The get_user_preferences() function should return -default preferences when the user has no saved preferences. +* Parameters: + `command` – The bash command to check for secret references +* Returns: + Dictionary of environment variables to export (key -> value) -Actual behavior: It crashes with the error above when user.preferences is None. +#### mask_secrets_in_output() -The fix should handle the None case gracefully and return DEFAULT_PREFERENCES. -``` +Mask secret values in the given text. -**Why it works:** -- Specific file and line number -- Exact error message -- Clear expected vs. actual behavior -- Suggested approach for the fix +This method uses both the current exported values and attempts to get +fresh values from callables to ensure comprehensive masking. -### Feature Development Examples +* Parameters: + `text` – The text to mask secrets in +* Returns: + Text with secret values replaced by `` -#### Bad Example +#### model_config = (configuration object) -``` -Add user authentication to my app. -``` +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
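The scan-inject-mask cycle that `find_secrets_in_text()`, `get_secrets_as_env_vars()`, and `mask_secrets_in_output()` describe can be sketched in a few lines. This is a simplified illustrative reimplementation, not the SDK's internals: the `<secret-hidden>` placeholder, the dict-based storage, and substring matching are assumptions for the example (the real method also tries to refresh values from callables when masking).

```python
# Illustrative stand-in for the SecretRegistry behaviour described above:
# scan a bash command for known secret keys, expose matching secrets as
# environment variables, and mask their values in captured output.
from typing import Callable


class SketchSecretRegistry:
    def __init__(self, sources: dict[str, Callable[[], str]]):
        self.sources = sources                # key -> no-arg callable
        self.latest_values: dict[str, str] = {}

    def find_secrets_in_text(self, text: str) -> set[str]:
        # Naive substring scan for secret keys mentioned in the text.
        return {key for key in self.sources if key in text}

    def get_secrets_as_env_vars(self, command: str) -> dict[str, str]:
        env: dict[str, str] = {}
        for key in self.find_secrets_in_text(command):
            value = self.sources[key]()       # invoked lazily, on demand
            self.latest_values[key] = value   # remembered for masking
            env[key] = value
        return env

    def mask_secrets_in_output(self, text: str) -> str:
        # "<secret-hidden>" is an assumed placeholder for this sketch.
        for value in self.latest_values.values():
            text = text.replace(value, "<secret-hidden>")
        return text


registry = SketchSecretRegistry({"GITHUB_TOKEN": lambda: "ghp_abc123"})
env = registry.get_secrets_as_env_vars(
    'curl -H "Authorization: $GITHUB_TOKEN" https://api.github.com'
)
assert env == {"GITHUB_TOKEN": "ghp_abc123"}
assert (
    registry.mask_secrets_in_output("token is ghp_abc123")
    == "token is <secret-hidden>"
)
```

Because the sketch caches each exported value in `latest_values`, masking stays consistent even if a callable later fails, which mirrors the "tracks the latest exported values" behaviour noted in the class description.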
-**Why it's bad:** -- Scope is too large and undefined -- No details about authentication requirements -- No mention of existing code or patterns -- Could mean many different things +#### model_post_init() -#### Good Example +This function is meant to behave like a BaseModel method to initialise private attributes. -``` -Add email/password login to our Express.js API. +It takes context as an argument since that’s what pydantic-core passes when calling it. -Requirements: -1. POST /api/auth/login endpoint -2. Accept email and password in request body -3. Validate against users in PostgreSQL database -4. Return JWT token on success, 401 on failure -5. Use bcrypt for password comparison (already in dependencies) +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -Follow the existing patterns in src/api/routes.js for route structure. -Use the existing db.query() helper in src/db/index.js for database access. +#### update_secrets() -Success criteria: I can call the endpoint with valid credentials -and receive a JWT token that works with our existing auth middleware. -``` +Add or update secrets in the manager. -**Why it works:** -- Specific, scoped feature -- Clear technical requirements -- Points to existing patterns to follow -- Defines what "done" looks like +* Parameters: + `secrets` – Dictionary mapping secret keys to either string values + or callable functions that return string values -### Code Review Examples +### class StuckDetector -#### Bad Example +Bases: `object` -``` -Review my code. -``` +Detects when an agent is stuck in repetitive or unproductive patterns. -**Why it's bad:** -- No code provided or referenced -- No indication of what to look for -- No context about the code's purpose -- No criteria for the review +This detector analyzes the conversation history to identify various stuck patterns: +1. Repeating action-observation cycles +2. Repeating action-error cycles +3. 
Agent monologue (repeated messages without user input) +4. Repeating alternating action-observation patterns +5. Context window errors indicating memory issues -#### Good Example -``` -Review this pull request for our payment processing module: +#### Properties -Focus areas: -1. Security - we're handling credit card data -2. Error handling - payments must never silently fail -3. Idempotency - duplicate requests should be safe +- `action_error_threshold`: int +- `action_observation_threshold`: int +- `alternating_pattern_threshold`: int +- `monologue_threshold`: int +- `state`: [ConversationState](#class-conversationstate) +- `thresholds`: StuckDetectionThresholds -Context: -- This integrates with Stripe API -- It's called from our checkout flow -- We have ~10,000 transactions/day +#### Methods -Please flag any issues as Critical/Major/Minor with explanations. -``` +#### __init__() -**Why it works:** -- Clear scope and focus areas -- Important context provided -- Business implications explained -- Requested output format specified +#### is_stuck() -### Refactoring Examples +Check if the agent is currently stuck. -#### Bad Example +Note: To avoid materializing potentially large file-backed event histories, +only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. +If a user message exists within this window, only events after it are checked. +Otherwise, all events in the window are analyzed. -``` -Make the code better. -``` +#### __init__() -**Why it's bad:** -- "Better" is subjective and undefined -- No specific problems identified -- No goals for the refactoring -- No constraints or requirements +### openhands.sdk.event +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md -#### Good Example +### class ActionEvent -``` -Refactor the UserService class in src/services/user.js: +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -Problems to address: -1. The class is 500+ lines - split into smaller, focused services -2. 
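One of the patterns listed above, the repeating action-observation cycle, can be sketched outside the SDK like this. It is a simplified illustration, not the real `StuckDetector`: events are modeled as plain tuples, and the threshold and window defaults are made-up values for the sketch.

```python
def is_stuck(events, threshold=3, window=40):
    """True if the last `threshold` action/observation pairs are identical.

    Only the trailing `window` events are scanned, mirroring the note above
    about not materializing large file-backed event histories.
    """
    recent = events[-window:]
    # Pair up consecutive events as (action, observation) steps.
    pairs = [(recent[i], recent[i + 1]) for i in range(0, len(recent) - 1, 2)]
    if len(pairs) < threshold:
        return False
    tail = pairs[-threshold:]
    # Stuck if every recent pair is identical to the first one in the tail.
    return all(p == tail[0] for p in tail)
```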
Database queries are mixed with business logic - separate them -3. There's code duplication in the validation methods -Constraints: -- Keep the public API unchanged (other code depends on it) -- Maintain test coverage (run npm test after changes) -- Follow our existing service patterns in src/services/ +#### Properties -Goal: Improve maintainability while keeping the same functionality. -``` +- `action`: Action | None +- `critic_result`: CriticResult | None +- `llm_response_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str | None +- `responses_reasoning_item`: ReasoningItemModel | None +- `security_risk`: SecurityRisk +- `source`: Literal['agent', 'user', 'environment'] +- `summary`: str | None +- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] +- `thought`: Sequence[TextContent] +- `tool_call`: MessageToolCall +- `tool_call_id`: str +- `tool_name`: str +- `visualize`: Text + Return Rich Text representation of this action event. -**Why it works:** -- Specific problems identified -- Clear constraints and requirements -- Points to patterns to follow -- Measurable success criteria +#### Methods -## Key Principles for Effective Instructions +#### to_llm_message() -### Be Specific +Individual message - may be incomplete for multi-action batches -Vague instructions produce vague results. Be concrete about: +### class AgentErrorEvent -| Instead of... | Say... 
| -|---------------|--------| -| "Fix the error" | "Fix the TypeError on line 45 of api.py" | -| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | -| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | -| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -### Provide Context +Error triggered by the agent. -Help OpenHands understand the bigger picture: +Note: This event should not contain model “thought” or “reasoning_content”. It +represents an error produced by the agent/scaffold, not model output. -``` -Context to include: -- What does this code do? (purpose) -- Who uses it? (users/systems) -- Why does this matter? (business impact) -- What constraints exist? (performance, compatibility) -- What patterns should be followed? (existing conventions) -``` -**Example with context:** +#### Properties -``` -Add rate limiting to our public API endpoints. +- `error`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this agent error event. -Context: -- This is a REST API serving mobile apps and third-party integrations -- We've been seeing abuse from web scrapers hitting us 1000+ times/minute -- Our infrastructure can handle 100 req/sec per client sustainably -- We use Redis (already available in the project) -- Our API follows the controller pattern in src/controllers/ +#### Methods -Requirement: Limit each API key to 100 requests per minute with -appropriate 429 responses and Retry-After headers. 
-``` +#### to_llm_message() -### Set Clear Goals +### class Condensation -Define what success looks like: +Bases: [`Event`](#class-event) -``` -Success criteria checklist: -✓ What specific outcome do you want? -✓ How will you verify it worked? -✓ What tests should pass? -✓ What should the user experience be? -``` +This action indicates a condensation of the conversation history is happening. -**Example with clear goals:** -``` -Implement password reset functionality. +#### Properties -Success criteria: -1. User can request reset via POST /api/auth/forgot-password -2. System sends email with secure reset link -3. Link expires after 1 hour -4. User can set new password via POST /api/auth/reset-password -5. Old sessions are invalidated after password change -6. All edge cases return appropriate error messages -7. Existing tests still pass, new tests cover the feature -``` +- `forgotten_event_ids`: list[[EventID](#class-eventid)] +- `has_summary_metadata`: bool + Checks if both summary and summary_offset are present. +- `llm_response_id`: [EventID](#class-eventid) +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str | None +- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) + Generates a CondensationSummaryEvent. + Since summary events are not part of the main event store and are generated + dynamically, this property ensures the created event has a unique and consistent + ID based on the condensation event’s ID. + * Raises: + `ValueError` – If no summary is present. +- `summary_offset`: int | None +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. 
-### Include Constraints +#### Methods -Specify what you can't or won't change: +#### apply() -``` -Constraints to specify: -- API compatibility (can't break existing clients) -- Technology restrictions (must use existing stack) -- Performance requirements (must respond in <100ms) -- Security requirements (must not log PII) -- Time/scope limits (just this one file) -``` +Applies the condensation to a list of events. -## Common Pitfalls to Avoid +This method removes events that are marked to be forgotten and returns a new +list of events. If the summary metadata is present (both summary and offset), +the corresponding CondensationSummaryEvent will be inserted at the specified +offset _after_ the forgotten events have been removed. -### Vague Requirements +### class CondensationRequest - - - ``` - Make the dashboard faster. - ``` - - - ``` - The dashboard takes 5 seconds to load. - - Profile it and optimize to load in under 1 second. - - Likely issues: - - N+1 queries in getWidgetData() - - Uncompressed images - - Missing database indexes - - Focus on the biggest wins first. - ``` - - +Bases: [`Event`](#class-event) -### Missing Context +This action is used to request a condensation of the conversation history. - - - ``` - Add caching to the API. - ``` - - - ``` - Add caching to the product catalog API. - - Context: - - 95% of requests are for the same 1000 products - - Product data changes only via admin panel (rare) - - We already have Redis running for sessions - - Current response time is 200ms, target is <50ms - - Cache strategy: Cache product data in Redis with 5-minute TTL, - invalidate on product update. - ``` - - -### Unrealistic Expectations +#### Properties - - - ``` - Rewrite our entire backend from PHP to Go. - ``` - - - ``` - Create a Go microservice for the image processing currently in - src/php/ImageProcessor.php. - - This is the first step in our gradual migration. - The Go service should: - 1. Expose the same API endpoints - 2. 
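The `apply()` semantics described above (remove forgotten events, then insert the summary at `summary_offset` in the already-filtered list) can be sketched as follows. Events are modeled as plain dicts with an `"id"` key; this is an illustration of the described behavior, not the SDK implementation.

```python
def apply_condensation(events, forgotten_ids, summary=None, summary_offset=None):
    """Drop forgotten events, then optionally insert a summary event.

    The offset is applied to the filtered list, matching the note above that
    insertion happens after the forgotten events have been removed.
    """
    forgotten = set(forgotten_ids)
    kept = [e for e in events if e["id"] not in forgotten]
    if summary is not None and summary_offset is not None:
        kept.insert(summary_offset, {"id": "summary", "text": summary})
    return kept
```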
Be deployable alongside the existing PHP app - 3. Include a feature flag to route traffic - - Start with just the resize and crop functions. - ``` - - +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. -### Incomplete Information +#### Methods - - - ``` - The login is broken, fix it. - ``` - - - ``` - Users can't log in since yesterday's deployment. - - Symptoms: - - Login form submits but returns 500 error - - Server logs show: "Redis connection refused" - - Redis was moved to a new host yesterday - - The issue is likely in src/config/redis.js which may - have the old host hardcoded. - - Expected: Login should work with the new Redis at redis.internal:6380 - ``` - - +#### action -## Best Practices +The action type, namely ActionType.CONDENSATION_REQUEST. -### Structure Your Instructions +* Type: + str -Use clear structure for complex requests: +### class CondensationSummaryEvent -``` -## Task -[One sentence describing what you want] +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -## Background -[Context and why this matters] +This event represents a summary generated by a condenser. -## Requirements -1. [Specific requirement] -2. [Specific requirement] -3. [Specific requirement] -## Constraints -- [What you can't change] -- [What must be preserved] +#### Properties -## Success Criteria -- [How to verify it works] -``` +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str + The summary text. 
-### Provide Examples +#### Methods -Show what you want through examples: +#### to_llm_message() -``` -Add input validation to the user registration endpoint. +### class ConversationStateUpdateEvent -Example of what validation errors should look like: +Bases: [`Event`](#class-event) -{ - "error": "validation_failed", - "details": [ - {"field": "email", "message": "Invalid email format"}, - {"field": "password", "message": "Must be at least 8 characters"} - ] -} +Event that contains conversation state updates. -Validate: -- email: valid format, not already registered -- password: min 8 chars, at least 1 number -- username: 3-20 chars, alphanumeric only -``` +This event is sent via websocket whenever the conversation state changes, +allowing remote clients to stay in sync without making REST API calls. -### Define Success Criteria +All fields are serialized versions of the corresponding ConversationState fields +to ensure compatibility with websocket transmission. -Be explicit about what "done" means: -``` -This task is complete when: -1. All existing tests pass (npm test) -2. New tests cover the added functionality -3. The feature works as described in the acceptance criteria -4. Code follows our style guide (npm run lint passes) -5. Documentation is updated if needed -``` +#### Properties -### Iterate and Refine +- `key`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `value`: Any -Build on previous work: +#### Methods -``` -In our last session, you added the login endpoint. +#### classmethod from_conversation_state() -Now add the logout functionality: -1. POST /api/auth/logout endpoint -2. Invalidate the current session token -3. Clear any server-side session data -4. Follow the same patterns used in login +Create a state update event from a ConversationState object. 
-The login implementation is in src/api/auth/login.js for reference. -``` +This creates an event containing a snapshot of important state fields. -## Quick Reference +* Parameters: + * `state` – The ConversationState to serialize + * `conversation_id` – The conversation ID for the event +* Returns: + A ConversationStateUpdateEvent with serialized state data -| Element | Bad | Good | -|---------|-----|------| -| Location | "in the code" | "in src/api/users.py line 45" | -| Problem | "it's broken" | "TypeError when user.preferences is None" | -| Scope | "add authentication" | "add JWT-based login endpoint" | -| Behavior | "make it work" | "return 200 with user data on success" | -| Patterns | (none) | "follow patterns in src/services/" | -| Success | (none) | "all tests pass, endpoint returns correct data" | +#### classmethod validate_key() - -The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. - +#### classmethod validate_value() -### OpenHands in Your SDLC -Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md +### class Event -OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. +Bases: `DiscriminatedUnionMixin`, `ABC` -## Integration with Development Workflows +Base class for all events. -### Planning Phase -Use OpenHands during planning to accelerate technical decisions: +#### Properties -**Technical specification assistance:** -``` -Create a technical specification for adding search functionality: +- `id`: str +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
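The snapshot idea behind `from_conversation_state()` (serialize selected state fields into key/value update events that can travel over a websocket) can be sketched minimally. The `StateUpdate` class, the field names, and the one-event-per-field shape are assumptions for the sketch, not SDK types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StateUpdate:
    """Hypothetical stand-in for a serialized state-update event."""
    conversation_id: str
    key: str
    value: Any


def snapshot_state(state: dict, conversation_id: str) -> list[StateUpdate]:
    """Turn a state mapping into one serializable update per field."""
    return [StateUpdate(conversation_id, k, v) for k, v in sorted(state.items())]
```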
+- `source`: Literal['agent', 'user', 'environment'] +- `timestamp`: str +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. +### class LLMCompletionLogEvent -Requirements from product: -- Full-text search across products and articles -- Filter by category, price range, and date -- Sub-200ms response time at 1000 QPS +Bases: [`Event`](#class-event) -Provide: -1. Architecture options (Elasticsearch vs. PostgreSQL full-text) -2. Data model changes needed -3. API endpoint designs -4. Estimated implementation effort -5. Risks and mitigations -``` +Event containing LLM completion log data. -**Sprint planning support:** -``` -Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: +When an LLM is configured with log_completions=True in a remote conversation, +this event streams the completion log data back to the client through WebSocket +instead of writing it to a file inside the Docker container. -Story 1: As a user, I can reset my password via email -Story 2: As an admin, I can view user activity logs -For each story, create: -- Technical subtasks -- Estimated effort (hours) -- Dependencies on other work -- Testing requirements -``` +#### Properties -### Development Phase +- `filename`: str +- `log_data`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `model_name`: str +- `source`: Literal['agent', 'user', 'environment'] +- `usage_id`: str +### class LLMConvertibleEvent -OpenHands excels during active development: +Bases: [`Event`](#class-event), `ABC` -**Feature implementation:** -- Write new features with clear specifications -- Follow existing code patterns automatically -- Generate tests alongside code -- Create documentation as you go +Base class for events that can be converted to LLM messages. -**Bug fixing:** -- Analyze error logs and stack traces -- Identify root causes -- Implement fixes with regression tests -- Document the issue and solution -**Code improvement:** -- Refactor for clarity and maintainability -- Optimize performance bottlenecks -- Update deprecated APIs -- Improve error handling +#### Properties -### Testing Phase +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Automate test creation and improvement: +#### Methods -``` -Add comprehensive tests for the UserService module: +#### static events_to_messages() -Current coverage: 45% -Target coverage: 85% +Convert event stream to LLM message stream, handling multi-action batches -1. Analyze uncovered code paths using the codecov module -2. Write unit tests for edge cases -3. Add integration tests for API endpoints -4. Create test data factories -5. Document test scenarios +#### abstractmethod to_llm_message() -Each time you add new tests, re-run codecov to check the increased coverage. Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). -``` +### class MessageEvent -### Review Phase +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -Accelerate code reviews: +Message from either agent or user. -``` -Review this PR for our coding standards: +This is originally the “MessageAction”, but it suppose not to be tool call. -Check for: -1. 
Security issues (SQL injection, XSS, etc.) -2. Performance concerns -3. Test coverage adequacy -4. Documentation completeness -5. Adherence to our style guide -Provide actionable feedback with severity ratings. -``` +#### Properties -### Deployment Phase +- `activated_skills`: list[str] +- `critic_result`: CriticResult | None +- `extended_content`: list[TextContent] +- `llm_message`: Message +- `llm_response_id`: str | None +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str +- `sender`: str | None +- `source`: Literal['agent', 'user', 'environment'] +- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock] + Return the Anthropic thinking blocks from the LLM message. +- `visualize`: Text + Return Rich Text representation of this message event. -Assist with deployment preparation: +#### Methods -``` -Prepare for production deployment: +#### to_llm_message() -1. Review all changes since last release -2. Check for breaking API changes -3. Verify database migrations are reversible -4. Update the changelog -5. Create release notes -6. Identify rollback steps if needed -``` +### class ObservationBaseEvent -## CI/CD Integration +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. +Base class for anything as a response to a tool call. -### GitHub Actions Integration +Examples include tool execution, error, user reject. 
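The conversion contract described for `LLMConvertibleEvent` (each convertible event renders itself via `to_llm_message()`, and `events_to_messages()` maps an event stream to a message list) can be sketched minimally. The types below are hypothetical stand-ins, not the SDK classes, and the dict-shaped messages are an assumption for the sketch.

```python
from abc import ABC, abstractmethod


class ConvertibleEvent(ABC):
    """Minimal stand-in for the LLMConvertibleEvent contract."""

    @abstractmethod
    def to_llm_message(self) -> dict:
        """Render this event as a chat message dict."""

    @staticmethod
    def events_to_messages(events):
        """Map an event stream to an LLM message list."""
        return [e.to_llm_message() for e in events]


class UserMessage(ConvertibleEvent):
    """Example subclass: a plain user message event."""

    def __init__(self, text: str):
        self.text = text

    def to_llm_message(self) -> dict:
        return {"role": "user", "content": self.text}
```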
-The Software Agent SDK provides composite GitHub Actions for common workflows: -- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments -- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK +#### Properties -For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `tool_call_id`: str +- `tool_name`: str +### class ObservationEvent -### What You Can Automate +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -Using the SDK, you can create GitHub Actions workflows to: -1. **Automatic code review** when a PR is opened -2. **Automatically update docs** weekly when new functionality is added -3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements -4. **Manage TODO comments** and track technical debt -5. **Assign reviewers** based on code ownership patterns +#### Properties -### Getting Started +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `observation`: Observation +- `visualize`: Text + Return Rich Text representation of this observation event. -To integrate OpenHands into your CI/CD: +#### Methods -1. Review the [SDK Getting Started guide](/sdk/getting-started) -2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) -3. Set up your `LLM_API_KEY` as a repository secret -4. 
Use the provided composite actions or build custom workflows +#### to_llm_message() -See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. +### class PauseEvent -## Team Workflows +Bases: [`Event`](#class-event) -### Solo Developer Workflows +Event indicating that the agent execution was paused by user request. -For individual developers: -**Daily workflow:** -1. **Morning review**: Have OpenHands analyze overnight CI results -2. **Feature development**: Use OpenHands for implementation -3. **Pre-commit**: Request review before pushing -4. **Documentation**: Generate/update docs for changes +#### Properties -**Best practices:** -- Set up automated reviews on all PRs -- Use OpenHands for boilerplate and repetitive tasks -- Keep AGENTS.md updated with project patterns +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this pause event. +### class SystemPromptEvent -### Small Team Workflows +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) -For teams of 2-10 developers: +System prompt added by the agent. -**Collaborative workflow:** -``` -Team Member A: Creates feature branch, writes initial implementation -OpenHands: Reviews code, suggests improvements -Team Member B: Reviews OpenHands suggestions, approves or modifies -OpenHands: Updates documentation, adds missing tests -Team: Merges after final human review -``` +The system prompt can optionally include dynamic context that varies between +conversations. When `dynamic_context` is provided, it is included as a +second content block in the same system message. 
Cache markers are NOT +applied here - they are applied by `LLM._apply_prompt_caching()` when +caching is enabled, ensuring provider-specific cache control is only added +when appropriate. -**Communication integration:** -- Slack notifications for OpenHands findings -- Automatic issue creation for bugs found -- Weekly summary reports -### Enterprise Team Workflows +#### Properties -For larger organizations: +- `dynamic_context`: TextContent | None +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `system_prompt`: TextContent +- `tools`: list[ToolDefinition] +- `visualize`: Text + Return Rich Text representation of this system prompt event. -**Governance and oversight:** -- Configure approval requirements for OpenHands changes -- Set up audit logging for all AI-assisted changes -- Define scope limits for automated actions -- Establish human review requirements +#### Methods -**Scale patterns:** -``` -Central Platform Team: -├── Defines OpenHands policies -├── Manages integrations -└── Monitors usage and quality +#### system_prompt -Feature Teams: -├── Use OpenHands within policies -├── Customize for team needs -└── Report issues to platform team -``` +The static system prompt text (cacheable across conversations) -## Best Practices +* Type: + openhands.sdk.llm.message.TextContent -### Code Review Integration +#### tools -Set up effective automated reviews: +List of available tools -```yaml -# .openhands/review-config.yml -review: - focus_areas: - - security - - performance - - test_coverage - - documentation - - severity_levels: - block_merge: - - critical - - security - require_response: - - major - informational: - - minor - - suggestion - - ignore_patterns: - - "*.generated.*" - - "vendor/*" -``` +* Type: + list[openhands.sdk.tool.tool.ToolDefinition] -### Pull Request Automation +#### dynamic_context 
-Automate common PR tasks: +Optional per-conversation context (hosts, repo info, etc.) +Sent as a second TextContent block inside the system message. -| Trigger | Action | -|---------|--------| -| PR opened | Auto-review, label by type | -| Tests fail | Analyze failures, suggest fixes | -| Coverage drops | Identify missing tests | -| PR approved | Update changelog, check docs | +* Type: + openhands.sdk.llm.message.TextContent | None -### Quality Gates +#### to_llm_message() -Define automated quality gates: +Convert to a single system LLM message. -```yaml -quality_gates: - - name: test_coverage - threshold: 80% - action: block_merge - - - name: security_issues - threshold: 0 critical - action: block_merge - - - name: code_review_score - threshold: 7/10 - action: require_review - - - name: documentation - requirement: all_public_apis - action: warn -``` +When `dynamic_context` is present the message contains two content +blocks: the static prompt followed by the dynamic context. Cache markers +are NOT applied here - they are applied by `LLM._apply_prompt_caching()` +when caching is enabled, which marks the static block (index 0) and leaves +the dynamic block (index 1) unmarked for cross-conversation cache sharing. -### Automated Testing +### class TokenEvent -Integrate OpenHands with your testing strategy: +Bases: [`Event`](#class-event) -**Test generation triggers:** -- New code without tests -- Coverage below threshold -- Bug fix without regression test -- API changes without contract tests +Event from VLLM representing token IDs used in LLM interaction. 
-**Example workflow:** -```yaml -on: - push: - branches: [main] -jobs: - ensure-coverage: - steps: - - name: Check coverage - run: | - COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}') - if [ "$COVERAGE" -lt "80" ]; then - openhands generate-tests --target 80 - fi -``` +#### Properties -## Common Integration Patterns +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `prompt_token_ids`: list[int] +- `response_token_ids`: list[int] +- `source`: Literal['agent', 'user', 'environment'] +### class UserRejectObservation -### Pre-Commit Hooks +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -Run OpenHands checks before commits: +Observation when an action is rejected by user or hook. -```bash -# .git/hooks/pre-commit -#!/bin/bash +This event is emitted when: +- User rejects an action during confirmation mode (rejection_source=”user”) +- A PreToolUse hook blocks an action (rejection_source=”hook”) -# Quick code review -openhands review --quick --staged-only -if [ $? -ne 0 ]; then - echo "OpenHands found issues. Review and fix before committing." - exit 1 -fi -``` +#### Properties -### Post-Commit Actions +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `rejection_reason`: str +- `rejection_source`: Literal['user', 'hook'] +- `visualize`: Text + Return Rich Text representation of this user rejection event. 
-Automate tasks after commits: +#### Methods -```yaml -# .github/workflows/post-commit.yml -on: - push: - branches: [main] +#### to_llm_message() -jobs: - update-docs: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - name: Update API docs - run: openhands update-docs --api - - name: Commit changes - run: | - git add docs/ - git commit -m "docs: auto-update API documentation" || true - git push -``` +### openhands.sdk.llm +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md -### Scheduled Tasks +### class CredentialStore -Run regular maintenance: +Bases: `object` -```yaml -# Weekly dependency check -on: - schedule: - - cron: '0 9 * * 1' # Monday 9am +Store and retrieve OAuth credentials for LLM providers. -jobs: - dependency-review: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - name: Check dependencies - run: | - openhands check-dependencies --security --outdated - - name: Create issues - run: openhands create-issues --from-report deps.json -``` -### Event-Triggered Workflows +#### Properties -You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. +- `credentials_dir`: Path + Get the credentials directory, creating it if necessary. -For more event-driven automation patterns, see: -- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events -- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage +#### Methods -### When to Use OpenHands -Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md +#### __init__() -OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. 
This guide helps you identify the right tasks for OpenHands and set yourself up for success. +Initialize the credential store. -## Task Complexity Guidance +* Parameters: + `credentials_dir` – Optional custom directory for storing credentials. + Defaults to ~/.local/share/openhands/auth/ -### Simple Tasks +#### delete() -**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. +Delete stored credentials for a vendor. -- Adding a new function or method -- Writing unit tests for existing code -- Fixing simple bugs with clear error messages -- Code formatting and style fixes -- Adding documentation or comments -- Simple refactoring (rename, extract method) -- Configuration changes +* Parameters: + `vendor` – The vendor/provider name +* Returns: + True if credentials were deleted, False if they didn’t exist -**Example prompt:** -``` -Add a calculateDiscount() function to src/utils/pricing.js that takes -a price and discount percentage, returns the discounted price. -Add unit tests. -``` +#### get() -### Medium Complexity Tasks +Get stored credentials for a vendor. -**Good for OpenHands** — These tasks may need more context and possibly some iteration. +* Parameters: + `vendor` – The vendor/provider name (e.g., ‘openai’) +* Returns: + OAuthCredentials if found and valid, None otherwise -- Implementing a new API endpoint -- Adding a feature to an existing module -- Debugging issues that span multiple files -- Migrating code to a new pattern -- Writing integration tests -- Performance optimization with clear metrics -- Setting up CI/CD workflows +#### save() -**Example prompt:** -``` -Add a user profile endpoint to our API: -- GET /api/users/:id/profile -- Return user data with their recent activity -- Follow patterns in existing controllers -- Add integration tests -- Handle not-found and unauthorized cases -``` +Save credentials for a vendor. 
-### Complex Tasks +* Parameters: + `credentials` – The OAuth credentials to save -**May require iteration** — These benefit from breaking down into smaller pieces. +#### update_tokens() -- Large refactoring across many files -- Architectural changes -- Implementing complex business logic -- Multi-service integrations -- Performance optimization without clear cause -- Security audits -- Framework or major dependency upgrades +Update tokens for an existing credential. -**Recommended approach:** -``` -Break large tasks into phases: +* Parameters: + * `vendor` – The vendor/provider name + * `access_token` – New access token + * `refresh_token` – New refresh token (if provided) + * `expires_in` – Token expiry in seconds +* Returns: + Updated credentials, or None if no existing credentials found -Phase 1: "Analyze the current authentication system and document -all touch points that need to change for OAuth2 migration." +### class ImageContent -Phase 2: "Implement the OAuth2 provider configuration and basic -token flow, keeping existing auth working in parallel." +Bases: `BaseContent` -Phase 3: "Migrate the user login flow to use OAuth2, maintaining -backwards compatibility." -``` -## Best Use Cases +#### Properties -### Ideal Scenarios +- `image_urls`: list[str] +- `type`: Literal['image'] -OpenHands is **most effective** when: +#### Methods -| Scenario | Why It Works | -|----------|--------------| -| Clear requirements | OpenHands can work independently | -| Well-defined scope | Less ambiguity, fewer iterations | -| Existing patterns to follow | Consistency with codebase | -| Good test coverage | Easy to verify changes | -| Isolated changes | Lower risk of side effects | +#### model_config = (configuration object) -**Perfect use cases:** +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
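As a rough illustration of the `ImageContent` shape above, here is a stdlib-only stand-in. The output dict assumes the common OpenAI-style `image_url` content part; the real `to_llm_dict()` may emit a different structure.

```python
# Hypothetical stand-in for ImageContent; the "image_url" part shape below is
# an assumption, not necessarily what the real to_llm_dict() produces.
from dataclasses import dataclass, field


@dataclass
class SketchImageContent:
    image_urls: list[str] = field(default_factory=list)
    type: str = "image"

    def to_llm_dict(self) -> list[dict]:
        # One content part per URL, in OpenAI-style image_url form.
        return [{"type": "image_url", "image_url": {"url": u}}
                for u in self.image_urls]


content = SketchImageContent(image_urls=["https://example.com/diagram.png"])
print(content.to_llm_dict())
```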
-- **Bug fixes with reproduction steps**: Clear problem, measurable solution -- **Test additions**: Existing code provides the specification -- **Documentation**: Code is the source of truth -- **Boilerplate generation**: Follows established patterns -- **Code review and analysis**: Read-only, analytical tasks +#### to_llm_dict() -### Good Fit Scenarios +Convert to LLM API format. -OpenHands works **well with some guidance** for: +### class LLM -- **Feature implementation**: When requirements are documented -- **Refactoring**: When goals and constraints are clear -- **Debugging**: When you can provide logs and context -- **Code modernization**: When patterns are established -- **API development**: When specs exist +Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` -**Tips for these scenarios:** +Language model interface for OpenHands agents. -1. Provide clear acceptance criteria -2. Point to examples of similar work in the codebase -3. Specify constraints and non-goals -4. Be ready to iterate and clarify +The LLM class provides a unified interface for interacting with various +language models through the litellm library. It handles model configuration, +API authentication, +retry logic, and tool calling capabilities. -### Poor Fit Scenarios +#### Example -**Consider alternatives** when: +```pycon +>>> from openhands.sdk import LLM +>>> from pydantic import SecretStr +>>> llm = LLM( +... model="claude-sonnet-4-20250514", +... api_key=SecretStr("your-api-key"), +... usage_id="my-agent" +... 
) +>>> # Use with agent or conversation +``` -| Scenario | Challenge | Alternative | -|----------|-----------|-------------| -| Vague requirements | Unclear what "done" means | Define requirements first | -| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | -| Highly sensitive code | Risk tolerance is zero | Human review essential | -| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | -| Visual design | Subjective aesthetic judgments | Use design tools | -**Red flags that a task may not be suitable:** +#### Properties -- "Make it look better" (subjective) -- "Figure out what's wrong" (too vague) -- "Rewrite everything" (too large) -- "Do what makes sense" (unclear requirements) -- Changes to production infrastructure without review +- `api_key`: str | SecretStr | None +- `api_version`: str | None +- `aws_access_key_id`: str | SecretStr | None +- `aws_region_name`: str | None +- `aws_secret_access_key`: str | SecretStr | None +- `base_url`: str | None +- `caching_prompt`: bool +- `custom_tokenizer`: str | None +- `disable_stop_word`: bool | None +- `disable_vision`: bool | None +- `drop_params`: bool +- `enable_encrypted_reasoning`: bool +- `extended_thinking_budget`: int | None +- `extra_headers`: dict[str, str] | None +- `force_string_serializer`: bool | None +- `input_cost_per_token`: float | None +- `is_subscription`: bool + Check if this LLM uses subscription-based authentication. + Returns True when the LLM was created via LLM.subscription_login(), + which uses the ChatGPT subscription Codex backend rather than the + standard OpenAI API. + * Returns: + True if using subscription-based transport, False otherwise. 
+ * Return type: + bool +- `litellm_extra_body`: dict[str, Any] +- `log_completions`: bool +- `log_completions_folder`: str +- `max_input_tokens`: int | None +- `max_message_chars`: int +- `max_output_tokens`: int | None +- `metrics`: [Metrics](#class-metrics) + Get usage metrics for this LLM instance. + * Returns: + Metrics object containing token usage, costs, and other statistics. +- `model`: str +- `model_canonical_name`: str | None +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `model_info`: dict | None + Returns the model info dictionary. +- `modify_params`: bool +- `native_tool_calling`: bool +- `num_retries`: int +- `ollama_base_url`: str | None +- `openrouter_app_name`: str +- `openrouter_site_url`: str +- `output_cost_per_token`: float | None +- `prompt_cache_retention`: str | None +- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None +- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None +- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None] +- `retry_max_wait`: int +- `retry_min_wait`: int +- `retry_multiplier`: float +- `safety_settings`: list[dict[str, str]] | None +- `seed`: int | None +- `stream`: bool +- `telemetry`: Telemetry + Get telemetry handler for this LLM instance. + * Returns: + Telemetry object for managing logging and metrics callbacks. +- `temperature`: float | None +- `timeout`: int | None +- `top_k`: float | None +- `top_p`: float | None +- `usage_id`: str -## Limitations +#### Methods -### Current Limitations +#### completion() -Be aware of these constraints: +Generate a completion from the language model. 
-- **Long-running processes**: Sessions have time limits -- **Interactive debugging**: Can't set breakpoints interactively -- **Visual verification**: Can't see rendered UI easily -- **External system access**: May need credentials configured -- **Large codebase analysis**: Memory and time constraints +This is the method for getting responses from the model via Completion API. +It handles message formatting, tool calling, and response processing. -### Technical Constraints +* Parameters: + * `messages` – List of conversation messages + * `tools` – Optional list of tools available to the model + * `_return_metrics` – Whether to return usage metrics + * `add_security_risk_prediction` – Add security_risk field to tool schemas + * `on_token` – Optional callback for streaming tokens + kwargs* – Additional arguments passed to the LLM API +* Returns: + LLMResponse containing the model’s response and metadata. -| Constraint | Impact | Workaround | -|------------|--------|------------| -| Session duration | Very long tasks may timeout | Break into smaller tasks | -| Context window | Can't see entire large codebase at once | Focus on relevant files | -| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | -| Network access | Some external services may be blocked | Use local resources when possible | +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. -### Scope Boundaries +* Raises: + `ValueError` – If streaming is requested (not supported). -OpenHands works within your codebase but has boundaries: +#### format_messages_for_llm() -**Can do:** -- Read and write files in the repository -- Run tests and commands -- Access configured services and APIs -- Browse documentation and reference material +Formats Message objects for LLM consumption. 
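The message-formatting step described above can be sketched with stdlib-only stand-ins: role/content messages are reduced to Chat Completions-style dicts (the string-serializer case, where text parts collapse into a single string). The class and function names are illustrative, not the SDK's own.

```python
# Stdlib-only sketch of formatting messages for a chat-completions call
# (string-serializer case). Stand-in classes, not the real SDK types.
from dataclasses import dataclass, field


@dataclass
class SketchText:
    text: str


@dataclass
class SketchMessage:
    role: str                        # 'system' | 'user' | 'assistant' | 'tool'
    content: list = field(default_factory=list)

    def to_chat_dict(self) -> dict:
        # Concatenate text parts into one string, as a string serializer would.
        return {"role": self.role,
                "content": "".join(part.text for part in self.content)}


messages = [
    SketchMessage("system", [SketchText("You are a coding agent.")]),
    SketchMessage("user", [SketchText("Summarize the repo layout.")]),
]
payload = [m.to_chat_dict() for m in messages]
print(payload[1])  # {'role': 'user', 'content': 'Summarize the repo layout.'}
```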
-**Cannot do:** -- Access your local environment outside the sandbox -- Make decisions requiring business context it doesn't have -- Replace human judgment for critical decisions -- Guarantee production-safe changes without review +#### format_messages_for_responses() -## Pre-Task Checklist +Prepare (instructions, input[]) for the OpenAI Responses API. -### Prerequisites +- Skips prompt caching flags and string serializer concerns +- Uses Message.to_responses_value to get either instructions (system) + or input items (others) +- Concatenates system instructions into a single instructions string +- For subscription mode, system prompts are prepended to user content -Before starting a task, ensure: +#### get_token_count() -- [ ] Clear description of what you want -- [ ] Expected outcome is defined -- [ ] Relevant files are identified -- [ ] Dependencies are available -- [ ] Tests can be run +#### is_caching_prompt_active() -### Environment Setup +Check if prompt caching is supported and enabled for current model. -Prepare your repository: +* Returns: + True if prompt caching is supported and enabled for the given + : model. +* Return type: + boolean -```markdown -## AGENTS.md Checklist +#### classmethod load_from_env() -- [ ] Build commands documented -- [ ] Test commands documented -- [ ] Code style guidelines noted -- [ ] Architecture overview included -- [ ] Common patterns described -``` +#### classmethod load_from_json() -See [Repository Setup](/openhands/usage/customization/repository) for details. +#### model_post_init() -### Repository Preparation +This function is meant to behave like a BaseModel method to initialise private attributes. -Optimize for success: +It takes context as an argument since that’s what pydantic-core passes when calling it. -1. **Clean state**: Commit or stash uncommitted changes -2. **Working build**: Ensure the project builds -3. **Passing tests**: Start from a green state -4. 
**Updated dependencies**: Resolve any dependency issues -5. **Clear documentation**: Update AGENTS.md if needed +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. -## Post-Task Review +#### reset_metrics() -### Quality Checks +Reset metrics and telemetry to fresh instances. -After OpenHands completes a task: +This is used by the LLMRegistry to ensure each registered LLM has +independent metrics, preventing metrics from being shared between +LLMs that were created via model_copy(). -- [ ] Review all changed files -- [ ] Understand each change made -- [ ] Check for unintended modifications -- [ ] Verify code style consistency -- [ ] Look for hardcoded values or credentials +When an LLM is copied (e.g., to create a condenser LLM from an agent LLM), +Pydantic’s model_copy() does a shallow copy of private attributes by default, +causing the original and copied LLM to share the same Metrics object. +This method allows the registry to fix this by resetting metrics to None, +which will be lazily recreated when accessed. -### Validation Steps +#### responses() -1. **Run tests**: `npm test`, `pytest`, etc. -2. **Check linting**: Ensure style compliance -3. **Build the project**: Verify it still compiles -4. **Manual testing**: Test the feature yourself -5. **Edge cases**: Try unusual inputs +Alternative invocation path using OpenAI Responses API via LiteLLM. -### Learning from Results +Maps Message[] -> (instructions, input[]) and returns LLMResponse. 
-After each significant task: +* Parameters: + * `messages` – List of conversation messages + * `tools` – Optional list of tools available to the model + * `include` – Optional list of fields to include in response + * `store` – Whether to store the conversation + * `_return_metrics` – Whether to return usage metrics + * `add_security_risk_prediction` – Add security_risk field to tool schemas + * `on_token` – Optional callback for streaming deltas + kwargs* – Additional arguments passed to the API -**What went well?** -- Note effective prompt patterns -- Document successful approaches -- Update AGENTS.md with learnings +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. -**What could improve?** -- Identify unclear instructions -- Note missing context -- Plan better for next time +#### restore_metrics() -**Update your repository:** -```markdown -## Things OpenHands Should Know (add to AGENTS.md) +#### classmethod subscription_login() -- When adding API endpoints, always add to routes/index.js -- Our date format is ISO 8601 everywhere -- All database queries go through the repository pattern -``` +Authenticate with a subscription service and return an LLM instance. -## Decision Framework +This method provides subscription-based access to LLM models that are +available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather +than API credits. It handles credential caching, token refresh, and +the OAuth login flow. -Use this framework to decide if a task is right for OpenHands: +Currently supported vendors: +- “openai”: ChatGPT Plus/Pro subscription for Codex models -``` -Is the task well-defined? -├── No → Define it better first -└── Yes → Continue +Supported OpenAI models: +- gpt-5.1-codex-max +- gpt-5.1-codex-mini +- gpt-5.2 +- gpt-5.2-codex -Do you have clear success criteria? -├── No → Define acceptance criteria -└── Yes → Continue +* Parameters: + * `vendor` – The vendor/provider. 
Currently only “openai” is supported. + * `model` – The model to use. Must be supported by the vendor’s + subscription service. + * `force_login` – If True, always perform a fresh login even if valid + credentials exist. + * `open_browser` – Whether to automatically open the browser for the + OAuth login flow. + llm_kwargs* – Additional arguments to pass to the LLM constructor. +* Returns: + An LLM instance configured for subscription-based access. +* Raises: + * `ValueError` – If the vendor or model is not supported. + * `RuntimeError` – If authentication fails. -Is the scope manageable (< 100 LOC)? -├── No → Break into smaller tasks -└── Yes → Continue +#### uses_responses_api() -Do examples exist in the codebase? -├── No → Provide examples or patterns -└── Yes → Continue +Whether this model uses the OpenAI Responses API path. -Can you verify the result? -├── No → Add tests or verification steps -└── Yes → ✅ Good candidate for OpenHands -``` +#### vision_is_active() -OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! +### class LLMProfileStore -But it can be particularly useful for certain types of tasks. For instance: +Bases: `object` -- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define it in a way that can be verified programmatically, like making sure that all of the tests pass or test coverage gets above a certain value using a particular program. But even when you don't have something like that, you can just provide a checklist of things that need to be done. -- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. Some good examples include code review, improving test coverage, upgrading dependency libraries. 
In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies about how to perform these tasks, and improve the skills over time. -- **Helping Answer Questions:** OpenHands agents are generally pretty good at answering questions about code bases, so you can feel free to ask them when you don't understand how something works. They can explore the code base and understand it deeply before providing an answer. -- **Checking the Correctness of Library/Backend Code:** when agents work, they can run code, and they are particularly good at checking whether libraries or backend code works well. -- **Reading Logs and Understanding Errors:** Agents can read blogs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They're actually quite good at filtering through large amounts of data, especially if pushed in the correct direction. +Standalone utility for persisting LLM configurations. -There are also some tasks where agent struggle a little more. +#### Methods -- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons. But they are a little bit less good at visual understanding of frontends at the moment and can sometimes make mistakes if they don't understand the workflow very well. -- **Implementing Code they Cannot Test Live:** If agents are not able to actually run and test the app, such as connecting to a live service that they do not have access to, often they will fail at performing tasks all the way to the end, unless they get some encouragement. +#### __init__() -### Tutorial Library -Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials.md +Initialize the profile store. -Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development. 
Each tutorial includes example prompts, expected workflows, and tips for success. +* Parameters: + `base_dir` – Path to the directory where the profiles are stored. + If None is provided, the default directory is used, i.e., + ~/.openhands/profiles. -## Categories Overview +#### delete() -| Category | Best For | Complexity | -|----------|----------|------------| -| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | -| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | -| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | -| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | -| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | -| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | +Delete an existing profile. - -For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. - +If the profile is not present in the profile directory, it does nothing. -## Task Complexity Guidance +* Parameters: + `name` – Name of the profile to delete. +* Raises: + `TimeoutError` – If the lock cannot be acquired. -Before starting, assess your task's complexity: +#### list() -**Simple tasks** (5-15 minutes): -- Single file changes -- Clear, well-defined requirements -- Existing patterns to follow +Returns a list of all profiles stored. -**Medium tasks** (15-45 minutes): -- Multiple file changes -- Some discovery required -- Integration with existing code +* Returns: + List of profile filenames (e.g., [“default.json”, “gpt4.json”]). -**Complex tasks** (45+ minutes): -- Architectural changes -- Multiple components -- Requires iteration +#### load() - -Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. - +Load an LLM instance from the given profile name. 
-## Best Use Cases +* Parameters: + `name` – Name of the profile to load. +* Returns: + An LLM instance constructed from the profile configuration. +* Raises: + * `FileNotFoundError` – If the profile name does not exist. + * `ValueError` – If the profile file is corrupted or invalid. + * `TimeoutError` – If the lock cannot be acquired. -OpenHands excels at: +#### save() -- **Repetitive tasks**: Boilerplate code, test generation -- **Pattern application**: Following established conventions -- **Analysis**: Code review, debugging, documentation -- **Exploration**: Understanding new codebases +Save a profile to the profile directory. -## Example Tutorials by Category +Note that if a profile name already exists, it will be overwritten. -### Testing +* Parameters: + * `name` – Name of the profile to save. + * `llm` – LLM instance to save + * `include_secrets` – Whether to include the profile secrets. Defaults to False. +* Raises: + `TimeoutError` – If the lock cannot be acquired. -#### Tutorial: Add Unit Tests for a Module +### class LLMRegistry -**Goal**: Achieve 80%+ test coverage for a service module +Bases: `object` -**Prompt**: -``` -Add unit tests for the UserService class in src/services/user.js. +A minimal LLM registry for managing LLM instances by usage ID. -Current coverage: 35% -Target coverage: 80% +This registry provides a simple way to manage multiple LLM instances, +avoiding the need to recreate LLMs with the same configuration. -Requirements: -1. Test all public methods -2. Cover edge cases (null inputs, empty arrays, etc.) -3. Mock external dependencies (database, API calls) -4. Follow our existing test patterns in tests/services/ -5. Use Jest as the testing framework +The registry also ensures that each registered LLM has independent metrics, +preventing metrics from being shared between LLMs that were created via +model_copy(). 
This is important for scenarios like creating a condenser LLM +from an agent LLM, where each should track its own usage independently. -Focus on these methods: -- createUser() -- updateUser() -- deleteUser() -- getUserById() -``` -**What OpenHands does**: -1. Analyzes the UserService class -2. Identifies untested code paths -3. Creates test file with comprehensive tests -4. Mocks dependencies appropriately -5. Runs tests to verify they pass +#### Properties -**Tips**: -- Provide existing test files as examples -- Specify the testing framework -- Mention any mocking conventions +- `registry_id`: str +- `retry_listener`: Callable[[int, int], None] | None +- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None +- `usage_to_llm`: MappingProxyType + Access the internal usage-ID-to-LLM mapping (read-only view). ---- +#### Methods -#### Tutorial: Add Integration Tests for an API +#### __init__() -**Goal**: Test API endpoints end-to-end +Initialize the LLM registry. -**Prompt**: -``` -Add integration tests for the /api/products endpoints. +* Parameters: + `retry_listener` – Optional callback for retry events. -Endpoints to test: -- GET /api/products (list all) -- GET /api/products/:id (get one) -- POST /api/products (create) -- PUT /api/products/:id (update) -- DELETE /api/products/:id (delete) +#### add() -Requirements: -1. Use our test database (configured in jest.config.js) -2. Set up and tear down test data properly -3. Test success cases and error cases -4. Verify response bodies and status codes -5. Follow patterns in tests/integration/ -``` +Add an LLM instance to the registry. ---- +This method ensures that the LLM has independent metrics before +registering it. If the LLM’s metrics are shared with another +registered LLM (e.g., due to model_copy()), fresh metrics will +be created automatically. -### Data Analysis +* Parameters: + `llm` – The LLM instance to register. 
+* Raises: + `ValueError` – If llm.usage_id already exists in the registry. -#### Tutorial: Create a Data Processing Script +#### get() -**Goal**: Process CSV data and generate a report +Get an LLM instance from the registry. -**Prompt**: -``` -Create a Python script to analyze our sales data. +* Parameters: + `usage_id` – Unique identifier for the LLM usage slot. +* Returns: + The LLM instance. +* Raises: + `KeyError` – If usage_id is not found in the registry. -Input: sales_data.csv with columns: date, product, quantity, price, region +#### list_usage_ids() -Requirements: -1. Load and validate the CSV data -2. Calculate: - - Total revenue by product - - Monthly sales trends - - Top 5 products by quantity - - Revenue by region -3. Generate a summary report (Markdown format) -4. Create visualizations (bar chart for top products, line chart for trends) -5. Save results to reports/ directory +List all registered usage IDs. -Use pandas for data processing and matplotlib for charts. -``` +#### notify() -**What OpenHands does**: -1. Creates a Python script with proper structure -2. Implements data loading with validation -3. Calculates requested metrics -4. Generates formatted report -5. Creates and saves visualizations +Notify subscribers of registry events. ---- +* Parameters: + `event` – The registry event to notify about. -#### Tutorial: Database Query Analysis +#### subscribe() -**Goal**: Analyze and optimize slow database queries +Subscribe to registry events. -**Prompt**: -``` -Analyze our slow query log and identify optimization opportunities. +* Parameters: + `callback` – Function to call when LLMs are created or updated. -File: logs/slow_queries.log +### class LLMResponse -For each slow query: -1. Explain why it's slow -2. Suggest index additions if helpful -3. Rewrite the query if it can be optimized -4. 
Estimate the improvement +Bases: `BaseModel` -Create a report in reports/query_optimization.md with: -- Summary of findings -- Prioritized recommendations -- SQL for suggested changes -``` +Result of an LLM completion request. ---- +This type provides a clean interface for LLM completion results, exposing +only OpenHands-native types to consumers while preserving access to the +raw LiteLLM response for internal use. -### Web Scraping -#### Tutorial: Build a Web Scraper +#### Properties -**Goal**: Extract product data from a website +- `id`: str + Get the response ID from the underlying LLM response. + This property provides a clean interface to access the response ID, + supporting both completion mode (ModelResponse) and response API modes + (ResponsesAPIResponse). + * Returns: + The response ID from the LLM response +- `message`: [Message](#class-message) +- `metrics`: [MetricsSnapshot](#class-metricssnapshot) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `raw_response`: ModelResponse | ResponsesAPIResponse -**Prompt**: -``` -Create a web scraper to extract product information from our competitor's site. +#### Methods -Target URL: https://example-store.com/products +#### message -Extract for each product: -- Name -- Price -- Description -- Image URL -- SKU (if available) +The completion message converted to OpenHands Message type -Requirements: -1. Use Python with BeautifulSoup or Scrapy -2. Handle pagination (site has 50 pages) -3. Respect rate limits (1 request/second) -4. Save results to products.json -5. Handle errors gracefully -6. Log progress to console +* Type: + [openhands.sdk.llm.message.Message](#class-message) -Include a README with usage instructions. 
-``` +#### metrics -**Tips**: -- Specify rate limiting requirements -- Mention error handling expectations -- Request logging for debugging +Snapshot of metrics from the completion request ---- +* Type: + [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) -### Code Review +#### raw_response - -For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). - +The original LiteLLM response (ModelResponse or +ResponsesAPIResponse) for internal use -#### Tutorial: Security-Focused Code Review +* Type: + litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse -**Goal**: Identify security vulnerabilities in a PR +### class Message -**Prompt**: -``` -Review this pull request for security issues: +Bases: `BaseModel` -Focus areas: -1. Input validation - check all user inputs are sanitized -2. Authentication - verify auth checks are in place -3. SQL injection - check for parameterized queries -4. XSS - verify output encoding -5. Sensitive data - ensure no secrets in code -For each issue found, provide: -- File and line number -- Severity (Critical/High/Medium/Low) -- Description of the vulnerability -- Suggested fix with code example +#### Properties -Output format: Markdown suitable for PR comments -``` +- `contains_image`: bool +- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `name`: str | None +- `reasoning_content`: str | None +- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None +- `role`: Literal['user', 'system', 'assistant', 'tool'] +- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] +- `tool_call_id`: str | None +- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None ---- +#### Methods -#### Tutorial: Performance Review +#### classmethod from_llm_chat_message() -**Goal**: Identify performance issues in code +Convert a LiteLLMMessage (Chat Completions) to our Message class. -**Prompt**: -``` -Review the OrderService class for performance issues. +Provider-agnostic mapping for reasoning: +- Prefer message.reasoning_content if present (LiteLLM normalized field) +- Extract thinking_blocks from content array (Anthropic-specific) -File: src/services/order.js +#### classmethod from_llm_responses_output() -Check for: -1. N+1 database queries -2. Missing indexes (based on query patterns) -3. Inefficient loops or algorithms -4. Missing caching opportunities -5. Unnecessary data fetching +Convert OpenAI Responses API output items into a single assistant Message. -For each issue: -- Explain the impact -- Show the problematic code -- Provide an optimized version -- Estimate the improvement -``` +Policy (non-stream): +- Collect assistant text by concatenating output_text parts from message items +- Normalize function_call items to MessageToolCall list ---- +#### to_chat_dict() -### Bug Fixing +Serialize message for OpenAI Chat Completions. - -For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. - +* Parameters: + * `cache_enabled` – Whether prompt caching is active. + * `vision_enabled` – Whether vision/image processing is enabled. 
+ * `function_calling_enabled` – Whether native function calling is enabled. + * `force_string_serializer` – Force string serializer instead of list format. + * `send_reasoning_content` – Whether to include reasoning_content in output. -#### Tutorial: Fix a Crash Bug +Chooses the appropriate content serializer and then injects threading keys: +- Assistant tool call turn: role == “assistant” and self.tool_calls +- Tool result turn: role == “tool” and self.tool_call_id (with name) -**Goal**: Diagnose and fix an application crash +#### to_responses_dict() -**Prompt**: -``` -Fix the crash in the checkout process. +Serialize message for OpenAI Responses (input parameter). -Error: -TypeError: Cannot read property 'price' of undefined - at calculateTotal (src/checkout/calculator.js:45) - at processOrder (src/checkout/processor.js:23) +Produces a list of “input” items for the Responses API: +- system: returns [], system content is expected in ‘instructions’ +- user: one ‘message’ item with content parts -> input_text / input_image +(when vision enabled) +- assistant: emits prior assistant content as input_text, +and function_call items for tool_calls +- tool: emits function_call_output items (one per TextContent) +with matching call_id -Steps to reproduce: -1. Add item to cart -2. Apply discount code "SAVE20" -3. Click checkout -4. Crash occurs +#### to_responses_value() -The bug was introduced in commit abc123 (yesterday's deployment). +Return serialized form. -Requirements: -1. Identify the root cause -2. Fix the bug -3. Add a regression test -4. Verify the fix doesn't break other functionality -``` +Either an instructions string (for system) or input items (for other roles). -**What OpenHands does**: -1. Analyzes the stack trace -2. Reviews recent changes -3. Identifies the null reference issue -4. Implements a defensive fix -5. 
Creates test to prevent regression

+### class MessageToolCall

----

+Bases: `BaseModel`

-#### Tutorial: Fix a Memory Leak

+Transport-agnostic tool call representation.

-**Goal**: Identify and fix a memory leak

+One canonical id is used for linking across actions/observations and
+for Responses function_call_output call_id.

-**Prompt**:
-```
-Investigate and fix the memory leak in our Node.js application.

-Symptoms:
-- Memory usage grows 100MB/hour
-- After 24 hours, app becomes unresponsive
-- Restarting temporarily fixes the issue

+#### Properties

-Suspected areas:
-- Event listeners in src/events/
-- Cache implementation in src/cache/
-- WebSocket connections in src/ws/

+- `arguments`: str
+- `id`: str
+- `name`: str
+- `origin`: Literal['completion', 'responses']

-Analyze these areas and:
-1. Identify the leak source
-2. Explain why it's leaking
-3. Implement a fix
-4. Add monitoring to detect future leaks
-```

+#### Methods

----

+#### classmethod from_chat_tool_call()

-### Feature Development

+Create a MessageToolCall from a Chat Completions tool call.

-#### Tutorial: Add a REST API Endpoint

+#### classmethod from_responses_function_call()

-**Goal**: Create a new API endpoint with full functionality

+Create a MessageToolCall from a typed OpenAI Responses function_call item.

-**Prompt**:
-```
-Add a user preferences API endpoint.

+Note: OpenAI Responses function_call.arguments is already a JSON string.

-Endpoint: /api/users/:id/preferences

+#### model_config = (configuration object)

-Operations:
-- GET: Retrieve user preferences
-- PUT: Update user preferences
-- PATCH: Partially update preferences

+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

-Preferences schema:
-{
-  theme: "light" | "dark",
-  notifications: { email: boolean, push: boolean },
-  language: string,
-  timezone: string
-}

+#### to_chat_dict()

-Requirements:
-1. Follow patterns in src/api/routes/
-2. Add request validation with Joi
-3. Use UserPreferencesService for business logic
-4. Add appropriate error handling
-5. Document the endpoint in OpenAPI format
-6. Add unit and integration tests
-```

+Serialize to OpenAI Chat Completions tool_calls format.

-**What OpenHands does**:
-1. Creates route handler following existing patterns
-2. Implements validation middleware
-3. Creates or updates the service layer
-4. Adds error handling
-5. Generates API documentation
-6. Creates comprehensive tests

+#### to_responses_dict()

----

+Serialize to OpenAI Responses ‘function_call’ input item format.

-#### Tutorial: Implement a Feature Flag System

+### class Metrics

+Bases: [`MetricsSnapshot`](#class-metricssnapshot)

+#### Properties

+- `costs`: list[Cost]
+- `response_latencies`: list[ResponseLatency]
+- `token_usages`: list[TokenUsage]

+#### Methods

+#### add_cost()

-**Goal**: Add feature flags to the application

+#### add_response_latency()

-**Prompt**:
-```
-Implement a feature flag system for our application.

+#### add_token_usage()

-Requirements:
-1. Create a FeatureFlags service
-2. Support these flag types:
-   - Boolean (on/off)
-   - Percentage (gradual rollout)
-   - User-based (specific user IDs)
-3. Load flags from environment variables initially
-4. Add a React hook: useFeatureFlag(flagName)
-5. Add middleware for API routes

+Add a single usage record.

-Initial flags to configure:
-- new_checkout: boolean, default false
-- dark_mode: percentage, default 10%
-- beta_features: user-based

+#### deep_copy()

-Include documentation and tests.
-```

+Create a deep copy of the Metrics object.

----

+#### diff()

-## Contributing Tutorials

+Calculate the difference between current metrics and a baseline.

-Have a great use case? Share it with the community!

+This is useful for tracking metrics for specific operations like delegates.

-**What makes a good tutorial:**
-**What makes a good tutorial:** -- Solves a common problem -- Has clear, reproducible steps -- Includes example prompts -- Explains expected outcomes -- Provides tips for success +* Parameters: + `baseline` – A metrics object representing the baseline state +* Returns: + A new Metrics object containing only the differences since the baseline -**How to contribute:** -1. Create a detailed example following this format -2. Test it with OpenHands to verify it works -3. Submit via GitHub pull request to the docs repository -4. Include any prerequisites or setup required +#### get() - -These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. - +Return the metrics in a dictionary. -### Key Features -Source: https://docs.openhands.dev/openhands/usage/key-features.md +#### get_snapshot() - - - - Displays the conversation between the user and OpenHands. - - OpenHands explains its actions in this panel. +Get a snapshot of the current metrics without the detailed lists. - ![overview](/openhands/static/img/chat-panel.png) - - - - Shows the file changes performed by OpenHands. +#### initialize_accumulated_token_usage() - ![overview](/openhands/static/img/changes-tab.png) - - - - Embedded VS Code for browsing and modifying files. - - Can also be used to upload and download files. +#### log() - ![overview](/openhands/static/img/vs-tab.png) - - - - A space for OpenHands and users to run terminal commands. +Log the metrics. - ![overview](/openhands/static/img/terminal-tab.png) - - - - Displays the web server when OpenHands runs an application. - - Users can interact with the running application. +#### merge() - ![overview](/openhands/static/img/app-tab.png) - - - - Used by OpenHands to browse websites. - - The browser is non-interactive. +Merge ‘other’ metrics into this one. 
- ![overview](/openhands/static/img/browser-tab.png) - - +#### model_config = (configuration object) -### Azure -Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms.md +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Azure OpenAI Configuration +#### classmethod validate_accumulated_cost() -When running OpenHands, you'll need to set the following environment variable using `-e` in the -docker run command: +### class MetricsSnapshot -``` -LLM_API_VERSION="" # e.g. "2023-05-15" -``` +Bases: `BaseModel` -Example: -```bash -docker run -it --pull=always \ - -e LLM_API_VERSION="2023-05-15" - ... -``` +A snapshot of metrics at a point in time. -Then in the OpenHands UI Settings under the `LLM` tab: +Does not include lists of individual costs, latencies, or token usages. - -You will need your ChatGPT deployment name which can be found on the deployments page in Azure. This is referenced as -<deployment-name> below. - -1. Enable `Advanced` options. -2. Set the following: - - `Custom Model` to azure/<deployment-name> - - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`) - - `API Key` to your Azure API key +#### Properties -### Azure OpenAI Configuration +- `accumulated_cost`: float +- `accumulated_token_usage`: TokenUsage | None +- `max_budget_per_task`: float | None +- `model_name`: str -When running OpenHands, set the following environment variable using `-e` in the -docker run command: +#### Methods -``` -LLM_API_VERSION="" # e.g. "2024-02-15-preview" -``` +#### model_config = (configuration object) -### Custom LLM Configurations -Source: https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
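The baseline/diff accounting described above (take a snapshot, keep accumulating, then compute what accrued since the snapshot) can be sketched with a small stand-in. This is plain Python for illustration only, not the real Pydantic `Metrics`/`MetricsSnapshot` models; the class name and simplified fields are invented:

```python
from dataclasses import dataclass, field


@dataclass
class MiniMetrics:
    """Toy stand-in for the Metrics accumulator described above."""

    accumulated_cost: float = 0.0
    costs: list = field(default_factory=list)

    def add_cost(self, value: float) -> None:
        self.costs.append(value)
        self.accumulated_cost += value

    def snapshot(self) -> "MiniMetrics":
        # Like get_snapshot(): a copy detached from future updates.
        return MiniMetrics(self.accumulated_cost, list(self.costs))

    def diff(self, baseline: "MiniMetrics") -> "MiniMetrics":
        # Like diff(): only what accrued since the baseline, e.g. to
        # attribute spend to a delegate's sub-run.
        return MiniMetrics(
            self.accumulated_cost - baseline.accumulated_cost,
            self.costs[len(baseline.costs):],
        )

    def merge(self, other: "MiniMetrics") -> None:
        # Like merge(): fold another accumulator's records into this one.
        self.accumulated_cost += other.accumulated_cost
        self.costs.extend(other.costs)


m = MiniMetrics()
m.add_cost(0.02)
baseline = m.snapshot()
m.add_cost(0.05)
delta = m.diff(baseline)
print(round(delta.accumulated_cost, 2))  # → 0.05
```

The same pattern is what makes per-operation cost attribution possible: snapshot before delegating, diff after, and merge sub-run metrics back into the parent accumulator.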
-## How It Works +### class OAuthCredentials -Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. For example: +Bases: `BaseModel` -```toml -# Default LLM configuration -[llm] -model = "gpt-4" -api_key = "your-api-key" -temperature = 0.0 +OAuth credentials for subscription-based LLM access. -# Custom LLM configuration for a cheaper model -[llm.gpt3] -model = "gpt-3.5-turbo" -api_key = "your-api-key" -temperature = 0.2 -# Another custom configuration with different parameters -[llm.high-creativity] -model = "gpt-4" -api_key = "your-api-key" -temperature = 0.8 -top_p = 0.9 -``` +#### Properties -Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed. +- `access_token`: str +- `expires_at`: int +- `refresh_token`: str +- `type`: Literal['oauth'] +- `vendor`: str -## Using Custom Configurations +#### Methods -### With Agents +#### is_expired() -You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: +Check if the access token is expired. -```toml -[agent.RepoExplorerAgent] -# Use the cheaper GPT-3 configuration for this agent -llm_config = 'gpt3' +#### model_config = (configuration object) -[agent.CodeWriterAgent] -# Use the high creativity configuration for this agent -llm_config = 'high-creativity' -``` +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### Configuration Options +### class OpenAISubscriptionAuth -Each named LLM configuration supports all the same options as the default LLM configuration. These include: +Bases: `object` -- Model selection (`model`) -- API configuration (`api_key`, `base_url`, etc.) -- Model parameters (`temperature`, `top_p`, etc.) -- Retry settings (`num_retries`, `retry_multiplier`, etc.) 
-- Token limits (`max_input_tokens`, `max_output_tokens`) -- And all other LLM configuration options +Handle OAuth authentication for OpenAI ChatGPT subscription access. -For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. -## Use Cases +#### Properties -Custom LLM configurations are particularly useful in several scenarios: +- `vendor`: str + Get the vendor name. -- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations. -- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism. -- **Different Providers**: Use different LLM providers or API endpoints for different tasks. -- **Testing and Development**: Easily switch between different model configurations during development and testing. +#### Methods -## Example: Cost Optimization +#### __init__() -A practical example of using custom LLM configurations to optimize costs: +Initialize the OpenAI subscription auth handler. -```toml -# Default configuration using GPT-4 for high-quality responses -[llm] -model = "gpt-4" -api_key = "your-api-key" -temperature = 0.0 +* Parameters: + * `credential_store` – Optional custom credential store. + * `oauth_port` – Port for the local OAuth callback server. -# Cheaper configuration for repository exploration -[llm.repo-explorer] -model = "gpt-3.5-turbo" -temperature = 0.2 +#### create_llm() -# Configuration for code generation -[llm.code-gen] -model = "gpt-4" -temperature = 0.0 -max_output_tokens = 2000 +Create an LLM instance configured for Codex subscription access. -[agent.RepoExplorerAgent] -llm_config = 'repo-explorer' +* Parameters: + * `model` – The model to use (must be in OPENAI_CODEX_MODELS). + * `credentials` – OAuth credentials to use. If None, uses stored credentials. 
+  * `instructions` – Optional instructions for the Codex model.
+  * `**llm_kwargs` – Additional arguments to pass to LLM constructor.
+* Returns:
+  An LLM instance configured for Codex access.
+* Raises:
+  `ValueError` – If the model is not supported or no credentials available.

-[agent.CodeWriterAgent]
-llm_config = 'code-gen'
-```

+#### get_credentials()

-In this example:
-- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code
-- Code generation uses GPT-4 with a higher token limit for generating larger code blocks
-- The default configuration remains available for other tasks

+Get stored credentials if they exist.

-# Custom Configurations with Reserved Names

+#### has_valid_credentials()

-OpenHands can use custom LLM configurations named with reserved names, for specific use cases. If you specify the model and other settings under the reserved names, then OpenHands will load and use them for a specific purpose. As of now, one such configuration is implemented: draft editor.

+Check if valid (non-expired) credentials exist.

-## Draft Editor Configuration

+#### async login()

-The `draft_editor` configuration is a group of settings you can provide, to specify the model to use for preliminary drafting of code edits, for any tasks that involve editing and refining code. You need to provide it under the section `[llm.draft_editor]`.

+Perform OAuth login flow.

-For example, you can define in `config.toml` a draft editor like this:

+This starts a local HTTP server to handle the OAuth callback,
+opens the browser for user authentication, and waits for the
+callback with the authorization code.

-```toml
-[llm.draft_editor]
-model = "gpt-4"
-temperature = 0.2
-top_p = 0.95
-presence_penalty = 0.0
-frequency_penalty = 0.0
-```

+* Parameters:
+  `open_browser` – Whether to automatically open the browser.
+* Returns:
+  The obtained OAuth credentials.
+* Raises:
+  `RuntimeError` – If the OAuth flow fails or times out.
-This configuration: -- Uses GPT-4 for high-quality edits and suggestions -- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility -- Uses a high top_p value (0.95) to consider a wide range of token options -- Disables presence and frequency penalties to maintain focus on the specific edits needed +#### logout() -Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to: -- Review and suggest code improvements -- Refine existing content while maintaining its core meaning -- Make precise, focused changes to code or text +Remove stored credentials. - -Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`. When running via `docker run`, please use the standard configuration options. - +* Returns: + True if credentials were removed, False if none existed. -### Google Gemini/Vertex -Source: https://docs.openhands.dev/openhands/usage/llms/google-llms.md +#### async refresh_if_needed() -## Gemini - Google AI Studio Configs +Refresh credentials if they are expired. -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `Gemini` -- `LLM Model` to the model you will be using. -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` -(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). -- `API Key` to your Gemini API key +* Returns: + Updated credentials, or None if no credentials exist. +* Raises: + `RuntimeError` – If token refresh fails. 
-## VertexAI - Google Cloud Platform Configs +### class ReasoningItemModel -To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment -variables using `-e` in the docker run command: +Bases: `BaseModel` -``` -GOOGLE_APPLICATION_CREDENTIALS="" -VERTEXAI_PROJECT="" -VERTEXAI_LOCATION="" -``` +OpenAI Responses reasoning item (non-stream, subset we consume). -Then set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `VertexAI` -- `LLM Model` to the model you will be using. -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` -(e.g. vertex_ai/<model-name>). +Do not log or render encrypted_content. -### Groq -Source: https://docs.openhands.dev/openhands/usage/llms/groq.md -## Configuration +#### Properties -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `Groq` -- `LLM Model` to the model you will be using. [Visit here to see the list of -models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, -enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). -- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). +- `content`: list[str] | None +- `encrypted_content`: str | None +- `id`: str | None +- `status`: str | None +- `summary`: list[str] -## Using Groq as an OpenAI-Compatible Endpoint +#### Methods -The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you -would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: -1. Enable `Advanced` options -2. Set the following: - - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. 
`openai/llama3-70b-8192`) - - `Base URL` to `https://api.groq.com/openai/v1` - - `API Key` to your Groq API key +#### model_config = (configuration object) -### LiteLLM Proxy -Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Configuration +### class RedactedThinkingBlock -To use LiteLLM proxy with OpenHands, you need to: +Bases: `BaseModel` -1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) -2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: - * Enable `Advanced` options - * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) - * `Base URL` to your LiteLLM proxy URL (e.g. `https://your-litellm-proxy.com`) - * `API Key` to your LiteLLM proxy API key +Redacted thinking block for previous responses without extended thinking. -## Supported Models +This is used as a placeholder for assistant messages that were generated +before extended thinking was enabled. -The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy -is configured to handle. -Refer to your LiteLLM proxy configuration for the list of available models and their names. +#### Properties -### Overview -Source: https://docs.openhands.dev/openhands/usage/llms/llms.md +- `data`: str +- `type`: Literal['redacted_thinking'] - -This section is for users who want to connect OpenHands to different LLMs. - +#### Methods - -OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this -page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation -for the canonical list of supported parameters. 
- +#### model_config = (configuration object) -## Model Recommendations +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some -recommendations for model selection. Our latest benchmarking results can be found in -[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). +### class RegistryEvent -Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: +Bases: `BaseModel` -### Cloud / API-Based Models -- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) -- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) -- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) -- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) -- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) -- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) +#### Properties -If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process -to help others using the same provider! +- `llm`: [LLM](#class-llm) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +### class RouterLLM -For a full list of the providers and models available, please consult the -[litellm documentation](https://docs.litellm.ai/docs/providers). +Bases: [`LLM`](#class-llm) - -OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending -limits and monitor usage. - +Base class for multiple LLM acting as a unified LLM. 
+This class provides a foundation for implementing model routing by +inheriting from LLM, allowing routers to work with multiple underlying +LLM models while presenting a unified LLM interface to consumers. +Key features: +- Works with multiple LLMs configured via llms_for_routing +- Delegates all other operations/properties to the selected LLM +- Provides routing interface through select_llm() method -### Local / Self-Hosted Models -- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) -- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) +#### Properties -### Known Issues +- `active_llm`: [LLM](#class-llm) | None +- `llms_for_routing`: dict[str, [LLM](#class-llm)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `router_name`: str - -Most current local and open source models are not as powerful. When using such models, you may see long -wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the -models driving it. However, if you do find ones that work, please add them to the verified list above. - +#### Methods -## LLM Configuration +#### completion() -The following can be set in the OpenHands UI through the Settings. Each option is serialized into the -`LLM.load_from_env()` schema before being passed to the Agent SDK: +This method intercepts completion calls and routes them to the appropriate +underlying LLM based on the routing logic implemented in select_llm(). 
-- `LLM Provider`
-- `LLM Model`
-- `API Key`
-- `Base URL` (through `Advanced` settings)

+* Parameters:
+  * `messages` – List of conversation messages
+  * `tools` – Optional list of tools available to the model
+  * `return_metrics` – Whether to return usage metrics
+  * `add_security_risk_prediction` – Add security_risk field to tool schemas
+  * `on_token` – Optional callback for streaming tokens
+  * `**kwargs` – Additional arguments passed to the LLM API

-There are some settings that may be necessary for certain providers that cannot be set directly through the UI. Set them
-as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup:

-- `LLM_API_VERSION`
-- `LLM_EMBEDDING_MODEL`
-- `LLM_EMBEDDING_DEPLOYMENT_NAME`
-- `LLM_DROP_PARAMS`
-- `LLM_DISABLE_VISION`
-- `LLM_CACHING_PROMPT`

+#### NOTE
+Summary field is always added to tool schemas for transparency and
+explainability of agent actions.

-## LLM Provider Guides

-We have a few guides for running OpenHands with specific model providers:

+#### model_post_init()

+This function is meant to behave like a BaseModel method to initialise private attributes.

+It takes context as an argument since that’s what pydantic-core passes when calling it.

-- [Azure](/openhands/usage/llms/azure-llms)
-- [Google](/openhands/usage/llms/google-llms)
-- [Groq](/openhands/usage/llms/groq)
-- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms)
-- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy)
-- [Moonshot AI](/openhands/usage/llms/moonshot)
-- [OpenAI](/openhands/usage/llms/openai-llms)
-- [OpenHands](/openhands/usage/llms/openhands-llms)
-- [OpenRouter](/openhands/usage/llms/openrouter)

-These pages remain the authoritative provider references for both the Agent SDK
-and the OpenHands interfaces.

+* Parameters:
+  * `self` – The BaseModel instance.
+  * `context` – The context.
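The routing contract — `select_llm()` returns a key into `llms_for_routing`, and `completion()` delegates to the model selected by that key — can be illustrated with a toy router. This does not subclass the real `RouterLLM`; the key names and the length-based policy are invented for the example:

```python
class ToyRouter:
    """Toy illustration of the RouterLLM routing contract."""

    def __init__(self, llms_for_routing):
        # Mirrors the "must not be empty" validation: routing
        # needs at least one configured model to delegate to.
        if not llms_for_routing:
            raise ValueError("llms_for_routing must not be empty")
        self.llms_for_routing = llms_for_routing

    def select_llm(self, messages):
        # Invented policy: route long conversations to the stronger model.
        total_chars = sum(len(m) for m in messages)
        return "smart" if total_chars > 200 else "fast"

    def completion(self, messages):
        key = self.select_llm(messages)
        return self.llms_for_routing[key](messages)


router = ToyRouter({
    "fast": lambda msgs: "fast-model reply",
    "smart": lambda msgs: "smart-model reply",
})
print(router.completion(["short question"]))  # → fast-model reply
print(router.completion(["x" * 500]))         # → smart-model reply
```

A real subclass would implement the same shape: inspect the messages, return one of the configured model keys, and let the inherited `completion()` machinery handle the delegation.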
+#### abstractmethod select_llm() -## Model Customization +Select which LLM to use based on messages and events. -LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: +This method implements the core routing logic for the RouterLLM. +Subclasses should analyze the provided messages to determine which +LLM from llms_for_routing is most appropriate for handling the request. -- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. -- **Native Tool Calling**: Toggle native function/tool calling capabilities. +* Parameters: + `messages` – List of messages in the conversation that can be used + to inform the routing decision. +* Returns: + The key/name of the LLM to use from llms_for_routing dictionary. -For detailed information about model customization, see -[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). +#### classmethod set_placeholder_model() -### API retries and rate limits +Guarantee model exists before LLM base validation runs. -LLM providers typically have rate limits, sometimes very low, and may require retries. OpenHands will automatically -retry requests if it receives a Rate Limit Error (429 error code). +#### classmethod validate_llms_not_empty() -You can customize these options as you need for the provider you're using. 
Check their documentation, and set the -following environment variables to control the number of retries and the time between retries: +### class TextContent -- `LLM_NUM_RETRIES` (Default of 4 times) -- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) -- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) -- `LLM_RETRY_MULTIPLIER` (Default of 2) +Bases: `BaseContent` -If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: -```toml -[llm] -num_retries = 4 -retry_min_wait = 5 -retry_max_wait = 30 -retry_multiplier = 2 -``` +#### Properties -### Local LLMs -Source: https://docs.openhands.dev/openhands/usage/llms/local-llms.md +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str +- `type`: Literal['text'] -## News +#### Methods -- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! +#### to_llm_dict() -## Quickstart: Running OpenHands with a Local LLM using LM Studio +Convert to LLM API format. -This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. +### class ThinkingBlock -We recommend: -- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. -- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. +Bases: `BaseModel` -### Hardware Requirements +Anthropic thinking block for extended thinking feature. 
-Running Qwen3-Coder-30B-A3B-Instruct requires: -- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or -- A Mac with Apple Silicon with at least 32GB of RAM +This represents the raw thinking blocks returned by Anthropic models +when extended thinking is enabled. These blocks must be preserved +and passed back to the API for tool use scenarios. -### 1. Install LM Studio -Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/). +#### Properties -### 2. Download the Model +- `signature`: str | None +- `thinking`: str +- `type`: Literal['thinking'] -1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window. -2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. +#### Methods -![image](./screenshots/01_lm_studio_open_model_hub.png) +#### model_config = (configuration object) -3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -![image](./screenshots/02_lm_studio_download_devstral.png) +### openhands.sdk.security +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md -4. Wait for the download to finish. +### class AlwaysConfirm -### 3. Load the Model +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. -2. Click the "Select a model to load" dropdown at the top of the application window. +#### Methods -![image](./screenshots/03_lm_studio_open_load_model.png) +#### model_config = (configuration object) -3. Enable the "Manually choose model load parameters" switch. -4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. 
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -![image](./screenshots/04_lm_studio_setup_devstral_part_1.png) +#### should_confirm() -5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. -6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. -7. Click "Load Model" to start loading the model. +Determine if an action with the given risk level requires confirmation. -![image](./screenshots/05_lm_studio_setup_devstral_part_2.png) +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. -### 4. Start the LLM server +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -1. Enable the switch next to "Status" at the top-left of the Window. -2. Take note of the Model API Identifier shown on the sidebar on the right. +### class ConfirmRisky -![image](./screenshots/06_lm_studio_start_server.png) +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -### 5. Start OpenHands -1. 
Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: +#### Properties -```bash -docker run -it --rm --pull=always \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -e LOG_ALL_EVENTS=true \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/.openhands \ - -p 3000:3000 \ - --add-host host.docker.internal:host-gateway \ - --name openhands-app \ - docker.openhands.dev/openhands/openhands:1.4 -``` +- `confirm_unknown`: bool +- `threshold`: [SecurityRisk](#class-securityrisk) -2. Wait until the server is running (see log below): -``` -Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f -Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 -Starting OpenHands... -Running OpenHands as root -14:22:13 - openhands:INFO: server_config.py:50 - Using config class None -INFO: Started server process [8] -INFO: Waiting for application startup. -INFO: Application startup complete. -INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) -``` +#### Methods -3. Visit `http://localhost:3000` in your browser. +#### model_config = (configuration object) -### 6. Configure OpenHands to use the LLM server +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. +#### should_confirm() -When started for the first time, OpenHands will prompt you to set up the LLM provider. +Determine if an action with the given risk level requires confirmation. -1. Click "see advanced settings" to open the LLM Settings page. +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. 
-![image](./screenshots/07_openhands_open_advanced_settings.png) +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -2. Enable the "Advanced" switch at the top of the page to show all the available settings. +#### classmethod validate_threshold() -3. Set the following values: - - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") - - **Base URL**: `http://host.docker.internal:1234/v1` - - **API Key**: `local-llm` +### class ConfirmationPolicyBase -4. Click "Save Settings" to save the configuration. +Bases: `DiscriminatedUnionMixin`, `ABC` -![image](./screenshots/08_openhands_configure_local_llm_parameters.png) +#### Methods -That's it! You can now start using OpenHands with the local LLM server. +#### model_config = (configuration object) -If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack). +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Advanced: Alternative LLM Backends +#### abstractmethod should_confirm() -This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. +Determine if an action with the given risk level requires confirmation. -### Create an OpenAI-Compatible Endpoint with Ollama +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. -- Install Ollama following [the official documentation](https://ollama.com/download). -- Example launch command for Qwen3-Coder-30B-A3B-Instruct: +* Parameters: + `risk` – The security risk level of the action to be evaluated. 
+ Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -```bash -# ⚠️ WARNING: OpenHands requires a large context size to work properly. -# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. -# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. -OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & -ollama pull qwen3-coder:30b -``` +### class GraySwanAnalyzer -### Create an OpenAI-Compatible Endpoint with vLLM or SGLang +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) -First, download the model checkpoint: +Security analyzer using GraySwan’s Cygnal API for AI safety monitoring. -```bash -huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct -``` +This analyzer sends conversation history and pending actions to the GraySwan +Cygnal API for security analysis. The API returns a violation score which is +mapped to SecurityRisk levels. -#### Serving the model using SGLang +Environment Variables: +: GRAYSWAN_API_KEY: Required API key for GraySwan authentication + GRAYSWAN_POLICY_ID: Optional policy ID for custom GraySwan policy -- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). 
-- Example launch command (with at least 2 GPUs): +#### Example -```bash -SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ - --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ - --served-model-name Qwen3-Coder-30B-A3B-Instruct \ - --port 8000 \ - --tp 2 --dp 1 \ - --host 0.0.0.0 \ - --api-key mykey --context-length 131072 +```pycon +>>> from openhands.sdk.security.grayswan import GraySwanAnalyzer +>>> analyzer = GraySwanAnalyzer() +>>> risk = analyzer.security_risk(action_event) ``` -#### Serving the model using vLLM - -- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). -- Example launch command (with at least 2 GPUs): -```bash -vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ - --host 0.0.0.0 --port 8000 \ - --api-key mykey \ - --tensor-parallel-size 2 \ - --served-model-name Qwen3-Coder-30B-A3B-Instruct \ - --enable-prefix-caching -``` +#### Properties -If you are interested in further improved inference speed, you can also try Snowflake's version -of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/), -which can achieve up to 2x speedup in some cases. +- `api_key`: SecretStr | None +- `api_url`: str +- `history_limit`: int +- `low_threshold`: float +- `max_message_chars`: int +- `medium_threshold`: float +- `policy_id`: str | None +- `timeout`: float -1. Install the Arctic Inference library that automatically patches vLLM: +#### Methods -```bash -pip install git+https://github.com/snowflakedb/ArcticInference.git -``` +#### close() -2. Run the launch command with speculative decoding enabled: +Clean up resources. 
-```bash -vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ - --host 0.0.0.0 --port 8000 \ - --api-key mykey \ - --tensor-parallel-size 2 \ - --served-model-name Qwen3-Coder-30B-A3B-Instruct \ - --speculative-config '{"method": "suffix"}' -``` +#### model_config = (configuration object) -### Run OpenHands (Alternative Backends) +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### Using Docker +#### model_post_init() -Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). +Initialize the analyzer after model creation. -#### Using Development Mode +#### security_risk() -Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. +Analyze action for security risks using GraySwan API. -Start OpenHands using `make run`. +This method converts the conversation history and the pending action +to OpenAI message format and sends them to the GraySwan Cygnal API +for security analysis. -### Configure OpenHands (Alternative Backends) +* Parameters: + `action` – The ActionEvent to analyze +* Returns: + SecurityRisk level based on GraySwan analysis -Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. +#### set_events() -1. Click **"see advanced settings"** to access the full configuration panel. -2. Enable the **Advanced** toggle at the top of the page. -3. Set the following parameters, if you followed the examples above: - - **Custom Model**: `openai/` - - For **Ollama**: `openai/qwen3-coder:30b` - - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` - - **Base URL**: `http://host.docker.internal:/v1` - Use port `11434` for Ollama, or `8000` for SGLang and vLLM. - - **API Key**: - - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) - - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. 
`mykey`) +Set the events for context when analyzing actions. -### Moonshot AI -Source: https://docs.openhands.dev/openhands/usage/llms/moonshot.md +* Parameters: + `events` – Sequence of events to use as context for security analysis -## Using Moonshot AI with OpenHands +#### validate_thresholds() -[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. +Validate that thresholds are properly ordered. -### Setup +### class LLMSecurityAnalyzer -1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) -2. Generate an API key from your account settings -3. Configure OpenHands to use Moonshot AI: +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) -| Setting | Value | -| --- | --- | -| LLM Provider | `moonshot` | -| LLM Model | `kimi-k2-0711-preview` | -| API Key | Your Moonshot API key | +LLM-based security analyzer. -### Recommended Models +This analyzer respects the security_risk attribute that can be set by the LLM +when generating actions, similar to OpenHands’ LLMRiskAnalyzer. -- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. +It provides a lightweight security analysis approach that leverages the LLM’s +understanding of action context and potential risks. -### OpenAI -Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms.md +#### Methods -## Configuration +#### model_config = (configuration object) -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -* `LLM Provider` to `OpenAI` -* `LLM Model` to the model you will be using. 
-[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) -If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). -* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Using OpenAI-Compatible Endpoints +#### security_risk() -Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). +Evaluate security risk based on LLM-provided assessment. -## Using an OpenAI Proxy +This method checks if the action has a security_risk attribute set by the LLM +and returns it. The LLM may not always provide this attribute but it defaults to +UNKNOWN if not explicitly set. -If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: -1. Enable `Advanced` options -2. Set the following: - - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) - - `Base URL` to the URL of your OpenAI proxy - - `API Key` to your OpenAI API key +### class NeverConfirm -### OpenHands -Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) -## Obtain Your OpenHands LLM API Key +#### Methods -1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). -2. Go to the Settings page and navigate to the `API Keys` tab. -3. Copy your `LLM API Key`. 
+#### model_config = (configuration object) -![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Configuration +#### should_confirm() -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -- `LLM Provider` to `OpenHands` -- `LLM Model` to the model you will be using (e.g. claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) -- `API Key` to your OpenHands LLM API key copied from above +Determine if an action with the given risk level requires confirmation. -## Using OpenHands LLM Provider in the CLI +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. -1. [Run OpenHands CLI](/openhands/usage/cli/quick-start). -2. To select OpenHands as the LLM provider: - - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use. - - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then - choose `openhands` and finally the model. +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. -![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png) +### class SecurityAnalyzerBase +Bases: `DiscriminatedUnionMixin`, `ABC` - -When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy - +Abstract base class for security analyzers. 
-## Using OpenHands LLM Provider with the SDK +Security analyzers evaluate the risk of actions before they are executed +and can influence the conversation flow based on security policies. -You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines. +This is adapted from OpenHands SecurityAnalyzer but designed to work +with the agent-sdk’s conversation-based architecture. -### Configuration +#### Methods -The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. Simply set two environment variables: +#### analyze_event() -```bash -export LLM_API_KEY="your-openhands-api-key" -export LLM_MODEL="openhands/claude-sonnet-4-20250514" -``` +Analyze an event for security risks. -### Example +This is a convenience method that checks if the event is an action +and calls security_risk() if it is. Non-action events return None. -```python -from openhands.sdk import LLM +* Parameters: + `event` – The event to analyze +* Returns: + ActionSecurityRisk if event is an action, None otherwise -# The openhands/ prefix auto-configures the base URL -llm = LLM.load_from_env() +#### analyze_pending_actions() -# Or configure directly -llm = LLM( - model="openhands/claude-sonnet-4-20250514", - api_key="your-openhands-api-key", -) -``` +Analyze all pending actions in a conversation. -The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. +This method gets all unmatched actions from the conversation state +and analyzes each one for security risks. 
-### Available Models +* Parameters: + `conversation` – The conversation to analyze +* Returns: + List of tuples containing (action, risk_level) for each pending action -When using the SDK, prefix any model from the pricing table below with `openhands/`: -- `openhands/claude-sonnet-4-20250514` -- `openhands/claude-sonnet-4-5-20250929` -- `openhands/claude-opus-4-20250514` -- `openhands/gpt-5-2025-08-07` -- etc. +#### model_config = (configuration object) - -If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. - +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Pricing +#### abstractmethod security_risk() -Pricing follows official API provider rates. Below are the current pricing details for OpenHands models: +Evaluate the security risk of an ActionEvent. +This is the core method that analyzes an ActionEvent and returns its risk level. +Implementations should examine the action’s content, context, and potential +impact to determine the appropriate risk level. 
-| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | -|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| -| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | -| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | -| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | -| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | -| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | -| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | -| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | -| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | -| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | -| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | -| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | -| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | -| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | -| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | -| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | +* Parameters: + `action` – The ActionEvent to analyze for security risks +* Returns: + ActionSecurityRisk enum indicating the risk level -**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s model price database and provider pricing pages. Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. +#### should_require_confirmation() -### OpenRouter -Source: https://docs.openhands.dev/openhands/usage/llms/openrouter.md +Determine if an action should require user confirmation. 
-## Configuration +This implements the default confirmation logic based on risk level +and confirmation mode settings. -When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: -* `LLM Provider` to `OpenRouter` -* `LLM Model` to the model you will be using. -[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models). -If the model is not in the list, enable `Advanced` options, and enter it in -`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`). -* `API Key` to your OpenRouter API key. +* Parameters: + * `risk` – The security risk level of the action + * `confirmation_mode` – Whether confirmation mode is enabled +* Returns: + True if confirmation is required, False otherwise -### OpenHands GitHub Action -Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md +### class SecurityRisk -## Using the Action in the OpenHands Repository +Bases: `str`, `Enum` -To use the OpenHands GitHub Action in a repository, you can: +Security risk levels for actions. -1. Create an issue in the repository. -2. Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`. +Based on OpenHands security risk levels but adapted for agent-sdk. +Integer values allow for easy comparison and ordering. -The action will automatically trigger and attempt to resolve the issue. -## Installing the Action in a New Repository +#### Properties -To install the OpenHands GitHub Action in your own repository, follow -the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). +- `description`: str + Get a human-readable description of the risk level. +- `visualize`: Text + Return Rich Text representation of this risk level. -## Usage Tips +#### Methods -### Iterative resolution +#### HIGH = 'HIGH' -1. Create an issue in the repository. -2. 
Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. -3. Review the attempt to resolve the issue by checking the pull request. -4. Follow up with feedback through general comments, review comments, or inline thread comments. -5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. +#### LOW = 'LOW' -### Label versus Macro +#### MEDIUM = 'MEDIUM' -- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. -- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. +#### UNKNOWN = 'UNKNOWN' -## Advanced Settings +#### get_color() -### Add custom repository settings +Get the color for displaying this risk level in Rich text. -You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). +#### is_riskier() -### Custom configurations +Check if this risk level is riskier than another. -GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. -The customization options you can set are: +Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is +less risky than HIGH. UNKNOWN is not comparable to any other level. 
-| **Attribute name** | **Type** | **Purpose** | **Example** |
| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` |
| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` |
| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` |
| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` |
| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` |
| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` |

+To make this act like a standard well-ordered domain, we reflexively consider
+risk levels to be riskier than themselves. That is:

-### Configure
-Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md
+    for risk_level in list(SecurityRisk):
+        assert risk_level.is_riskier(risk_level)

-## Prerequisites
+    # More concretely:
+    assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH)
+    assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM)
+    assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW)

-- [OpenHands is running](/openhands/usage/run-openhands/local-setup)
+This can be disabled by setting the reflexive parameter to False.

-## Launching the GUI Server
+* Parameters:
+  * `other` ([SecurityRisk](#class-securityrisk)) – The other risk level to compare against.
+  * `reflexive` (bool) – Whether the relationship is reflexive.
+* Raises:
+  `ValueError` – If either risk level is UNKNOWN. 
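The ordering semantics documented for `is_riskier()` can be sketched with a plain `enum` stand-in. This is an illustrative re-implementation, not the actual `openhands.sdk.security.SecurityRisk` class; only the documented behavior is assumed (natural LOW/MEDIUM/HIGH ordering, reflexivity by default, `ValueError` when UNKNOWN is involved):

```python
from enum import Enum


class SecurityRisk(str, Enum):
    # Simplified stand-in mirroring the documented risk levels.
    UNKNOWN = "UNKNOWN"
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

    def is_riskier(self, other: "SecurityRisk", reflexive: bool = True) -> bool:
        # UNKNOWN is not comparable to any other level.
        if SecurityRisk.UNKNOWN in (self, other):
            raise ValueError("UNKNOWN is not comparable to other risk levels")
        if self is other:
            # Reflexive by default, as documented; disabled via reflexive=False.
            return reflexive
        order = [SecurityRisk.LOW, SecurityRisk.MEDIUM, SecurityRisk.HIGH]
        return order.index(self) > order.index(other)


assert SecurityRisk.HIGH.is_riskier(SecurityRisk.LOW)
assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM)  # reflexive case
assert not SecurityRisk.LOW.is_riskier(SecurityRisk.LOW, reflexive=False)
```

Because UNKNOWN raises rather than comparing, callers that may see unassessed actions should catch `ValueError` or check for UNKNOWN before comparing.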
-### Using the CLI Command +### openhands.sdk.tool +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md -You can launch the OpenHands GUI server directly from the command line using the `serve` command: +### class Action - -**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR have `uv` -installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker -directly (see the [Docker section](#using-docker-directly) below). - +Bases: `Schema`, `ABC` -```bash -openhands serve -``` +Base schema for input action. -This command will: -- Check that Docker is installed and running -- Pull the required Docker images -- Launch the OpenHands GUI server at http://localhost:3000 -- Use the same configuration directory (`~/.openhands`) as the CLI mode -#### Mounting Your Current Directory +#### Properties -To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `visualize`: Text + Return Rich Text representation of this action. + This method can be overridden by subclasses to customize visualization. + The base implementation displays all action fields systematically. +### class ExecutableTool -```bash -openhands serve --mount-cwd -``` +Bases: `Protocol` -This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container. +Protocol for tools that are guaranteed to have a non-None executor. -#### Using GPU Support +This eliminates the need for runtime None checks and type narrowing +when working with tools that are known to be executable. 
-If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: -```bash -openhands serve --gpu -``` +#### Properties -This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: +- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any] +- `name`: str -```bash -openhands serve --gpu --mount-cwd -``` +#### Methods -**Prerequisites for GPU support:** -- NVIDIA GPU drivers must be installed on your host system -- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured +#### __init__() -#### Requirements +### class FinishTool -Before using the `openhands serve` command, ensure that: -- Docker is installed and running on your system -- You have internet access to pull the required Docker images -- Port 3000 is available on your system +Bases: `ToolDefinition[FinishAction, FinishObservation]` -The CLI will automatically check these requirements and provide helpful error messages if anything is missing. +Tool for signaling the completion of a task or conversation. -### Using Docker Directly -Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. +#### Properties -## Overview +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### Initial Setup +#### Methods -1. Upon first launch, you'll see a settings popup. -2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list, - select `see advanced settings`. Then toggle `Advanced` options and enter it with the correct prefix in the - `Custom Model` text box. -3. Enter the corresponding `API Key` for your chosen provider. -4. 
Click `Save Changes` to apply the settings. +#### classmethod create() -### Settings +Create FinishTool instance. -You can use the Settings page at any time to: +* Parameters: + * `conv_state` – Optional conversation state (not used by FinishTool). + params* – Additional parameters (none supported). +* Returns: + A sequence containing a single FinishTool instance. +* Raises: + `ValueError` – If any parameters are provided. -- [Setup the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings). -- [Setup the search engine](/openhands/usage/advanced/search-engine-setup). -- [Configure MCP servers](/openhands/usage/settings/mcp-settings). -- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup), - [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup) - and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup). -- Set application settings like your preferred language, notifications and other preferences. -- [Manage custom secrets](/openhands/usage/settings/secrets-settings). +#### name = 'finish' -### Key Features +### class Observation -For an overview of the key features available inside a conversation, please refer to the -[Key Features](/openhands/usage/key-features) section of the documentation. +Bases: `Schema`, `ABC` -## Other Ways to Run Openhands -- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) -- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal) +Base schema for output observation. 
-### Setup -Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md -## Recommended Methods for Running Openhands on Your Local System +#### Properties -### System Requirements +- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]n' +- `content`: list[TextContent | ImageContent] +- `is_error`: bool +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str + Extract all text content from the observation. + * Returns: + Concatenated text from all TextContent items in content. +- `to_llm_content`: Sequence[TextContent | ImageContent] + Default content formatting for converting observation to LLM readable content. + Subclasses can override to provide richer content (e.g., images, diffs). +- `visualize`: Text + Return Rich Text representation of this observation. + Subclasses can override for custom visualization; by default we show the + same text that would be sent to the LLM. -- MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements) -- Linux -- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements) +#### Methods -A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands. +#### classmethod from_text() -### Prerequisites +Utility to create an Observation from a simple text string. - +* Parameters: + * `text` – The text content to include in the observation. + * `is_error` – Whether this observation represents an error. + kwargs* – Additional fields for the observation subclass. +* Returns: + An Observation instance with the text wrapped in a TextContent. - +### class ThinkTool - **Docker Desktop** +Bases: `ToolDefinition[ThinkAction, ThinkObservation]` - 1. 
[Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install). - 2. Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled. - +Tool for logging thoughts without making changes. - - - Tested with Ubuntu 22.04. - +#### Properties - **Docker Desktop** +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. - 1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/). +#### Methods - +#### classmethod create() - +Create ThinkTool instance. - **WSL** +* Parameters: + * `conv_state` – Optional conversation state (not used by ThinkTool). + params* – Additional parameters (none supported). +* Returns: + A sequence containing a single ThinkTool instance. +* Raises: + `ValueError` – If any parameters are provided. - 1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install). - 2. Run `wsl --version` in powershell and confirm `Default Version: 2`. +#### name = 'think' - **Ubuntu (Linux Distribution)** +### class Tool - 1. Install Ubuntu: `wsl --install -d Ubuntu` in PowerShell as Administrator. - 2. Restart computer when prompted. - 3. Open Ubuntu from Start menu to complete setup. - 4. Verify installation: `wsl --list` should show Ubuntu. +Bases: `BaseModel` - **Docker Desktop** +Defines a tool to be initialized for the agent. - 1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install). - 2. Open Docker Desktop, go to `Settings` and confirm the following: - - General: `Use the WSL 2 based engine` is enabled. - - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled. +This is only used in agent-sdk for type schema for server use. - - The docker command below to start the app must be run inside the WSL terminal. 
Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal. - - +#### Properties - +- `name`: str +- `params`: dict[str, Any] -### Start the App +#### Methods -#### Option 1: Using the CLI Launcher with uv (Recommended) +#### model_config = (configuration object) -We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -**Install uv** (if you haven't already): +#### classmethod validate_name() -See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. +Validate that name is not empty. -**Install OpenHands**: -```bash -uv tool install openhands --python 3.12 -``` +#### classmethod validate_params() -**Launch OpenHands**: -```bash -# Launch the GUI server -openhands serve +Convert None params to empty dict. -# Or with GPU support (requires nvidia-docker) -openhands serve --gpu +### class ToolAnnotations -# Or with current directory mounted -openhands serve --mount-cwd -``` +Bases: `BaseModel` -This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. +Annotations to provide hints about the tool’s behavior. 
-**Upgrade OpenHands**: -```bash -uv tool upgrade openhands --python 3.12 -``` +Based on Model Context Protocol (MCP) spec: +[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838) - -If you prefer to use pip and have Python 3.12+ installed: +#### Properties -```bash -# Install OpenHands -pip install openhands +- `destructiveHint`: bool +- `idempotentHint`: bool +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `openWorldHint`: bool +- `readOnlyHint`: bool +- `title`: str | None +### class ToolDefinition -# Launch the GUI server -openhands serve -``` +Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic` -Note that you'll still need `uv` installed for the default MCP servers to work properly. +Base class for all tool implementations. - +This class serves as a base for the discriminated union of all tool types. +All tools must inherit from this class and implement the .create() method for +proper initialization with executors and parameters. -#### Option 2: Using Docker Directly +Features: +- Normalize input/output schemas (class or dict) into both model+schema. +- Validate inputs before execute. +- Coerce outputs only if an output model is defined; else return vanilla JSON. +- Export MCP tool description. 
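For orientation, the schema-export features listed above produce function-tool definitions in the shape consumed by OpenAI-style chat APIs. The following is a hand-written sketch of that output shape, not the SDK's `to_openai_tool` implementation; the extra `security_risk` property stands in for the optional risk-prediction field described later in this section:

```python
def openai_tool_sketch(name, description, parameters):
    """Wrap a JSON schema as an OpenAI-style function tool definition."""
    # Attach a security_risk field so the LLM can rate the action's risk
    # before calling the tool (hypothetical enum values for illustration).
    props = dict(parameters.get("properties", {}))
    props["security_risk"] = {"type": "string", "enum": ["LOW", "MEDIUM", "HIGH"]}
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": props},
        },
    }


tool = openai_tool_sketch(
    "think", "Log a thought.", {"properties": {"thought": {"type": "string"}}}
)
```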
- 

+#### Examples

-```bash
-docker run -it --rm --pull=always \
-    -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \
-    -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \
-    -e LOG_ALL_EVENTS=true \
-    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands:/.openhands \
-    -p 3000:3000 \
-    --add-host host.docker.internal:host-gateway \
-    --name openhands-app \
-    docker.openhands.dev/openhands/openhands:1.4
-```

+Simple tool with no parameters:
+
+```python
+class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
+    @classmethod
+    def create(cls, conv_state=None, **params):
+        return [cls(name="finish", ..., executor=FinishExecutor())]
+```

-
+Complex tool with initialization parameters:
+
+```python
+class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
+    @classmethod
+    def create(cls, conv_state, **params):
+        executor = TerminalExecutor(
+            working_dir=conv_state.workspace.working_dir,
+            **params,
+        )
+        return [cls(name="terminal", ..., executor=executor)]
+```

-> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.

-You'll find OpenHands running at http://localhost:3000!

+#### Properties

-### Setup

+- `action_type`: type[[Action](#class-action)]
+- `annotations`: [ToolAnnotations](#class-toolannotations) | None
+- `description`: str
+- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
+- `meta`: dict[str, Any] | None
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `name`: ClassVar[str] = ''
+- `observation_type`: type[[Observation](#class-observation)] | None
+- `title`: str

-After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`.
-This can be done during the initial settings popup or by selecting the `Settings`
-button (gear icon) in the UI.

+#### Methods

-If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options
-and manually enter it with the correct prefix in the `Custom Model` text box.
-The `Advanced` options also allow you to specify a `Base URL` if required.

+#### action_from_arguments()

-#### Getting an API Key

+Create an action from parsed arguments.

-OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers:

+This method can be overridden by subclasses to provide custom logic
+for creating actions from arguments (e.g., for MCP tools).

- 

+* Parameters:
+  `arguments` – The parsed arguments from the tool call.
+* Returns:
+  The action instance created from the arguments.

- 

+#### as_executable()

-1. [Log in to OpenHands Cloud](https://app.all-hands.dev).
-2. 
Go to the Settings page and navigate to the `API Keys` tab.
-3. Copy your `LLM API Key`.

+Return this tool as an ExecutableTool, ensuring it has an executor.

-OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms).

+This method eliminates the need for runtime None checks by guaranteeing
+that the returned tool has a non-None executor.

- 

+* Returns:
+  This tool instance, typed as ExecutableTool.
+* Raises:
+  `NotImplementedError` – If the tool has no executor.

- 

+#### abstractmethod classmethod create()

-1. [Create an Anthropic account](https://console.anthropic.com/).
-2. [Generate an API key](https://console.anthropic.com/settings/keys).
-3. [Set up billing](https://console.anthropic.com/settings/billing).

+Create a sequence of Tool instances.

- 

+This method must be implemented by all subclasses to provide custom
+initialization logic, typically initializing the executor with parameters
+from conv_state and other optional parameters.

- 

+* Parameters:
+  * `*args` – Variable positional arguments (typically conv_state as first arg).
+  * `**kwargs` – Optional parameters for tool initialization.
+* Returns:
+  A sequence of Tool instances. Even single tools are returned as a sequence
+  to provide a consistent interface and eliminate union return types.

-1. [Create an OpenAI account](https://platform.openai.com/).
-2. [Generate an API key](https://platform.openai.com/api-keys).
-3. [Set up billing](https://platform.openai.com/account/billing/overview).

+#### classmethod resolve_kind()

- 

+Resolve a kind string to its corresponding tool class.

- 

+* Parameters:
+  `kind` – The name of the tool class to resolve
+* Returns:
+  The tool class corresponding to the kind
+* Raises:
+  `ValueError` – If the kind is unknown

-1. Create a Google account if you don't already have one.
-2. [Generate an API key](https://aistudio.google.com/apikey).
-3. 
[Set up billing](https://aistudio.google.com/usage?tab=billing). +#### set_executor() - +Create a new Tool instance with the given executor. - +#### to_mcp_tool() -If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. +Convert a Tool to an MCP tool definition. - +Allow overriding input/output schemas (usually by subclasses). - +* Parameters: + * `input_schema` – Optionally override the input schema. + * `output_schema` – Optionally override the output schema. -Consider setting usage limits to control costs. +#### to_openai_tool() -#### Using a Local LLM +Convert a Tool to an OpenAI tool. - -Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. - +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + to the action schema for LLM to predict. This is useful for + tools that may have safety risks, so the LLM can reason about + the risk level before calling the tool. + * `action_type` – Optionally override the action_type to use for the schema. + This is useful for MCPTool to use a dynamically created action type + based on the tool’s input schema. -To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. -#### Setting Up Search Engine +#### to_responses_tool() -OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. +Convert a Tool to a Responses API function tool (LiteLLM typed). -To enable search functionality in OpenHands: +For Responses API, function tools expect top-level keys: +(JSON configuration object) -1. 
Get a Tavily API key from [tavily.com](https://tavily.com/). -2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)` +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + * `action_type` – Optional override for the action type -For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide. +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. -### Versions +### class ToolExecutor -The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well: -- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with the version number. -For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release. -- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with `main`. -This version is unstable and is recommended for testing or development purposes only. +Bases: `ABC`, `Generic` -## Next Steps +Executor function type for a Tool. -- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories -- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) -- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) -- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) +#### Methods -### Docker Sandbox -Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker.md +#### close() -The **Docker sandbox** runs the agent server inside a Docker container. This is -the default and recommended option for most users. +Close the executor and clean up resources. 
- - In some self-hosted deployments, the sandbox provider is controlled via the - legacy RUNTIME environment variable. Docker is the default. - +Default implementation does nothing. Subclasses should override +this method to perform cleanup (e.g., closing connections, +terminating processes, etc.). +### openhands.sdk.utils +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md -## Why Docker? +Utility functions for the OpenHands SDK. -- Isolation: reduces risk when the agent runs commands. -- Reproducibility: consistent environment across machines. +### deprecated() -## Mounting your code into the sandbox +Return a decorator that deprecates a callable with explicit metadata. -If you want OpenHands to work directly on a local repository, mount it into the -sandbox. +Use this helper when you can annotate a function, method, or property with +@deprecated(…). It transparently forwards to `deprecation.deprecated()` +while filling in the SDK’s current version metadata unless custom values are +supplied. -### Recommended: CLI launcher +### maybe_truncate() -If you start OpenHands via: +Truncate the middle of content if it exceeds the specified length. -```bash -openhands serve --mount-cwd -``` +Keeps the head and tail of the content to preserve context at both ends. +Optionally saves the full content to a file for later investigation. -your current directory will be mounted into the sandbox workspace. +* Parameters: + * `content` – The text content to potentially truncate + * `truncate_after` – Maximum length before truncation. 
If None, no truncation occurs + * `truncate_notice` – Notice to insert in the middle when content is truncated + * `save_dir` – Working directory to save full content file in + * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”) +* Returns: + Original content if under limit, or truncated content with head and tail + preserved and reference to saved file if applicable -### Using SANDBOX_VOLUMES +### sanitize_openhands_mentions() -You can also configure mounts via the SANDBOX_VOLUMES environment -variable (format: host_path:container_path[:mode]): +Sanitize @OpenHands mentions in text to prevent self-mention loops. -```bash -export SANDBOX_VOLUMES=$PWD:/workspace:rw -``` +This function inserts a zero-width joiner (ZWJ) after the @ symbol in +@OpenHands mentions, making them non-clickable in GitHub comments while +preserving readability. The original case of the mention is preserved. - - Anything mounted read-write into /workspace can be modified by the - agent. - +* Parameters: + `text` – The text to sanitize +* Returns: + Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@‍OpenHands”) -## Custom sandbox images +### Examples -To customize the container image (extra tools, system deps, etc.), see -[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). +```pycon +>>> sanitize_openhands_mentions("Thanks @OpenHands for the help!") +'Thanks @u200dOpenHands for the help!' +>>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS") +'Check @u200dopenhands and @u200dOPENHANDS' +>>> sanitize_openhands_mentions("No mention here") +'No mention here' +``` -### Overview -Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview.md +### sanitized_env() -A **sandbox** is the environment where OpenHands runs commands, edits files, and -starts servers while working on your task. +Return a copy of env with sanitized values. -In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. 
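The head-and-tail strategy that `maybe_truncate` documents above (keep both ends, drop the middle, insert a notice) can be sketched as a standalone function. This is an illustration only; the SDK version additionally persists the full content to a file for later investigation:

```python
def truncate_middle(content, truncate_after=None, notice="<response clipped>"):
    """Keep the head and tail of content, replacing the middle with notice."""
    if truncate_after is None or len(content) <= truncate_after:
        return content  # under the limit: return unchanged
    keep = max(truncate_after - len(notice), 0)
    head = keep // 2
    tail = keep - head
    tail_part = content[-tail:] if tail else ""
    return content[:head] + notice + tail_part
```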
+PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored +libraries win. This function restores the original value so that subprocess +will not use them. -## Sandbox providers +### warn_deprecated() -OpenHands supports multiple sandbox “providers”, with different tradeoffs: +Emit a deprecation warning for dynamic access to a legacy feature. -- **Docker sandbox (recommended)** - - Runs the agent server inside a Docker container. - - Good isolation from your host machine. +Prefer this helper when a decorator is not practical—e.g. attribute accessors, +data migrations, or other runtime paths that must conditionally warn. Provide +explicit version metadata so the SDK reports consistent messages and upgrades +to `deprecation.UnsupportedWarning` after the removal threshold. -- **Process sandbox (unsafe, but fast)** - - Runs the agent server as a regular process on your machine. - - No container isolation. +### openhands.sdk.workspace +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md -- **Remote sandbox** - - Runs the agent server in a remote environment. - - Used by managed deployments and some hosted setups. +### class BaseWorkspace -## Selecting a provider (current behavior) +Bases: `DiscriminatedUnionMixin`, `ABC` -In some deployments, the provider selection is still controlled via the legacy -RUNTIME environment variable: +Abstract base class for workspace implementations. -- RUNTIME=docker (default) -- RUNTIME=process (aka legacy RUNTIME=local) -- RUNTIME=remote +Workspaces provide a sandboxed environment where agents can execute commands, +read/write files, and perform other operations. All workspace implementations +support the context manager protocol for safe resource management. - - The user-facing terminology in V1 is sandbox, but the configuration knob - may still be called RUNTIME while the migration is in progress. - +#### Example -## Terminology note (V0 vs V1) +```pycon +>>> with workspace: +... 
result = workspace.execute_command("echo 'hello'") +... content = workspace.read_file("example.txt") +``` -Older documentation refers to these environments as **runtimes**. -Those legacy docs are now in the Legacy (V0) section of the Web tab. -### Process Sandbox -Source: https://docs.openhands.dev/openhands/usage/sandboxes/process.md +#### Properties -The **Process sandbox** runs the agent server directly on your machine as a -regular process. +- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. Path objects are automatically converted to strings.')] - - This mode provides **no sandbox isolation**. +#### Methods - The agent can read/write files your user account can access and execute - commands on your host system. +#### abstractmethod execute_command() - Only use this in controlled environments. - +Execute a bash command on the system. -## When to use it +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory for the command (optional) + * `timeout` – Timeout in seconds (defaults to 30.0) +* Returns: + Result containing stdout, stderr, exit_code, and other + : metadata +* Return type: + [CommandResult](#class-commandresult) +* Raises: + `Exception` – If command execution fails -- Local development when Docker is unavailable -- Some CI environments -- Debugging issues that only reproduce outside containers +#### abstractmethod file_download() -## Choosing process mode +Download a file from the system. 
-In some deployments, this is selected via the legacy RUNTIME -environment variable: +* Parameters: + * `source_path` – Path to the source file on the system + * `destination_path` – Path where the file should be downloaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file download fails -```bash -export RUNTIME=process -# (legacy alias) -# export RUNTIME=local -``` +#### abstractmethod file_upload() -If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker). +Upload a file to the system. -### Remote Sandbox -Source: https://docs.openhands.dev/openhands/usage/sandboxes/remote.md +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be uploaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file upload fails -A **remote sandbox** runs the agent server in a remote execution environment -instead of on your local machine. +#### abstractmethod git_changes() -This is typically used by managed deployments (e.g., OpenHands Cloud) and -advanced self-hosted setups. +Get the git changes for the repository at the path given. -## Selecting remote mode +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed -In some self-hosted deployments, remote sandboxes are selected via the legacy -RUNTIME environment variable: +#### abstractmethod git_diff() -```bash -export RUNTIME=remote -``` +Get the git diff for the file at the path given. -Remote sandboxes require additional configuration (API URL + API key). 
The exact -variable names depend on your deployment, but you may see legacy names like: +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed -- SANDBOX_REMOTE_RUNTIME_API_URL -- SANDBOX_API_KEY +#### model_config = (configuration object) -## Notes +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports) - depending on the provider. -- Configuration and credentials vary by deployment. +#### pause() -If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui). +Pause the workspace to conserve resources. -### API Keys Settings -Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md +For local workspaces, this is a no-op. +For container-based workspaces, this pauses the container. - - These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). - +* Raises: + `NotImplementedError` – If the workspace type does not support pausing. -## Overview +#### resume() -Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to -OpenHands Cloud +Resume a paused workspace. -## OpenHands LLM Key +For local workspaces, this is a no-op. +For container-based workspaces, this resumes the container. - -You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud. - +* Raises: + `NotImplementedError` – If the workspace type does not support resuming. 
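The `execute_command` contract documented above (a command plus optional `cwd` and `timeout`, returning stdout, stderr, an exit code, and a timeout flag) maps naturally onto `subprocess`. A minimal standalone sketch of that contract, not the SDK's implementation:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResultSketch:
    command: str
    exit_code: int
    stdout: str
    stderr: str
    timeout_occurred: bool


def run_command(command, cwd=None, timeout=30.0):
    try:
        proc = subprocess.run(
            command, shell=True, cwd=cwd, timeout=timeout,
            capture_output=True, text=True,
        )
        return CommandResultSketch(
            command, proc.returncode, proc.stdout, proc.stderr, False
        )
    except subprocess.TimeoutExpired:
        # Report the timeout in the result instead of raising,
        # mirroring the documented timeout_occurred flag.
        return CommandResultSketch(command, -1, "", "", True)
```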
-You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start), -[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even other AI coding agents. This will -use credits from your OpenHands Cloud account. If you need to refresh it at anytime, click the `Refresh API Key` button. +### class CommandResult -## OpenHands API Key +Bases: `BaseModel` -These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the -[OpenHands Cloud API](/openhands/usage/cloud/cloud-api). +Result of executing a command in the workspace. -### Create API Key -1. Navigate to the `Settings > API Keys` page. -2. Click `Create API Key`. -3. Give your API key a name and click `Create`. +#### Properties -### Delete API Key +- `command`: str +- `exit_code`: int +- `stderr`: str +- `stdout`: str +- `timeout_occurred`: bool -1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove. -2. Click `Delete` to confirm removal. +#### Methods -### Application Settings -Source: https://docs.openhands.dev/openhands/usage/settings/application-settings.md +#### model_config = (configuration object) -## Overview +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -The Application settings allows you to customize various application-level behaviors in OpenHands, including -language preferences, notification settings, custom Git author configuration and more. +### class FileOperationResult -## Setting Maximum Budget Per Conversation +Bases: `BaseModel` -To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD) -in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but -you can choose to continue the conversation with a prompt. +Result of a file upload or download operation. 
-## Git Author Settings -OpenHands provides the ability to customize the Git author information used when making commits and creating -pull requests on your behalf. +#### Properties -By default, OpenHands uses the following Git author information for all commits and pull requests: +- `destination_path`: str +- `error`: str | None +- `file_size`: int | None +- `source_path`: str +- `success`: bool -- **Username**: `openhands` -- **Email**: `openhands@all-hands.dev` +#### Methods -To override the defaults: +#### model_config = (configuration object) -1. Navigate to the `Settings > Application` page. -2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`. -3. Click `Save Changes` +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. - - When you configure a custom Git author, OpenHands will use your specified username and email as the primary author - for commits and pull requests. OpenHands will remain as a co-author. - +### class LocalWorkspace -### Integrations Settings -Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md +Bases: [`BaseWorkspace`](#class-baseworkspace) -## Overview +Local workspace implementation that operates on the host filesystem. -OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some -integrations, like Slack, are only available in OpenHands Cloud. Configuration may also vary depending on whether -you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or -[running OpenHands on your own](/openhands/usage/run-openhands/local-setup). +LocalWorkspace provides direct access to the local filesystem and command execution +environment. It’s suitable for development and testing scenarios where the agent +should operate directly on the host system. 
-## OpenHands Cloud Integrations Settings +#### Example - - These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). - +```pycon +>>> workspace = LocalWorkspace(working_dir="/path/to/project") +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` -### GitHub Settings +#### Methods -- `Configure GitHub Repositories` - Allows you to -[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. +#### __init__() -### Slack Settings +Create a new model by parsing and validating input data from keyword arguments. -- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in - your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. +Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be +validated to form a valid model. -## Running on Your Own Integrations Settings +self is explicitly positional-only to allow self as a field name. - - These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). - +#### execute_command() -### Version Control Integrations +Execute a bash command locally. -#### GitHub Setup +Uses the shared shell execution utility to run commands with proper +timeout handling, output streaming, and error management. -OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided: +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, command, and + : timeout_occurred +* Return type: + [CommandResult](#class-commandresult) - - +#### file_download() - 1. 
**Generate a Personal Access Token (PAT)**: - - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`. - - **Tokens (classic)** - - Required scopes: - - `repo` (Full control of private repositories) - - **Fine-grained tokens** - - All Repositories (You can select specific repositories, but this will impact what returns in repo search) - - Minimal Permissions (Select `Meta Data = Read-only` read for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation) - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `GitHub Token` field. - - Click `Save Changes` to apply the changes. +Download (copy) a file locally. - If you're working with organizational repositories, additional setup may be required: +For local systems, file download is implemented as a file copy operation +using shutil.copy2 to preserve metadata. - 1. **Check organization requirements**: - - Organization admins may enforce specific token policies. - - Some organizations require tokens to be created with SSO enabled. - - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization). - 2. **Verify organization access**: - - Go to your token settings on GitHub. - - Look for the organization under `Organization access`. - - If required, click `Enable SSO` next to your organization. - - Complete the SSO authorization process. - +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied +* Returns: + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. - - Try regenerating the token. 
+#### file_upload() - - **Organization Access Denied**: - - Check if SSO is required but not enabled. - - Verify organization membership. - - Contact organization admin if token policies are blocking access. - - +Upload (copy) a file locally. -#### GitLab Setup +For local systems, file upload is implemented as a file copy operation +using shutil.copy2 to preserve metadata. -OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided: +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied +* Returns: + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) - - - 1. **Generate a Personal Access Token (PAT)**: - - On GitLab, go to `User Settings > Access Tokens`. - - Create a new token with the following scopes: - - `api` (API access) - - `read_user` (Read user information) - - `read_repository` (Read repository) - - `write_repository` (Write repository) - - Set an expiration date or leave it blank for a non-expiring token. - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `GitLab Token` field. - - Click `Save Changes` to apply the changes. +#### git_changes() - 3. **(Optional): Restrict agent permissions** - - Create another PAT using Step 1 and exclude `api` scope . - - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower scope token. - - OpenHands will use the higher scope token, and the agent will use the lower scope token. - +Get the git changes for the repository at the path given. - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. 
+* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed - - **Access Denied**: - - Verify project access permissions. - - Check if the token has the necessary scopes. - - For group/organization repositories, ensure you have proper access. - - +#### git_diff() -#### BitBucket Setup - - -1. **Generate an App password**: - - On Bitbucket, go to `Account Settings > App Password`. - - Create a new password with the following scopes: - - `account`: `read` - - `repository: write` - - `pull requests: write` - - `issues: write` - - App passwords are non-expiring token. OpenHands will migrate to using API tokens in the future. - 2. **Enter token in OpenHands**: - - Navigate to the `Settings > Integrations` page. - - Paste your token in the `BitBucket Token` field. - - Click `Save Changes` to apply the changes. - +Get the git diff for the file at the path given. - - - **Token Not Recognized**: - - Check that the token hasn't expired. - - Verify the token has the required scopes. - +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed - +#### model_config = (configuration object) -### Language Model (LLM) Settings -Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings.md +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -## Overview +#### pause() -The LLM settings allows you to bring your own LLM and API key to use with OpenHands. This can be any model that is -supported by litellm, but it requires a powerful model to work properly. -[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some -additional LLM settings on this page. +Pause the workspace (no-op for local workspaces). 
-## Basic LLM Settings +Local workspaces have nothing to pause since they operate directly +on the host filesystem. -The most popular providers and models are available in the basic settings. Some of the providers have been verified to -work with OpenHands such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI and -Mistral AI. +#### resume() -1. Choose your preferred provider using the `LLM Provider` dropdown. -2. Choose your favorite model using the `LLM Model` dropdown. -3. Set the `API Key` for your chosen provider and model and click `Save Changes`. +Resume the workspace (no-op for local workspaces). -This will set the LLM for all new conversations. If you want to use this new LLM for older conversations, you must first -restart older conversations. +Local workspaces have nothing to resume since they operate directly +on the host filesystem. -## Advanced LLM Settings +### class RemoteWorkspace -Toggling the `Advanced` settings, allows you to set custom models as well as some additional LLM settings. You can use -this when your preferred provider or model does not exist in the basic settings dropdowns. +Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) -1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the - custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have - [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides). -2. `Base URL`: If your provider has a specific base URL, specify it here. -3. `API Key`: Set the API key for your custom model. -4. Click `Save Changes` +Remote workspace implementation that connects to an OpenHands agent server. -### Memory Condensation +RemoteWorkspace provides access to a sandboxed environment running on a remote +OpenHands agent server. 
This is the recommended approach for production deployments +as it provides better isolation and security. -The memory condenser manages the language model's context by ensuring only the most important and relevant information -is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running -conversations. +#### Example -- `Enable memory condensation` - Turn on this setting to activate this feature. -- `Memory condenser max history size` - The condenser will summarize the history after this many events. +```pycon +>>> workspace = RemoteWorkspace( +... host="https://agent-server.example.com", +... working_dir="/workspace" +... ) +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` -### Model Context Protocol (MCP) -Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md -## Overview +#### Properties -Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. These -servers can provide additional functionality to the agent, such as specialized data processing, external API access, -or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). +- `alive`: bool + Check if the remote workspace is alive by querying the health endpoint. + * Returns: + True if the health endpoint returns a successful response, False otherwise. 
+- `client`: Client -## Supported MCPs +#### Methods -OpenHands supports the following MCP transport protocols: +#### execute_command() -* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) -* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) -* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) +Execute a bash command on the remote system. -## How MCP Works +This method starts a bash command via the remote agent server API, +then polls for the output until the command completes. -When OpenHands starts, it: +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, and other metadata +* Return type: + [CommandResult](#class-commandresult) -1. Reads the MCP configuration. -2. Connects to any configured SSE and SHTTP servers. -3. Starts any configured stdio servers. -4. Registers the tools provided by these servers with the agent. +#### file_download() -The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: +Download a file from the remote system. -1. OpenHands routes the call to the appropriate MCP server. -2. The server processes the request and returns a response. -3. OpenHands converts the response to an observation and presents it to the agent. +Requests the file from the remote system via HTTP API and saves it locally. -## Configuration +* Parameters: + * `source_path` – Path to the source file on remote system + * `destination_path` – Path where the file should be saved locally +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) -MCP configuration can be defined in: -* The OpenHands UI in the `Settings > MCP` page. 
-* The `config.toml` file under the `[mcp]` section if not using the UI. +#### file_upload() -### Configuration Options +Upload a file to the remote system. - - - SSE servers are configured using either a string URL or an object with the following properties: +Reads the local file and sends it to the remote system via HTTP API. - - `url` (required) - - Type: `str` - - Description: The URL of the SSE server. +* Parameters: + * `source_path` – Path to the local source file + * `destination_path` – Path where the file should be uploaded on remote system +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) - - `api_key` (optional) - - Type: `str` - - Description: API key for authentication. - - - SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: +#### git_changes() - - `url` (required) - - Type: `str` - - Description: The URL of the SHTTP server. +Get the git changes for the repository at the path given. - - `api_key` (optional) - - Type: `str` - - Description: API key for authentication. +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed - - `timeout` (optional) - - Type: `int` - - Default: `60` - - Range: `1-3600` seconds (1 hour maximum) - - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. - - **Use Cases:** - - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. - - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. - - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. - - This timeout only applies to individual tool calls, not server connection establishment. 
- - - - - While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for - better reliability and performance. - +#### git_diff() - Stdio servers are configured using an object with the following properties: +Get the git diff for the file at the path given. - - `name` (required) - - Type: `str` - - Description: A unique name for the server. +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed - - `command` (required) - - Type: `str` - - Description: The command to run the server. +#### model_config = (configuration object) - - `args` (optional) - - Type: `list of str` - - Default: `[]` - - Description: Command-line arguments to pass to the server. +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. - - `env` (optional) - - Type: `dict of str to str` - - Default: `{}` - - Description: Environment variables to set for the server process. - - +#### model_post_init() -#### When to Use Direct Stdio +Override this method to perform additional initialization after __init__ and model_construct. +This is useful if you want to do some validation that requires the entire model to be initialized. -Direct stdio connections may still be appropriate in these scenarios: -- **Development and testing**: Quick prototyping of MCP servers. -- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access. -- **Local-only environments**: When you don't want to manage additional proxy processes. +#### reset_client() -### Configuration Examples +Reset the HTTP client to force re-initialization. - - - For stdio-based MCP servers, we recommend using MCP proxy tools like - [`supergateway`](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections. 
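`RemoteWorkspace` exposes a `client` property and a `reset_client()` method that forces re-initialization when connection parameters change. One common shape for that lazy, resettable pattern is sketched below with a hypothetical `FakeClient` stand-in; none of these classes are the SDK's real implementations:

```python
from typing import Optional

class FakeClient:
    # Hypothetical stand-in for a real HTTP client.
    def __init__(self, base_url: str):
        self.base_url = base_url

class RemoteHandle:
    def __init__(self, host: str):
        self.host = host
        self._client: Optional[FakeClient] = None

    @property
    def client(self) -> FakeClient:
        # Created lazily on first access, then cached.
        if self._client is None:
            self._client = FakeClient(self.host)
        return self._client

    def reset_client(self) -> None:
        # Drop the cached client so changed parameters (e.g. host) take
        # effect on the next access, matching the documented behavior.
        self._client = None

handle = RemoteHandle("https://a.example.com")
first = handle.client
handle.host = "https://b.example.com"
handle.reset_client()
print(handle.client.base_url)  # -> https://b.example.com
```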
- [SuperGateway](https://github.com/supercorp-ai/supergateway) is a popular MCP proxy that converts stdio MCP servers to - HTTP/SSE endpoints. +This is useful when connection parameters (host, api_key) have changed +and the client needs to be recreated with new values. - Start the proxy servers separately: - ```bash - # Terminal 1: Filesystem server proxy - supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080 +### class Workspace - # Terminal 2: Fetch server proxy - supergateway --stdio "uvx mcp-server-fetch" --port 8081 - ``` +### class Workspace - Then configure OpenHands to use the HTTP endpoint: +Bases: `object` - ```toml - [mcp] - # SSE Servers - Recommended approach using proxy tools - sse_servers = [ - # Basic SSE server with just a URL - "http://example.com:8080/mcp", +Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace. - # SuperGateway proxy for fetch server - "http://localhost:8081/sse", +Usage: +: - Workspace(working_dir=…) -> LocalWorkspace + - Workspace(working_dir=…, host=”http://…”) -> RemoteWorkspace - # External MCP service with authentication - {url="https://api.example.com/mcp/sse", api_key="your-api-key"} - ] +### Agent +Source: https://docs.openhands.dev/sdk/arch/agent.md - # SHTTP Servers - Modern streamable HTTP transport (recommended) - shttp_servers = [ - # Basic SHTTP server with default 60s timeout - "https://api.example.com/mcp/shttp", - - # Server with custom timeout for heavy operations - { - url = "https://files.example.com/mcp/shttp", - api_key = "your-api-key", - timeout = 1800 # 30 minutes for large file processing - } - ] - ``` - - - - This setup is not Recommended for production. 
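The `Workspace` factory entrypoint documented in this section dispatches on whether a `host` is supplied: without one it yields a `LocalWorkspace`, with one a `RemoteWorkspace`. An illustrative, SDK-independent sketch of that dispatch rule (`make_workspace` and the stub classes are hypothetical):

```python
from typing import Optional

# Stand-in classes for illustration only; not the SDK's implementations.
class LocalWorkspace:
    def __init__(self, working_dir: str):
        self.working_dir = working_dir

class RemoteWorkspace:
    def __init__(self, working_dir: str, host: str):
        self.working_dir = working_dir
        self.host = host

def make_workspace(working_dir: str, host: Optional[str] = None):
    """Factory mirroring the documented rule: a host selects the remote backend."""
    if host is None:
        return LocalWorkspace(working_dir)
    return RemoteWorkspace(working_dir, host)

ws = make_workspace("/tmp/project")
print(type(ws).__name__)  # -> LocalWorkspace
```

The SDK achieves the same effect from a single `Workspace(...)` call; the standalone function here just makes the dispatch rule explicit.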
- - ```toml - [mcp] - # Direct stdio servers - use only for development/testing - stdio_servers = [ - # Basic stdio server - {name="fetch", command="uvx", args=["mcp-server-fetch"]}, - - # Stdio server with environment variables - { - name="filesystem", - command="npx", - args=["@modelcontextprotocol/server-filesystem", "/"], - env={ - "DEBUG": "true" - } - } - ] - ``` - - For production use, we recommend using proxy tools like SuperGateway. - - +The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture. -Other options include: +**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent) -- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. -- **Docker-based proxies**: Containerized solutions for better isolation. -- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. +## Core Responsibilities -### Secrets Management -Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md +The Agent system has four primary responsibilities: -## Overview +1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history +2. **Tool Orchestration** - Select and execute tools, handle results and errors +3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser) +4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security) -OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be -accessed by the agent during runtime, such as API keys. 
These secrets are automatically exported as environment -variables in the agent's runtime environment. +## Architecture -## Accessing the Secrets Manager +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%% +flowchart TB + subgraph Input[" "] + Events["Event History"] + Context["Agent Context
Skills + Prompts"] + end + + subgraph Core["Agent Core"] + Condense["Condenser
History compression"] + Reason["LLM Query
Generate actions"] + Security["Security Analyzer
Risk assessment"] + end + + subgraph Execution[" "] + Tools["Tool Executor
Action → Observation"] + Results["Observation Events"] + end + + Events --> Condense + Context -.->|Skills| Reason + Condense --> Reason + Reason --> Security + Security --> Tools + Tools --> Results + Results -.->|Feedback| Events + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Reason primary + class Condense,Security secondary + class Tools tertiary +``` -Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. +### Key Components -## Adding a New Secret -1. Click `Add a new secret`. -2. Fill in the following fields: - - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. - - **Value**: The sensitive information you want to store. - - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. -3. Click `Add secret` to save. 
+| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | +| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | +| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | +| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | +| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | -## Editing a Secret +## Reasoning-Action Loop -1. Click the `Edit` button next to the secret you want to modify. -2. You can update the name and description of the secret. - - For security reasons, you cannot view or edit the value of an existing secret. If you need to change the - value, delete the secret and create a new one. - +The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: -## Deleting a Secret +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% +flowchart TB + Start["step() called"] + Pending{"Pending
actions?"} + ExecutePending["Execute pending actions"] + + HasCondenser{"Has
condenser?"} + Condense["Call condenser.condense()"] + CondenseResult{"Result
type?"} + EmitCondensation["Emit Condensation event"] + UseView["Use View events"] + UseRaw["Use raw events"] + + Query["Query LLM with messages"] + ContextExceeded{"Context
window
exceeded?"} + EmitRequest["Emit CondensationRequest"] + + Parse{"Response
type?"} + CreateActions["Create ActionEvents"] + CreateMessage["Create MessageEvent"] + + Confirmation{"Need
confirmation?"} + SetWaiting["Set WAITING_FOR_CONFIRMATION"] + + Execute["Execute actions"] + Observe["Create ObservationEvents"] + + Return["Return"] + + Start --> Pending + Pending -->|Yes| ExecutePending --> Return + Pending -->|No| HasCondenser + + HasCondenser -->|Yes| Condense + HasCondenser -->|No| UseRaw + Condense --> CondenseResult + CondenseResult -->|Condensation| EmitCondensation --> Return + CondenseResult -->|View| UseView --> Query + UseRaw --> Query + + Query --> ContextExceeded + ContextExceeded -->|Yes| EmitRequest --> Return + ContextExceeded -->|No| Parse + + Parse -->|Tool calls| CreateActions + Parse -->|Message| CreateMessage --> Return + + CreateActions --> Confirmation + Confirmation -->|Yes| SetWaiting --> Return + Confirmation -->|No| Execute + + Execute --> Observe + Observe --> Return + + style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -1. Click the `Delete` button next to the secret you want to remove. -2. Select `Confirm` to delete the secret. +**Step Execution Flow:** -## Using Secrets in the Agent - - All custom secrets are automatically exported as environment variables in the agent's runtime environment. - - You can access them in your code using standard environment variable access methods. For example, if you create a - secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or - `os.environ['OPENAI_API_KEY']` in Python. +1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return +2. **Condensation:** If condenser exists: + - Call `condenser.condense()` with current event view + - If returns `View`: use condensed events for LLM query (continue in same step) + - If returns `Condensation`: emit event and return (will be processed next step) +3. 
**LLM Query:** Query LLM with messages from event history + - If context window exceeded: emit `CondensationRequest` and return +4. **Response Parsing:** Parse LLM response into events + - Tool calls → create `ActionEvent`(s) + - Text message → create `MessageEvent` and return +5. **Confirmation Check:** If actions need user approval: + - Set conversation status to `WAITING_FOR_CONFIRMATION` and return +6. **Action Execution:** Execute tools and create `ObservationEvent`(s) -### Prompting Best Practices -Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md +**Key Characteristics:** +- **Stateless:** Agent holds no mutable state between steps +- **Event-Driven:** Reads from event history, writes new events +- **Interruptible:** Each step is atomic and can be paused/resumed -## Characteristics of Good Prompts +## Agent Context -Good prompts are: +The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: -- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. -- **Location-specific**: Specify the locations in the codebase that should be modified, if known. -- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Context["AgentContext"] + + subgraph Skills["Skills"] + Repo["repo
Always active"] + Knowledge["knowledge
Trigger-based"] + end + SystemAug["System prompt prefix/suffix
Per-conversation"] + System["Prompt template
Per-conversation"] + + subgraph Application["Applied to LLM"] + SysPrompt["System Prompt"] + UserMsg["User Messages"] + end + + Context --> Skills + Context --> SystemAug + Repo --> SysPrompt + Knowledge -.->|When triggered| UserMsg + System --> SysPrompt + SystemAug --> SysPrompt + + style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -## Examples +| Skill Type | Activation | Use Case | +|------------|------------|----------| +| **repo** | Always included | Project-specific context, conventions | +| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | -### Good Prompt Examples +Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. -- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average. -- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined. -- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission. -### Bad Prompt Examples +## Tool Execution -- Make the code better. (Too vague, not concrete) -- Rewrite the entire backend to use a different framework. (Not appropriately scoped) -- There's a bug somewhere in the user authentication. Can you find and fix it? (Lacks specificity and location information) +Tools follow a **strict action-observation pattern**: -## Tips for Effective Prompting - -- Be as specific as possible about the desired outcome or the problem to be solved. -- Provide context, including relevant file paths and line numbers if available. -- Break large tasks into smaller, manageable prompts. -- Include relevant error messages or logs. 
-- Specify the programming language or framework, if not obvious. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + LLM["LLM generates tool_call"] + Convert["Convert to ActionEvent"] + + Decision{"Confirmation
mode?"} + Defer["Store as pending"] + + Execute["Execute tool"] + Success{"Success?"} + + Obs["ObservationEvent
with result"] + Error["ObservationEvent
with error"] + + LLM --> Convert + Convert --> Decision + + Decision -->|Yes| Defer + Decision -->|No| Execute + + Execute --> Success + Success -->|Yes| Obs + Success -->|No| Error + + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -The more precise and informative your prompt, the better OpenHands can assist you. +**Execution Modes:** -See [First Projects](/overview/first-projects) for more examples of helpful prompts. +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | -### Troubleshooting -Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md +**Security Integration:** - -OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal. - +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation -### Launch docker client failed +## Component Relationships -**Description** +### How Agent Interacts -When running OpenHands, the following error is seen: -``` -Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon. 
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Conv["Conversation"] + LLM["LLM"] + Tools["Tools"] + Context["AgentContext"] + + Conv -->|.step calls| Agent + Agent -->|Reads events| Conv + Agent -->|Query| LLM + Agent -->|Execute| Tools + Context -.->|Skills and Context| Agent + Agent -.->|New events| Conv + + style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Resolution** +**Relationship Characteristics:** +- **Conversation → Agent**: Orchestrates step execution, provides event history +- **Agent → LLM**: Queries for next actions, receives tool calls or messages +- **Agent → Tools**: Executes actions, receives observations +- **AgentContext → Agent**: Injects skills and prompts into LLM queries -Try these in order: -* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully. -* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled. -* Depending on your configuration you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop. -* Reinstall Docker Desktop. -### Permission Error +## See Also -**Description** +- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction -On initial prompt, an error is seen with `Permission Denied` or `PermissionError`. 
+### Agent Server Package +Source: https://docs.openhands.dev/sdk/arch/agent-server.md -**Resolution** +The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. -* Check if the `~/.openhands` is owned by `root`. If so, you can: - * Change the directory's ownership: `sudo chown : ~/.openhands`. - * or update permissions on the directory: `sudo chmod 777 ~/.openhands` - * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings. -* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running - OpenHands. +**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) -### On Linux, Getting ConnectTimeout Error +## Purpose -**Description** +The Agent Server enables: +- **Remote execution**: Clients interact with agents via HTTP API +- **Multi-user isolation**: Each user gets isolated workspace +- **Container orchestration**: Manages Docker containers for workspaces +- **Centralized management**: Monitor and control all agents +- **Scalability**: Horizontal scaling with multiple servers -When running on Linux, you might run into the error `ERROR:root:: timed out`. +## Architecture Overview -**Resolution** +```mermaid +graph TB + Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] + + API --> Auth[Authentication] + API --> Router[API Router] + + Router --> WS[Workspace Manager] + Router --> Conv[Conversation Handler] + + WS --> Docker[Docker Manager] + Docker --> C1[Container 1
User A] + Docker --> C2[Container 2
User B] + Docker --> C3[Container 3
User C] + + Conv --> Agent[Software Agent SDK] + Agent --> C1 + Agent --> C2 + Agent --> C3 + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style WS fill:#e8f5e8 + style Docker fill:#f3e5f5 + style Agent fill:#fce4ec +``` -If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that -these packages can sometimes be outdated or include changes that cause compatibility issues. try reinstalling Docker -[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version. +### Key Components -If that does not solve the issue, try incrementally adding the following parameters to the docker run command: -* `--network host` -* `-e SANDBOX_USE_HOST_NETWORK=true` -* `-e DOCKER_HOST_ADDR=127.0.0.1` +**1. FastAPI Server** +- HTTP REST API endpoints +- Authentication and authorization +- Request validation +- WebSocket support for streaming -### Internal Server Error. Ports are not available +**2. Workspace Manager** +- Creates and manages Docker containers +- Isolates workspaces per user +- Handles container lifecycle +- Manages resource limits -**Description** +**3. Conversation Handler** +- Routes requests to appropriate workspace +- Manages conversation state +- Handles concurrent requests +- Supports streaming responses -When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP -...: bind: An attempt was made to access a socket in a -way forbidden by its access permissions.")` is encountered. +**4. Docker Manager** +- Interfaces with Docker daemon +- Builds and pulls images +- Creates and destroys containers +- Monitors container health -**Resolution** +## Design Decisions -* Run the following command in PowerShell, as Administrator to reset the NAT service and release the ports: -``` -Restart-Service -Name "winnat" -``` +### Why HTTP API? 
-### Unable to access VS Code tab via local IP +Alternative approaches considered: +- **gRPC**: More efficient but harder for web clients +- **WebSockets only**: Good for streaming but not RESTful +- **HTTP + WebSockets**: Best of both worlds -**Description** +**Decision**: HTTP REST for operations, WebSockets for streaming +- ✅ Works from any client (web, mobile, CLI) +- ✅ Easy to debug (curl, Postman) +- ✅ Standard authentication (API keys, OAuth) +- ✅ Streaming where needed -When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden" -error, while other parts of the UI work fine. +### Why Container Per User? -**Resolution** +Alternative approaches: +- **Shared container**: Multiple users in one container +- **Container per session**: New container each conversation +- **Container per user**: One container per user (chosen) -This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines. -To fix this: +**Decision**: Container per user +- ✅ Strong isolation between users +- ✅ Persistent workspace across sessions +- ✅ Better resource management +- ⚠️ More containers, but worth it for isolation -1. Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: - ```bash - docker run -it --rm \ - -e SANDBOX_VSCODE_PORT=41234 \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/.openhands \ - -p 3000:3000 \ - -p 41234:41234 \ - --add-host host.docker.internal:host-gateway \ - --name openhands-app \ - docker.openhands.dev/openhands/openhands:latest - ``` +### Why FastAPI? - > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. 
+Alternative frameworks: +- **Flask**: Simpler but less type-safe +- **Django**: Too heavyweight +- **FastAPI**: Modern, fast, type-safe (chosen) -2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. -3. If running with the development workflow, you can set this in your `config.toml` file: - ```toml - [sandbox] - vscode_port = 41234 - ``` +**Decision**: FastAPI +- ✅ Automatic API documentation (OpenAPI) +- ✅ Type validation with Pydantic +- ✅ Async support for performance +- ✅ WebSocket support built-in -### GitHub Organization Rename Issues +## API Design -**Description** +### Key Endpoints -After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. +**Workspace Management** +``` +POST /workspaces Create new workspace +GET /workspaces/{id} Get workspace info +DELETE /workspaces/{id} Delete workspace +POST /workspaces/{id}/execute Execute command +``` -**Resolution** - -* Update your git remote URL: - ```bash - # Check current remote - git remote get-url origin - - # Update SSH remote - git remote set-url origin git@github.com:OpenHands/OpenHands.git - - # Or update HTTPS remote - git remote set-url origin https://github.com/OpenHands/OpenHands.git - ``` -* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` -* Find and update any hardcoded references: - ```bash - git grep -i "all-hands-ai" - git grep -i "ghcr.io/all-hands-ai" - ``` +**Conversation Management** +``` +POST /conversations Create conversation +GET /conversations/{id} Get conversation +POST /conversations/{id}/messages Send message +GET /conversations/{id}/stream Stream responses (WebSocket) +``` -### COBOL Modernization -Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md +**Health & Monitoring** +``` +GET /health Server health check +GET /metrics Prometheus metrics +``` -Legacy COBOL systems power critical business 
operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. +### Authentication - -This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring). - +**API Key Authentication** +```bash +curl -H "Authorization: Bearer YOUR_API_KEY" \ + https://agent-server.example.com/conversations +``` -## The COBOL Modernization Challenge +**Per-user workspace isolation** +- API key → user ID mapping +- Each user gets separate workspace +- Users can't access each other's workspaces -[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions. +### Streaming Responses -The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, with COBOL's specialized nature requiring a unique skill set that makes it difficult for human teams alone. +**WebSocket for real-time updates** +```python +async with websocket_connect(url) as ws: + # Send message + await ws.send_json({"message": "Hello"}) + + # Receive events + async for event in ws: + if event["type"] == "message": + print(event["content"]) +``` -## Overview +**Why streaming?** +- Real-time feedback to users +- Show agent thinking process +- Better UX for long-running tasks -COBOL modernization is a complex undertaking. 
Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in. +## Deployment Models -OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process: +### 1. Local Development -1. **Understanding**: Analyze and document existing COBOL code -2. **Translation**: Convert COBOL to modern languages like Java, Python, or C# -3. **Validation**: Ensure the modernized code behaves identically to the original +Run server locally for testing: +```bash +# Start server +openhands-agent-server --port 8000 -In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. +# Or with Docker +docker run -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` -## Understanding +**Use case**: Development and testing -A significant challenge in modernization is understanding the business function of the code. Developers have practice determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty then comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle. +### 2. 
Single-Server Deployment -Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. Your COBOL source might already have some structure or comments that make this link clear, but if not OpenHands can help. If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: +Deploy on one server (VPS, EC2, etc.): +```bash +# Install +pip install openhands-agent-server +# Run with systemd/supervisor +openhands-agent-server \ + --host 0.0.0.0 \ + --port 8000 \ + --workers 4 ``` -For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. -Save the results in `business_functions.json` in the following format: +**Use case**: Small deployments, prototypes, MVPs -{ - ..., - "COBIL00C.cbl": { - "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", - "references": [ - "docs/billing.md#bill-payment", - "docs/transactions.md#transaction-action" - ], - }, - ... -} +### 3. Multi-Server Deployment + +Scale horizontally with load balancer: +``` + Load Balancer + | + +-------------+-------------+ + | | | + Server 1 Server 2 Server 3 + (Agents) (Agents) (Agents) + | | | + +-------------+-------------+ + | + Shared State Store + (Database, Redis, etc.) ``` -OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. 
Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information. +**Use case**: Production SaaS, high traffic, need redundancy -## Translation +### 4. Kubernetes Deployment -With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed. +Container orchestration with Kubernetes: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + template: + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 +``` -One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases but including such _known problems_ in the translation prompt can help prevent such errors from being introduced at all. +**Use case**: Enterprise deployments, auto-scaling, high availability -An example prompt is below: +## Resource Management -``` -Convert the COBOL files in `/src` to Java in `/src/java`. +### Container Limits -Requirements: -1. Create a Java class for each COBOL program -2. Preserve the business logic and data structures (see `business_functions.json`) -3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) -4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types) -5. 
Implement proper error handling with try-catch blocks -6. Add JavaDoc comments explaining the purpose of each class and method -7. In JavaDoc comments, include traceability to the original COBOL source using - the format: @source : (e.g., @source CBACT01C.cbl:73-77) -8. Create a clean, maintainable object-oriented design -9. Each Java file should be compilable and follow Java best practices +Set per-workspace resource limits: +```python +# In server configuration +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "2g", # 2GB RAM + "cpus": "2", # 2 CPU cores + "disk": "10g" # 10GB disk + }, + "timeout": 300, # 5 min timeout +} ``` -Note the rule that introduces traceability comments to the resulting Java. These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. - -## Validation +**Why limit resources?** +- Prevent one user from consuming all resources +- Fair usage across users +- Protect server from runaway processes +- Cost control -Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. 
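The golden-file idea above can be sketched as a small comparison harness. This is an illustrative sketch, not OpenHands tooling: `canon` and `diff_golden` are hypothetical names, and the fixed-point scale assumes the `PIC S9(9)V9(9)` example from the Translation section.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# COBOL PIC S9(9)V9(9) is fixed-point with 9 fractional digits.
SCALE = Decimal("1e-9")

def canon(field: str) -> str:
    """Canonicalize one whitespace-separated field: numeric fields are
    quantized to COBOL's scale, everything else is kept verbatim."""
    try:
        return str(Decimal(field).quantize(SCALE, rounding=ROUND_HALF_UP))
    except InvalidOperation:
        return field

def diff_golden(golden: str, migrated: str) -> list[str]:
    """Compare a captured COBOL run against the Java port's output,
    returning human-readable mismatches an agent can act on."""
    g_lines, m_lines = golden.splitlines(), migrated.splitlines()
    problems = []
    if len(g_lines) != len(m_lines):
        problems.append(f"line count differs: {len(g_lines)} vs {len(m_lines)}")
    for n, (want, got) in enumerate(zip(g_lines, m_lines), start=1):
        if [canon(f) for f in want.split()] != [canon(f) for f in got.split()]:
            problems.append(f"line {n}: expected {want!r}, got {got!r}")
    return problems
```

Feeding the returned mismatch strings back into the migration prompt closes the loop between validation and translation.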
+### Cleanup & Garbage Collection -Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. +**Container lifecycle**: +- Containers created on first use +- Kept alive between requests (warm) +- Cleaned up after inactivity timeout +- Force cleanup on server shutdown -## Scaling Up +**Storage management**: +- Old workspaces deleted automatically +- Disk usage monitored +- Alerts when approaching limits -The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. One way to address this is by tying translation and validation together in an iterative refinement loop. +## Security Considerations -The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. 
Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk): +### Multi-Tenant Isolation -```python -while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: - # Migrating agent converts COBOL to Java - migration_conversation.send_message(migration_prompt) - migration_conversation.run() - - # Critiquing agent evaluates the conversion - critique_conversation.send_message(critique_prompt) - critique_conversation.run() - - # Parse the score and decide whether to continue - current_score = parse_critique_score(critique_file) -``` +**Container isolation**: +- Each user gets separate container +- Containers can't communicate +- Network isolation (optional) +- File system isolation -By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. The following prompt can be easily modified to support a wide range of requirements: +**API isolation**: +- API keys mapped to users +- Users can only access their workspaces +- Server validates all permissions -``` -Evaluate the quality of the COBOL to Java migration in `/src`. +### Input Validation -For each Java file, assess using the following criteria: -1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)? -2. Code Quality: Is the code clean, readable, and following Java 17 conventions? -3. Completeness: Are all COBOL features properly converted? -4. Best Practices: Does it use proper OOP, error handling, and documentation? +**Server validates**: +- API request schemas +- Command injection attempts +- Path traversal attempts +- File size limits -For each instance of a criteria not met, deduct a point. 
+**Defense in depth**: +- API validation +- Container validation +- Docker security features +- OS-level security -Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score. +### Network Security -Save the results in `critique.json` in the following format: +**Best practices**: +- HTTPS only (TLS certificates) +- Firewall rules (only port 443/8000) +- Rate limiting +- DDoS protection -{ - "total_score": -12, - "files": [ - { - "cobol": "COBIL00C.cbl", - "java": "bill_payment.java", - "scores": { - "correctness": 0, - "code_quality": 0, - "completeness": -1, - "best_practices": -2 - }, - "feedback": [ - "Rename single-letter variables to meaningful names.", - "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing.", - ], - }, - ... - ] +**Container networking**: +```python +# Disable network for workspace +WORKSPACE_CONFIG = { + "network_mode": "none" # No network access } -``` - -In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback. -This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. 
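To make the dependency-aware idea concrete — as an illustration only, not the large codebase SDK's actual algorithm — a known CALL graph between programs can be turned into migration batches with Python's standard library, so that callees are migrated and validated before their callers (the graph below is hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical CALL graph: each program maps to the programs it calls.
calls = {
    "COBIL00C.cbl": {"CBACT01C.cbl", "CBTRN01C.cbl"},
    "CBTRN01C.cbl": {"CBACT01C.cbl"},
    "CBACT01C.cbl": set(),
}

def migration_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group programs into batches; every program in a batch has all of its
    callees already migrated, so batches can be handled by parallel agents."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches

print(migration_batches(calls))
# [['CBACT01C.cbl'], ['CBTRN01C.cbl'], ['COBIL00C.cbl']]
```

Each inner list can be dispatched to concurrent migration/critique loops, since nothing in a batch depends on anything else in it.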
+# Or allow specific hosts +WORKSPACE_CONFIG = { + "allowed_hosts": ["api.example.com"] +} +``` -## Try It Yourself +## Monitoring & Observability -The full iterative refinement example is available in the OpenHands SDK: +### Health Checks ```bash -export LLM_API_KEY="your-api-key" -cd software-agent-sdk -uv run python examples/01_standalone_sdk/31_iterative_refinement.py -``` +# Simple health check +curl https://agent-server.example.com/health -For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. +# Response +{ + "status": "healthy", + "docker": "connected", + "workspaces": 15, + "uptime": 86400 +} +``` +### Metrics -## Related Resources +**Prometheus metrics**: +- Request count and latency +- Active workspaces +- Container resource usage +- Error rates -- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents -- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +**Logging**: +- Structured JSON logs +- Per-request tracing +- Workspace events +- Error tracking -### Automated Code Review -Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review.md +### Alerting -Automated code review helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs. 
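Under the hood, inline comments like these are ordinary GitHub REST calls. The sketch below only builds the request for GitHub's `POST /repos/{owner}/{repo}/pulls/{pull_number}/comments` endpoint; the owner, repo, PR number, SHA, and comment text are placeholders, and the skill's actual implementation may differ.

```python
import json
import urllib.request

def inline_comment_request(owner: str, repo: str, pr: int, token: str,
                           path: str, line: int, commit_sha: str, body: str):
    """Build (but do not send) a request that anchors a review comment
    to one line of the PR diff; side="RIGHT" targets the new code."""
    payload = {
        "body": body,
        "commit_id": commit_sha,
        "path": path,
        "line": line,
        "side": "RIGHT",
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr}/comments",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )

req = inline_comment_request(
    "example-org", "example-repo", 123, "ghs_placeholder",
    "src/app.py", 42, "abc1234",
    "🟡 Suggestion: consider extracting this block into a helper.",
)
```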
+**Alert on**: +- Server down +- High error rate +- Resource exhaustion +- Container failures -## Overview +## Client SDK -The OpenHands PR Review workflow is a GitHub Actions workflow that: +Python SDK for interacting with Agent Server: -- **Triggers automatically** when PRs are opened or when you request a review -- **Analyzes code changes** in the context of your entire repository -- **Posts inline comments** directly on specific lines of code in the PR -- **Provides fast feedback** - typically within 2-3 minutes +```python +from openhands.client import AgentServerClient -## How It Works +client = AgentServerClient( + url="https://agent-server.example.com", + api_key="your-api-key" +) -The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: +# Create conversation +conversation = client.create_conversation() -1. **Trigger**: The workflow runs when: - - A new non-draft PR is opened - - A draft PR is marked as ready for review - - The `review-this` label is added to a PR - - `openhands-agent` is requested as a reviewer +# Send message +response = client.send_message( + conversation_id=conversation.id, + message="Hello, agent!" +) -2. **Analysis**: The agent receives the complete PR diff and uses two skills: - - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices - - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API +# Stream responses +for event in client.stream_conversation(conversation.id): + if event.type == "message": + print(event.content) +``` -3. 
**Output**: Review comments are posted directly on the PR with: - - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) - - Specific line references - - Actionable suggestions with code examples +**Client handles**: +- Authentication +- Request/response serialization +- Error handling +- Streaming +- Retries -### Review Styles +## Cost Considerations -Choose between two review styles: +### Server Costs -| Style | Description | Best For | -|-------|-------------|----------| -| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | -| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | +**Compute**: CPU and memory for containers +- Each active workspace = 1 container +- Typically 1-2 GB RAM per workspace +- 0.5-1 CPU core per workspace -## Quick Start +**Storage**: Workspace files and conversation state +- ~1-10 GB per workspace (depends on usage) +- Conversation history in database - - - Create `.github/workflows/pr-review-by-openhands.yml` in your repository: +**Network**: API requests and responses +- Minimal (mostly text) +- Streaming adds bandwidth - ```yaml - name: PR Review by OpenHands +### Cost Optimization - on: - pull_request_target: - types: [opened, ready_for_review, labeled, review_requested] +**1. Idle timeout**: Shutdown containers after inactivity +```python +WORKSPACE_CONFIG = { + "idle_timeout": 3600 # 1 hour +} +``` - permissions: - contents: read - pull-requests: write - issues: write +**2. 
Resource limits**: Don't over-provision +```python +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "1g", # Smaller limit + "cpus": "0.5" # Fractional CPU + } +} +``` - jobs: - pr-review: - if: | - (github.event.action == 'opened' && github.event.pull_request.draft == false) || - github.event.action == 'ready_for_review' || - github.event.label.name == 'review-this' || - github.event.requested_reviewer.login == 'openhands-agent' - runs-on: ubuntu-latest - steps: - - name: Run PR Review - uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main - with: - llm-model: anthropic/claude-sonnet-4-5-20250929 - review-style: standard - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} - ``` - +**3. Shared resources**: Use single server for multiple low-traffic apps - - Go to your repository's **Settings → Secrets and variables → Actions** and add: - - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) - +**4. Auto-scaling**: Scale servers based on demand - - Create a `review-this` label in your repository: - 1. Go to **Issues → Labels** - 2. Click **New label** - 3. Name: `review-this` - 4. 
Description: `Trigger OpenHands PR review` - +## When to Use Agent Server - - Open a PR and either: - - Add the `review-this` label, OR - - Request `openhands-agent` as a reviewer - - +### Use Agent Server When: -## Composite Action +✅ **Multi-user system**: Web app with many users +✅ **Remote clients**: Mobile app, web frontend +✅ **Centralized management**: Need to monitor all agents +✅ **Workspace isolation**: Users shouldn't interfere +✅ **SaaS product**: Building agent-as-a-service +✅ **Scaling**: Need to handle concurrent users -The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: +**Examples**: +- Chatbot platforms +- Code assistant web apps +- Agent marketplaces +- Enterprise agent deployments -- Checking out the SDK at the specified version -- Setting up Python and dependencies -- Running the PR review agent -- Uploading logs as artifacts +### Use Standalone SDK When: -### Action Inputs +✅ **Single-user**: Personal tool or script +✅ **Local execution**: Running on your machine +✅ **Full control**: Need programmatic access +✅ **Simpler deployment**: No server management +✅ **Lower latency**: No network overhead -| Input | Description | Required | Default | -|-------|-------------|----------|---------| -| `llm-model` | LLM model to use | Yes | - | -| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | -| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | -| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | -| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | -| `llm-api-key` | LLM API key | Yes | - | -| `github-token` | GitHub token for API access | Yes | - | +**Examples**: +- CLI tools +- Automation scripts +- Local development +- Desktop applications - -Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. 
- +### Hybrid Approach -## Customization +Use SDK locally but RemoteAPIWorkspace for execution: +- Agent logic in your Python code +- Execution happens on remote server +- Best of both worlds -### Repository-Specific Review Guidelines +## Building Custom Agent Server -Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: +The server is extensible for custom needs: -```markdown ---- -name: code-review -description: Custom code review guidelines for this repository -triggers: -- /codereview ---- +**Custom authentication**: +```python +from openhands.agent_server import AgentServer -# Repository Code Review Guidelines +class CustomAgentServer(AgentServer): + async def authenticate(self, request): + # Custom auth logic + return await oauth_verify(request) +``` -You are reviewing code for [Your Project Name]. Follow these guidelines: +**Custom workspace configuration**: +```python +server = AgentServer( + workspace_factory=lambda user: DockerWorkspace( + image=f"custom-image-{user.tier}", + resource_limits=user.resource_limits + ) +) +``` -## Review Decisions +**Custom middleware**: +```python +@server.middleware +async def logging_middleware(request, call_next): + # Custom logging + response = await call_next(request) + return response +``` -### When to APPROVE -- Configuration changes following existing patterns -- Documentation-only changes -- Test-only changes without production code changes -- Simple additions following established conventions +## Next Steps -### When to COMMENT -- Issues that need attention (bugs, security concerns) -- Suggestions for improvement -- Questions about design decisions +### For Usage Examples -## Core Principles +- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API +- [Remote Agent Server 
Overview](/sdk/guides/agent-server/overview) - All options -1. **[Your Principle 1]**: Description -2. **[Your Principle 2]**: Description +### For Related Architecture -## What to Check +- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details +- [SDK Architecture](/sdk/arch/sdk) - Core framework +- [Architecture Overview](/sdk/arch/overview) - System design -- **[Category 1]**: What to look for -- **[Category 2]**: What to look for +### For Implementation Details -## Repository Conventions +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples -- Use [your linter] for style checking -- Follow [your style guide] -- Tests should be in [your test directory] -``` +### Condenser +Source: https://docs.openhands.dev/sdk/arch/condenser.md - -The skill file must use `/codereview` as the trigger to override the default review behavior. See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. - +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). 
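As a rough mental model of this behavior — a toy stand-in with invented names, not the SDK's actual classes — a rolling condenser keeps early context, summarizes the middle, and retains the most recent events:

```python
from dataclasses import dataclass

@dataclass
class ToyCondenser:
    """Toy rolling condenser: once a view grows past max_size, replace the
    oldest middle events with a single summary placeholder."""
    max_size: int = 8
    keep_first: int = 2

    def should_condense(self, events: list[str]) -> bool:
        return len(events) > self.max_size

    def condense(self, events: list[str]) -> list[str]:
        if not self.should_condense(events):
            return events
        head = events[: self.keep_first]
        tail = events[-(self.max_size // 2):]
        forgotten = events[self.keep_first : -(self.max_size // 2)]
        # In the real system, this placeholder is an LLM-generated summary.
        summary = f"<condensation: {len(forgotten)} events summarized>"
        return head + [summary] + tail

events = [f"event-{i}" for i in range(12)]
view = ToyCondenser().condense(events)
```

Keeping the first events matters because they typically hold the system prompt and task statement, while the recent tail preserves what the agent is actively working on.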
-### Workflow Configuration +**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) -Customize the workflow by modifying the action inputs: +## Core Responsibilities -```yaml -- name: Run PR Review - uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main - with: - # Change the LLM model - llm-model: anthropic/claude-sonnet-4-5-20250929 - # Use a custom LLM endpoint - llm-base-url: https://your-llm-proxy.example.com - # Switch to "roasted" style for brutally honest reviews - review-style: roasted - # Pin to a specific SDK version for stability - sdk-version: main - # Secrets - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} -``` +The Condenser system has four primary responsibilities: -### Trigger Customization +1. **History Compression** - Reduce event lists to fit within context windows +2. **Threshold Detection** - Determine when condensation should trigger +3. **Summary Generation** - Create meaningful summaries via LLM or heuristics +4. **View Management** - Transform event history into LLM-ready views -Modify when reviews are triggered by editing the workflow conditions: +## Architecture -```yaml -# Only trigger on label (disable auto-review on PR open) -if: github.event.label.name == 'review-this' +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["CondenserBase
Abstract base"] + end + + subgraph Implementations["Concrete Implementations"] + NoOp["NoOpCondenser
No compression"] + LLM["LLMSummarizingCondenser
LLM-based"] + Pipeline["PipelineCondenser
Multi-stage"] + end + + subgraph Process["Condensation Process"] + View["View
Event history"] + Check["should_condense()?"] + Condense["get_condensation()"] + Result["View | Condensation"] + end + + subgraph Output["Condensation Output"] + CondEvent["Condensation Event
Summary metadata"] + NewView["Condensed View
Reduced tokens"] + end + + Base --> NoOp + Base --> LLM + Base --> Pipeline + + View --> Check + Check -->|Yes| Condense + Check -->|No| Result + Condense --> CondEvent + CondEvent --> NewView + NewView --> Result + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class LLM,Pipeline secondary + class Check,Condense tertiary +``` -# Only trigger when specific reviewer is requested -if: github.event.requested_reviewer.login == 'openhands-agent' +### Key Components -# Trigger on all PRs (including drafts) -if: | - github.event.action == 'opened' || - github.event.action == 'synchronize' -``` +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | +| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | +| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | +| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | +| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | +| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | 
Represents history for LLM | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | -## Security Considerations +## Condenser Types -The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. +### NoOpCondenser - -**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. +Pass-through condenser that performs no compression: -To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. - +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["View"] + NoOp["NoOpCondenser"] + Same["Same View"] + + View --> NoOp --> Same + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -## Example Reviews +### LLMSummarizingCondenser -See real automated reviews in action on the OpenHands Software Agent SDK repository: +Uses an LLM to generate summaries of conversation history: -| PR | Description | Review Highlights | -|----|-------------|-------------------| -| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | -| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | -| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED 
review highlighting key strengths | -| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + View["Long View
120+ events"] + Check["Threshold
exceeded?"] + Summarize["LLM Summarization"] + Summary["Summary Text"] + Metadata["Condensation Event"] + AddToHistory["Add to History"] + NextStep["Next Step: View.from_events()"] + NewView["Condensed View"] + + View --> Check + Check -->|Yes| Summarize + Summarize --> Summary + Summary --> Metadata + Metadata --> AddToHistory + AddToHistory --> NextStep + NextStep --> NewView + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -## Troubleshooting +**Process:** +1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) +2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) +3. **LLM Call:** Generate summary of middle events using dedicated LLM +4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` +5. **Add to History:** Agent adds `Condensation` to event log and returns early +6. 
**Next Step:** `View.from_events()` filters forgotten events and inserts summary - - - - Ensure the `LLM_API_KEY` secret is set correctly - - Check that the label name matches exactly (`review-this`) - - Verify the workflow file is in `.github/workflows/` - - Check the Actions tab for workflow run errors - - - - - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission - - Check the workflow logs for API errors - - Verify the PR is not from a fork with restricted permissions - - - - - Large PRs may take longer to analyze - - Consider splitting large PRs into smaller ones - - Check if the LLM API is experiencing delays - - +**Configuration:** +- **`max_size`:** Event count threshold before condensation triggers (default: 120) +- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) +- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) -## Related Resources +### PipelineCondenser -- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script -- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews -- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows -- [GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud -- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills - -### Dependency Upgrades -Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md +Chains multiple condensers in sequence: -Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. 
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["Original View"] + C1["Condenser 1"] + C2["Condenser 2"] + C3["Condenser 3"] + Final["Final View"] + + View --> C1 --> C2 --> C3 --> Final + + style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -## Overview +**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) -OpenHands helps with dependency management by: +## Condensation Flow -- **Analyzing dependencies**: Identifying outdated packages and their versions -- **Planning upgrades**: Creating upgrade strategies and migration guides -- **Implementing changes**: Updating code to handle breaking changes -- **Validating results**: Running tests and verifying functionality +### Trigger Mechanisms -## Dependency Analysis Examples +Condensers can be triggered in two ways: -### Identifying Outdated Dependencies +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Automatic["Automatic Trigger"] + Agent1["Agent Step"] + Build1["View.from_events()"] + Check1["condenser.condense(view)"] + Trigger1["should_condense()?"] + end + + Agent1 --> Build1 --> Check1 --> Trigger1 + + style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -Start by understanding your current dependency state: +**Automatic Trigger:** +- **When:** Threshold exceeded (e.g., event count > `max_size`) +- **Who:** Agent calls `condenser.condense()` each step +- **Purpose:** Proactively keep context within limits -``` -Analyze the dependencies in this project and create a report: -1. List all direct dependencies with current and latest versions -2. Identify dependencies more than 2 major versions behind -3. Flag any dependencies with known security vulnerabilities -4. 
Highlight dependencies that are deprecated or unmaintained -5. Prioritize which updates are most important +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Manual["Manual Trigger"] + Error["LLM Context Error"] + Request["CondensationRequest Event"] + NextStep["Next Agent Step"] + Trigger2["condense() detects request"] + end + + Error --> Request --> NextStep --> Trigger2 + + style Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` +**Manual Trigger:** +- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) +- **Who:** Agent (on LLM context window error) or application code +- **Purpose:** Force compression when context limit exceeded -**Example output:** - -| Package | Current | Latest | Risk | Priority | -|---------|---------|--------|------|----------| -| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | -| react | 16.8.0 | 18.2.0 | Outdated | Medium | -| express | 4.17.1 | 4.18.2 | Minor update | Low | -| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | +### Condensation Workflow -### Security-Related Dependency Upgrades +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent calls condense(view)"] + + Decision{"should_condense?"} + + ReturnView["Return View
Agent proceeds"] + + Extract["Select Events to Keep/Forget"] + Generate["LLM Generates Summary"] + Create["Create Condensation Event"] + ReturnCond["Return Condensation"] + AddHistory["Agent adds to history"] + NextStep["Next Step: View.from_events()"] + FilterEvents["Filter forgotten events"] + InsertSummary["Insert summary at offset"] + NewView["New condensed view"] + + Start --> Decision + Decision -->|No| ReturnView + Decision -->|Yes| Extract + Extract --> Generate + Generate --> Create + Create --> ReturnCond + ReturnCond --> AddHistory + AddHistory --> NextStep + NextStep --> FilterEvents + FilterEvents --> InsertSummary + InsertSummary --> NewView + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: +**Key Steps:** -- Automating vulnerability detection and remediation -- Integrating with security scanners (Snyk, Dependabot, CodeQL) -- Building automated pipelines for security fixes -- Using OpenHands agents to create pull requests automatically +1. **Threshold Check:** `should_condense()` determines if condensation needed +2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) +3. **Summary Generation:** LLM creates compressed representation of forgotten events +4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary +5. **Return to Agent:** Condenser returns `Condensation` (not `View`) +6. **History Update:** Agent adds `Condensation` to event log and exits step +7. 
**Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary -### Compatibility Checking +## View and Condensation -Check for compatibility issues before upgrading: +### View Structure -``` -Check compatibility for upgrading React from 16 to 18: +A `View` represents the conversation history as it will be sent to the LLM: -1. Review our codebase for deprecated React patterns -2. List all components using lifecycle methods -3. Identify usage of string refs or findDOMNode -4. Check third-party library compatibility with React 18 -5. Estimate the effort required for migration +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Full Event List
+ Condensation events"] + FromEvents["View.from_events()"] + Filter["Filter forgotten events"] + Insert["Insert summary"] + View["View
LLMConvertibleEvents"] + Convert["events_to_messages()"] + LLM["LLM Input"] + + Events --> FromEvents + FromEvents --> Filter + Filter --> Insert + Insert --> View + View --> Convert + Convert --> LLM + + style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -**Compatibility matrix:** - -| Dependency | React 16 | React 17 | React 18 | Action Needed | -|------------|----------|----------|----------|---------------| -| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | -| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | -| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | - -## Automated Upgrade Examples +**View Components:** +- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) +- **`unhandled_condensation_request`:** Flag for pending manual condensation +- **`condensations`:** List of all Condensation events processed +- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics -### Version Updates +### Condensation Event -Perform straightforward version updates: +When condensation occurs, a `Condensation` event is created: - - - ``` - Update all patch and minor versions in package.json: - - 1. Review each update for changelog notes - 2. Update package.json with new versions - 3. Update package-lock.json - 4. Run the test suite - 5. List any deprecation warnings - ``` - - - ``` - Update dependencies in requirements.txt: +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Old["Middle Events
~60 events"] + Summary["Summary Text
LLM-generated"] + Event["Condensation Event
forgotten_event_ids"] + Applied["View.from_events()"] + New["New View
~60 events + summary"] - 1. Check each package for updates - 2. Update requirements.txt with compatible versions - 3. Update requirements-dev.txt similarly - 4. Run tests and verify functionality - 5. Note any deprecation warnings - ``` -
- - ``` - Update dependencies in pom.xml: + Old -.->|Summarized| Summary + Summary --> Event + Event --> Applied + Applied --> New - 1. Check for newer versions of each dependency - 2. Update version numbers in pom.xml - 3. Run mvn dependency:tree to check conflicts - 4. Run the test suite - 5. Document any API changes encountered - ``` - -
- -### Breaking Change Handling - -When major versions introduce breaking changes: - + style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -Upgrade axios from v0.x to v1.x and handle breaking changes: -1. List all breaking changes in axios 1.0 changelog -2. Find all axios usages in our codebase -3. For each breaking change: - - Show current code - - Show updated code - - Explain the change -4. Create a git commit for each logical change -5. Verify all tests pass -``` +**Condensation Fields:** +- **`forgotten_event_ids`:** List of event IDs to filter out +- **`summary`:** Compressed text representation of forgotten events +- **`summary_offset`:** Index where summary event should be inserted +- Inherits from `Event`: `id`, `timestamp`, `source` -**Example transformation:** +## Rolling Window Pattern -```javascript -// Before (axios 0.x) -import axios from 'axios'; -axios.defaults.baseURL = 'https://api.example.com'; -const response = await axios.get('/users', { - cancelToken: source.token -}); +`RollingCondenser` implements a common pattern for threshold-based condensation: -// After (axios 1.x) -import axios from 'axios'; -axios.defaults.baseURL = 'https://api.example.com'; -const controller = new AbortController(); -const response = await axios.get('/users', { - signal: controller.signal -}); +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + View["Current View
120+ events"] + Check["Count Events"] + + Compare{"Count >
max_size?"} + + Keep["Keep All Events"] + + Split["Split Events"] + Head["Head
First 4 events"] + Middle["Middle
~56 events"] + Tail["Tail
~56 events"] + Summarize["LLM Summarizes Middle"] + Result["Head + Summary + Tail
~60 events total"] + + View --> Check + Check --> Compare + + Compare -->|Under| Keep + Compare -->|Over| Split + + Split --> Head + Split --> Middle + Split --> Tail + + Middle --> Summarize + Head --> Result + Summarize --> Result + Tail --> Result + + style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### Code Adaptation +**Rolling Window Strategy:** +1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts +2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context +3. **Summarize Middle:** Compress events between head and tail into summary +4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) -Adapt code to new API patterns: +## Component Relationships -``` -Migrate our codebase from moment.js to date-fns: +### How Condenser Integrates -1. List all moment.js usages in our code -2. Map moment methods to date-fns equivalents -3. Update imports throughout the codebase -4. Handle any edge cases where APIs differ -5. Remove moment.js from dependencies -6. 
Verify all date handling still works correctly +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Condenser["Condenser"] + State["Conversation State"] + Events["Event Log"] + + Agent -->|"View.from_events()"| State + State -->|View| Agent + Agent -->|"condense(view)"| Condenser + Condenser -->|"View | Condensation"| Agent + Agent -->|Adds Condensation| Events + + style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -**Migration map:** +**Relationship Characteristics:** +- **Agent → State**: Calls `View.from_events()` to get current view +- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered +- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) +- **Agent → Events**: Adds `Condensation` event to log when returned -| moment.js | date-fns | Notes | -|-----------|----------|-------| -| `moment()` | `new Date()` | Different return type | -| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | -| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | -| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | +## See Also -## Testing and Validation Examples +- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning +- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management +- **[Events](/sdk/arch/events)** - Condensation event type and append-only log +- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers -### Automated Test Execution +### Conversation +Source: https://docs.openhands.dev/sdk/arch/conversation.md -Run comprehensive tests after upgrades: +The **Conversation** component orchestrates agent execution through structured 
message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. -``` -After the dependency upgrades, validate the application: +**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) -1. Run the full test suite (unit, integration, e2e) -2. Check test coverage hasn't decreased -3. Run type checking (if applicable) -4. Run linting with new lint rule versions -5. Build the application for production -6. Report any failures with analysis -``` +## Core Responsibilities -### Integration Testing +The Conversation system has four primary responsibilities: -Verify integrations still work: +1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents +2. **State Orchestration** - Maintain conversation history, events, and execution status +3. **Workspace Coordination** - Bridge agent operations with execution environments +4. **Runtime Services** - Provide persistence, monitoring, security, and visualization -``` -Test our integrations after upgrading the AWS SDK: +## Architecture -1. Test S3 operations (upload, download, list) -2. Test DynamoDB operations (CRUD) -3. Test Lambda invocations -4. Test SQS send/receive -5. Compare behavior to before the upgrade -6. Note any subtle differences +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart LR + User["User Code"] + + subgraph Factory[" "] + Entry["Conversation()"] + end + + subgraph Implementations[" "] + Local["LocalConversation
Direct execution"] + Remote["RemoteConversation
Via agent-server API"] + end + + subgraph Core[" "] + State["ConversationState
• agent •
workspace • stats • ..."] + EventLog["ConversationState.events
Event storage"] + end + + User --> Entry + Entry -.->|LocalWorkspace| Local + Entry -.->|RemoteWorkspace| Remote + + Local --> State + Remote --> State + + State --> EventLog + + classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px + + class Entry factory + class Local,Remote impl + class State,EventLog core + class Persist,Stuck,Viz,Secrets service ``` -### Regression Detection +### Key Components -Detect regressions from upgrades: +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | +| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | +| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | +| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | +| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries | -``` -Check for regressions after upgrading the ORM: +## Factory Pattern -1. Run database operation benchmarks -2. Compare query performance before and after -3. Verify all migrations still work -4. 
Check for any N+1 queries introduced -5. Validate data integrity in test database -6. Document any behavioral changes +The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Input["Conversation(agent, workspace)"] + Check{Workspace Type?} + Local["LocalConversation
Agent runs in-process"] + Remote["RemoteConversation
Agent runs via API"] + + Input --> Check + Check -->|str or LocalWorkspace| Local + Check -->|RemoteWorkspace| Remote + + style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -## Additional Examples +**Dispatch Logic:** +- **Local:** String paths or `LocalWorkspace` → in-process execution +- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket -### Security-Driven Upgrade +This abstraction enables switching deployment modes without code changes—just swap the workspace type. -``` -We have a critical security vulnerability in jsonwebtoken. +## State Management -Current: jsonwebtoken@8.5.1 -Required: jsonwebtoken@9.0.0 +State updates follow a **two-path pattern** depending on the type of change: -Perform the upgrade: -1. Check for breaking changes in v9 -2. Find all usages of jsonwebtoken in our code -3. Update any deprecated methods -4. Update the package version -5. Verify all JWT operations work -6. Run security tests +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["State Update Request"] + Lock["Acquire FIFO Lock"] + Decision{New Event?} + + StateOnly["Update State Fields
stats, status, metadata"] + EventPath["Append to Event Log
messages, actions, observations"] + + Callback["Trigger Callbacks"] + Release["Release Lock"] + + Start --> Lock + Lock --> Decision + Decision -->|No| StateOnly + Decision -->|Yes| EventPath + StateOnly --> Callback + EventPath --> Callback + Callback --> Release + + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px ``` -### Framework Major Upgrade +**Two Update Patterns:** -``` -Upgrade our Next.js application from 12 to 14: +1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) +2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur -Key areas to address: -1. App Router migration (pages -> app) -2. New metadata API -3. Server Components by default -4. New Image component -5. Route handlers replacing API routes +**Thread Safety:** +- FIFO Lock ensures ordered, atomic updates +- Callbacks fire after successful commit +- Read operations never block writes -For each area: -- Show current implementation -- Show new implementation -- Test the changes -``` +## Execution Models -### Multi-Package Coordinated Upgrade +The conversation system supports two execution models with identical APIs: +### Local vs Remote Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Local["LocalConversation"] + L1["User sends message"] + L2["Agent executes in-process"] + L3["Direct tool calls"] + L4["Events via callbacks"] + L1 --> L2 --> L3 --> L4 + end + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -Upgrade our React ecosystem packages together: -Current: -- react: 17.0.2 -- react-dom: 17.0.2 -- react-router-dom: 5.3.0 -- @testing-library/react: 12.1.2 +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart 
TB + subgraph Remote["RemoteConversation"] + R1["User sends message"] + R2["HTTP → Agent Server"] + R3["Isolated container execution"] + R4["WebSocket event stream"] + R1 --> R2 --> R3 --> R4 + end + style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -Target: -- react: 18.2.0 -- react-dom: 18.2.0 -- react-router-dom: 6.x -- @testing-library/react: 14.x +| Aspect | LocalConversation | RemoteConversation | +|--------|-------------------|-------------------| +| **Execution** | In-process | Remote container/server | +| **Communication** | Direct function calls | HTTP + WebSocket | +| **State Sync** | Immediate | Network serialized | +| **Use Case** | Development, CLI tools | Production, web apps | +| **Isolation** | Process-level | Container-level | -Create an upgrade plan that handles all these together, -addressing breaking changes in the correct order. -``` +**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. 
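The "swap the workspace type" insight above can be sketched as a plain-Python analogue of the factory dispatch. Class names mirror those used on this page, but the stubs below are illustrative, not the SDK's real constructors:

```python
from dataclasses import dataclass

# Illustrative stand-ins for the SDK's workspace and conversation classes.
@dataclass
class LocalWorkspace:
    working_dir: str

@dataclass
class RemoteWorkspace:
    host: str

class LocalConversation:
    def __init__(self, agent, workspace):
        self.agent, self.workspace = agent, workspace

class RemoteConversation:
    def __init__(self, agent, workspace):
        self.agent, self.workspace = agent, workspace

def Conversation(agent, workspace):
    """Factory: pick the implementation from the workspace type."""
    if isinstance(workspace, (str, LocalWorkspace)):
        return LocalConversation(agent, workspace)   # in-process execution
    if isinstance(workspace, RemoteWorkspace):
        return RemoteConversation(agent, workspace)  # agent-server via HTTP/WebSocket
    raise TypeError(f"unsupported workspace: {type(workspace).__name__}")

assert isinstance(Conversation("agent", "/tmp/project"), LocalConversation)
assert isinstance(Conversation("agent", RemoteWorkspace(host="example")), RemoteConversation)
```

Switching deployment modes is then a one-argument change at the call site, which is what keeps local development and remote production code paths identical.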
-## Related Resources +## Auxiliary Services -- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities -- [Security Guide](/sdk/guides/security) - Security best practices for AI agents -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +The conversation system provides pluggable services that operate independently on the event stream: -### Incident Triage -Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md +| Service | Purpose | Architecture Pattern | +|---------|---------|---------------------| +| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | +| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | +| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | +| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | -When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). +**Design Principle:** Services read from the event log but never mutate state directly. 
This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point - -This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). - +## Component Relationships -## Overview +### How Conversation Interacts -Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Conv["Conversation"] + Agent["Agent"] + WS["Workspace"] + Tools["Tools"] + LLM["LLM"] + + Conv -->|Delegates to| Agent + Conv -->|Configures| WS + Agent -.->|Updates| Conv + Agent -->|Uses| Tools + Agent -->|Queries| LLM + + style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -What if AI agents could handle the initial investigation automatically? This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. 
+**Relationship Characteristics:** +- **Conversation → Agent**: One-way orchestration, agent reports back via state updates +- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation +- **Agent → Conversation**: Indirect via state events -OpenHands accelerates incident response by: +## See Also -- **Automated error analysis**: AI agents investigate errors and provide detailed reports -- **Root cause identification**: Connect symptoms to underlying issues in your codebase -- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues -- **Integration with monitoring tools**: Work directly with platforms like Datadog +- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design +- **[Event System](/sdk/arch/events)** - Event types and flow +- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples -## Automated Datadog Error Analysis +### Design Principles +Source: https://docs.openhands.dev/sdk/arch/design.md -The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. +The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. 
-### How It Works +[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1. -[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production. +## Optional Isolation over Mandatory Sandboxing -[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports. + +**V0 Challenge:** +Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other. +Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible. + -### Triggering Automated Debugging +**V1 Principle:** +**Sandboxing should be opt-in, not universal.** +V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model. +When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity. 
-The GitHub Actions workflow can be triggered in two ways: +## Stateless by Default, One Source of Truth for State -1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors. + +**V0 Challenge:** +V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful. + -2. **Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from DataDog's error tracking UI using the "Actions" button. +**V1 Principle:** +**Keep everything stateless, with exactly one mutable state.** +All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction. +The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems. -### Automated Investigation Process +## Clear Boundaries between Agent and Applications -When the workflow runs, it automatically performs the following steps: + +**V0 Challenge:** +The same codebase powered the CLI, web interface, and integrations (e.g., Github, Gitlab, etc). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle. +Heavy research dependencies and benchmark integrations further bloated production builds. + -1. Get detailed info from the DataDog API -2. Create or find an existing GitHub issue to track the error -3. Clone all relevant repositories to get full code context -4. Run an OpenHands agent to analyze the error and investigate the code -5. 
Post the findings as a comment on the GitHub issue +**V1 Principle:** +**Maintain strict separation of concerns.** +V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server). +Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently. -The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes. - -The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`. - +## Composable Components for Extensibility -## Setting Up the Workflow + +**V0 Challenge:** +Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for different entrypoints. This rigidity limited experimentation and discouraged contributions. + -To set up automated Datadog debugging in your own repository: +**V1 Principle:** +**Everything should be composable and safe to extend.** +Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing. 
+Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. -1. Copy the workflow file to `.github/workflows/` in your repository -2. Configure the required secrets (Datadog API keys, LLM API key) -3. Customize the default queries and repository lists for your needs -4. Run the workflow manually or set up scheduled runs +### Events +Source: https://docs.openhands.dev/sdk/arch/events.md -The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog. +The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. -Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template. +**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) -## Manual Incident Investigation +## Core Responsibilities -You can also use OpenHands directly to investigate incidents without the automated workflow. +The Event System has four primary responsibilities: -### Log Analysis +1. **Type Safety** - Enforce event schemas through Pydantic models +2. **LLM Integration** - Convert events to/from LLM message formats +3. **Append-Only Log** - Maintain immutable event history +4. 
**Service Integration** - Enable observers to react to event streams -OpenHands can analyze logs to identify patterns and anomalies: +## Architecture +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% +flowchart TB + Base["Event
Base class"] + LLMBase["LLMConvertibleEvent
Abstract base"] + + subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] + Message["MessageEvent
User/assistant text"] + Action["ActionEvent
Tool calls"] + System["SystemPromptEvent
Initial system prompt"] + CondSummary["CondensationSummaryEvent
Condenser summary"] + + ObsBase["ObservationBaseEvent
Base for tool responses"] + Observation["ObservationEvent
Tool results"] + UserReject["UserRejectObservation
User rejected action"] + AgentError["AgentErrorEvent
Agent error"] + end + + subgraph Internals["Internal Events
NOT visible to the LLM"] + ConvState["ConversationStateUpdateEvent
State updates"] + CondReq["CondensationRequest
Request compression"] + Cond["Condensation
Compression result"] + Pause["PauseEvent
User pause"] + end + + Base --> LLMBase + Base --> Internals + LLMBase --> LLMTypes + ObsBase --> Observation + ObsBase --> UserReject + ObsBase --> AgentError + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base,LLMBase,Message,Action,SystemPromptEvent primary + class ObsBase,Observation,UserReject,AgentError secondary + class ConvState,CondReq,Cond,Pause tertiary ``` -Analyze these application logs for the incident that occurred at 14:32 UTC: -1. Identify the first error or warning that appeared -2. Trace the sequence of events leading to the failure -3. Find any correlated errors across services -4. Identify the user or request that triggered the issue -5. Summarize the timeline of events -``` +### Key Components -**Log analysis capabilities:** - -| Log Type | Analysis Capabilities | -|----------|----------------------| -| Application logs | Error patterns, exception traces, timing anomalies | -| Access logs | Traffic patterns, slow requests, error responses | -| System logs | Resource exhaustion, process crashes, system errors | -| Database logs | Slow queries, deadlocks, connection issues | +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | +| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | +| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | +| 
**[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | +| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses | +| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | +| **[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | +| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | +| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | +| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | +| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | +| 
**[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | +| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | -### Stack Trace Analysis +## Event Types -Deep dive into stack traces: +### LLM-Convertible Events -``` -Analyze this stack trace from our production error: +Events that participate in agent reasoning and can be converted to LLM messages: -[paste full stack trace] -1. Identify the exception type and message -2. Trace back to our code (not framework code) -3. Identify the likely cause -4. Check if this code path has changed recently -5. Suggest a fix -``` +| Event Type | Source | Content | LLM Role | +|------------|--------|---------|----------| +| **MessageEvent (user)** | user | Text, images | `user` | +| **MessageEvent (agent)** | agent | Text reasoning, skills | `assistant` | +| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | +| **ObservationEvent** | environment | Tool execution result | `tool` | +| **UserRejectObservation** | environment | Rejection reason | `tool` | +| **AgentErrorEvent** | agent | Error details | `tool` | +| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | +| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | -**Multi-language support:** +The event system bridges agent events to LLM messages: - - - ``` - Analyze this Java exception: - - java.lang.OutOfMemoryError: Java heap space - at java.util.Arrays.copyOf(Arrays.java:3210) - at java.util.ArrayList.grow(ArrayList.java:265) - at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) - - Identify: - 1. What operation is consuming memory? - 2. 
Is there a memory leak or just too much data? - 3. What's the fix? - ``` - - - ``` - Analyze this Python traceback: - - Traceback (most recent call last): - File "app/api/orders.py", line 45, in create_order - order = OrderService.create(data) - File "app/services/order.py", line 89, in create - inventory.reserve(item_id, quantity) - AttributeError: 'NoneType' object has no attribute 'reserve' - - What's None and why? - ``` - - - ``` - Analyze this Node.js error: +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event List"] + Filter["Filter LLMConvertibleEvent"] + Group["Group ActionEvents
by llm_response_id"] + Convert["Convert to Messages"] + LLM["LLM Input"] - TypeError: Cannot read property 'map' of undefined - at processItems (/app/src/handlers/items.js:23:15) - at async handleRequest (/app/src/api/router.js:45:12) + Events --> Filter + Filter --> Group + Group --> Convert + Convert --> LLM - What's undefined and how should we handle it? - ``` -
-
- -### Root Cause Analysis - -Identify the underlying cause of an incident: - + style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px + style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -Perform root cause analysis for this incident: -Symptoms: -- API response times increased 5x at 14:00 -- Error rate jumped from 0.1% to 15% -- Database CPU spiked to 100% +**Special Handling - Parallel Function Calling:** -Available data: -- Application metrics (Grafana dashboard attached) -- Recent deployments: v2.3.1 deployed at 13:45 -- Database slow query log (attached) +When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling): +1. Group all ActionEvents by `llm_response_id` +2. Combine into single Message with multiple `tool_calls` +3. Only first event's `thought`, `reasoning_content`, and `thinking_blocks` are included +4. All subsequent events in the batch have empty thought fields -Identify the root cause using the 5 Whys technique. 
+**Example:** +``` +ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1) +ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2) +→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2]) ``` -## Common Incident Patterns - -OpenHands can recognize and help diagnose these common patterns: -- **Connection pool exhaustion**: Increasing connection errors followed by complete failure -- **Memory leaks**: Gradual memory increase leading to OOM -- **Cascading failures**: One service failure triggering others -- **Thundering herd**: Simultaneous requests overwhelming a service -- **Split brain**: Inconsistent state across distributed components +### Internal Events -## Quick Fix Generation +Events for metadata, control flow, and user actions (not sent to LLM): -Once the root cause is identified, generate fixes: +| Event Type | Source | Purpose | Key Fields | +|------------|--------|---------|------------| +| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | +| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | +| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | +| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | -``` -We've identified the root cause: a missing null check in OrderProcessor.java line 156. +**Source Types:** +- **user**: Event originated from user input +- **agent**: Event generated by agent logic +- **environment**: Event from system/framework/tools -Generate a fix that: -1. Adds proper null checking -2. Logs when null is encountered -3. Returns an appropriate error response -4. Includes a unit test for the edge case -5. 
Is minimally invasive for a hotfix -``` +## Component Relationships -## Best Practices +### How Events Integrate -### Investigation Checklist +## `source` vs LLM `role` -Use this checklist when investigating: +Events often carry **two different concepts** that are easy to confuse: -1. **Scope the impact** - - How many users affected? - - What functionality is broken? - - What's the business impact? +- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution. +- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting. -2. **Establish timeline** - - When did it start? - - What changed around that time? - - Is it getting worse or stable? +These fields are **intentionally independent**. -3. **Gather data** - - Application logs - - Infrastructure metrics - - Recent deployments - - Configuration changes +Common examples include: -4. **Form hypotheses** - - List possible causes - - Rank by likelihood - - Test systematically +- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`. +- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction. -5. **Implement fix** - - Choose safest fix - - Test before deploying - - Monitor after deployment +**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role. 
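The `source`-versus-`role` distinction can be made concrete with a small sketch (a toy dataclass, not the SDK's real `MessageEvent`): the same event carries an environment `source` while presenting a `user` role to the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageEvent:
    """Toy event: `source` records origin; `role` is only how the event
    is presented to the LLM. The two fields are independent."""

    source: str  # "user" | "agent" | "environment"
    role: str    # "system" | "user" | "assistant" | "tool"
    text: str

    def to_llm_message(self) -> dict:
        return {"role": self.role, "content": self.text}


# A synthetic framework message: environment-originated, but shown
# to the model as if it were user input.
hook_feedback = MessageEvent(
    source="environment",
    role="user",
    text="Lint failed; please fix before finishing.",
)

# Attribution must use `source`, never the LLM role:
is_real_user_input = hook_feedback.source == "user"  # False
llm_sees = hook_feedback.to_llm_message()["role"]    # "user"
```

Filtering on `role == "user"` here would misclassify the hook feedback as human input; filtering on `source` classifies it correctly.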
-### Common Pitfalls - -Avoid these common incident response mistakes: +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event System"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + Services["Auxiliary Services"] + + Agent -->|Reads| Events + Agent -->|Writes| Events + Conversation -->|Manages| Events + Tools -->|Creates| Events + Events -.->|Stream| Services + + style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -- **Jumping to conclusions**: Gather data before assuming the cause -- **Changing multiple things**: Make one change at a time to isolate effects -- **Not documenting**: Record all actions for the post-mortem -- **Ignoring rollback**: Always have a rollback plan before deploying fixes - +**Relationship Characteristics:** +- **Agent → Events**: Reads history for context, writes actions/messages +- **Conversation → Events**: Owns and persists event log +- **Tools → Events**: Create ObservationEvents after execution +- **Services → Events**: Read-only observers for monitoring, visualization - -For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. 
- +## Error Events: Agent vs Conversation -## Related Resources +Two distinct error events exist in the SDK, with different purpose and visibility: -- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents -- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +- AgentErrorEvent + - Type: ObservationBaseEvent (LLM-convertible) + - Scope: Error for a specific tool call (has tool_name and tool_call_id) + - Source: "agent" + - LLM visibility: Sent as a tool message so the model can react/recover + - Effect: Conversation continues; not a terminal state + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py -### Spark Migrations -Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md +- ConversationErrorEvent + - Type: Event (not LLM-convertible) + - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) + - Source: typically "environment" + - LLM visibility: Not sent to the model + - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py -Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. OpenHands can help you analyze, migrate, and validate Spark applications. +## See Also -## Overview - -Spark version upgrades are deceptively difficult. 
The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar. - -Version upgrades are also made difficult due to the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions. - -Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. The migration needs to be driven by an experienced data engineering team, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in. +- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events +- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management +- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation +- **[Condenser](/sdk/arch/condenser)** - Event history compression -Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly or without regression. This is where OpenHands comes in. OpenHands assists in migrating Spark applications along every step of the process: +### LLM +Source: https://docs.openhands.dev/sdk/arch/llm.md -1. 
**Understanding**: Analyze the existing codebase to identify what needs to change and why -2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences -3. **Validation**: Verify that migrated pipelines produce identical results to the originals +The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. -In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions. +**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) -## Understanding +## Core Responsibilities -Before changin any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually. +The LLM system has five primary responsibilities: -Apache releases detailed lists of changes between each major and minor version of Spark. OpenHands can utilize this list of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress. +1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers +2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) +3. 
**Configuration Management** - Load from environment, JSON, or programmatic configuration +4. **Telemetry & Cost** - Track usage, latency, and costs across providers +5. **Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries -If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory: +## Architecture +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% +flowchart TB + subgraph Configuration["Configuration Sources"] + Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] + JSON["JSON Files
config/llm.json"] + Code["Programmatic
LLM(...)"] + end + + subgraph Core["Core LLM"] + Model["LLM Model
Pydantic configuration"] + Pipeline["Request Pipeline
Retry, timeout, telemetry"] + end + + subgraph Backend["LiteLLM Backend"] + Providers["100+ Providers
OpenAI, Anthropic, etc."] + end + + subgraph Output["Telemetry"] + Usage["Token Usage"] + Cost["Cost Tracking"] + Latency["Latency Metrics"] + end + + Env --> Model + JSON --> Model + Code --> Model + + Model --> Pipeline + Pipeline --> Providers + + Pipeline --> Usage + Pipeline --> Cost + Pipeline --> Latency + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Model primary + class Pipeline secondary + class LiteLLM tertiary ``` -Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0. - -Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html. - -Then, for each source file, identify - -1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`) -2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics) -3. Configuration properties that have changed defaults or been renamed -4. Dependencies that need version updates -Save the results in `migration_inventory.json` in the following format: +### Key Components -{ - ..., - "src/main/scala/etl/TransformJob.scala": { - "deprecated_apis": [ - {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} - ], - "behavioral_changes": [ - {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} - ], - "config_changes": [], - "risk": "medium" - }, - ... 
-} -``` +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings | +| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming | +| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking | +| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers | +| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` | +| **Telemetry** | Usage tracking | Token counts, costs, latency | -Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. +## Configuration -## Migration +See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for complete list of supported fields. -With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. +### Programmatic Configuration -To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. 
This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. +Create LLM instances directly in code: -The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Code["Python Code"] + LLM["LLM(model=...)"] + Agent["Agent"] + + Code --> LLM + LLM --> Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -An example prompt is below: +**Example:** +```python +from pydantic import SecretStr +from openhands.sdk import LLM +llm = LLM( + model="anthropic/claude-sonnet-4.1", + api_key=SecretStr("sk-ant-123"), + temperature=0.1, + timeout=120, +) ``` -Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. -Use `migration_inventory.json` to guide the changes. +### Environment Variable Configuration -For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. +Load from environment using naming convention: -Requirements: -1. Replace all deprecated APIs with their Spark 3.0 equivalents -2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) -3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions -4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical -5. 
Replace UDFs with built-in Spark SQL functions where a direct equivalent exists -6. Update import statements for any relocated classes -7. Preserve all existing business logic and output schemas +**Environment Variable Pattern:** +- **Prefix:** All variables start with `LLM_` +- **Mapping:** `LLM_FIELD` → `field` (lowercased) +- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr + +**Common Variables:** +```bash +export LLM_MODEL="anthropic/claude-sonnet-4.1" +export LLM_API_KEY="sk-ant-123" +export LLM_USAGE_ID="primary" +export LLM_TIMEOUT="120" +export LLM_NUM_RETRIES="5" ``` -Note the inclusion of the _known problems_ in requirement 2. We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. +### JSON Configuration -## Validation +Serialize and load from JSON files: -Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. +**Example:** +```python +# Save +llm.model_dump_json(exclude_none=True, indent=2) -The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. +# Load +llm = LLM.load_from_json("config/llm.json") +``` -An example prompt for data-level comparison: +**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). +If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. 
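As a sketch of that pattern — redacted JSON for shareable settings, with the secret re-injected from the environment — a small loader might look like the following. The `load_llm_config` helper and its field names are illustrative, not SDK API; the SDK's own entry points are `LLM.load_from_json()` and `load_from_env()`:

```python
import json
import os


def load_llm_config(path: str, environ=os.environ) -> dict:
    """Load a redacted JSON dump and restore the secret from the environment.

    Illustrative helper: it only shows how the two configuration sources
    described above can be combined safely.
    """
    with open(path) as f:
        config = json.load(f)
    # The serialized JSON has api_key redacted; re-inject it from LLM_API_KEY.
    api_key = environ.get("LLM_API_KEY")
    if api_key:
        config["api_key"] = api_key
    return config
```

The resulting dict could then be passed to the `LLM` constructor; keeping the key out of the JSON file means the file can be committed or shared without leaking credentials.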
-``` -Validate the migrated Spark application in `/src` against the original. -1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` -2. Compare outputs: - - Row counts must match exactly - - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns - - Flag any NULL handling differences -3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments -4. Generate a performance comparison: job duration, shuffle bytes, and peak executor memory +## Request Pipeline -Save the results in `validation_report.json` in the following format: +### Completion Flow -{ - "jobs": [ - { - "name": "daily_etl", - "data_match": true, - "row_count": {"v2": 1000000, "v3": 1000000}, - "column_diffs": [], - "performance": { - "duration_seconds": {"v2": 340, "v3": 285}, - "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} - } - }, - ... - ] -} +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% +flowchart TB + Request["completion() or responses() call"] + Validate["Validate Config"] + + Attempt["LiteLLM Request"] + Success{"Success?"} + + Retry{"Retries
remaining?"} + Wait["Exponential Backoff"] + + Telemetry["Record Telemetry"] + Response["Return Response"] + Error["Raise Error"] + + Request --> Validate + Validate --> Attempt + Attempt --> Success + + Success -->|Yes| Telemetry + Success -->|No| Retry + + Retry -->|Yes| Wait + Retry -->|No| Error + + Wait --> Attempt + Telemetry --> Response + + style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. +**Pipeline Stages:** -Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. +1. **Validation:** Check required fields (model, messages) +2. **Request:** Call LiteLLM with provider-specific formatting +3. **Retry Logic:** Exponential backoff on failures (configurable) +4. **Telemetry:** Record tokens, cost, latency +5. 
**Response:** Return completion or raise error -## Beyond Version Upgrades +### Responses API Support -While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: +In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. -- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. -- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. +#### Architecture -In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. - -## Related Resources - -- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents -- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation -- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Check{"Model supports
Responses API?"} + + subgraph Standard["Standard Path"] + ChatFormat["Format as
Chat Messages"] + ChatCall["litellm.completion()"] + end + + subgraph ResponsesPath["Responses Path"] + RespFormat["Format as
instructions + input[]"] + RespCall["litellm.responses()"] + end + + ChatResponse["ModelResponse"] + RespResponse["ResponsesAPIResponse"] + + Parse["Parse to Message"] + Return["LLMResponse"] + + Check -->|No| ChatFormat + Check -->|Yes| RespFormat + + ChatFormat --> ChatCall + RespFormat --> RespCall + + ChatCall --> ChatResponse + RespCall --> RespResponse + + ChatResponse --> Parse + RespResponse --> Parse + + Parse --> Return + + style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -### Vulnerability Remediation -Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md +#### Supported Models -Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. +Models that automatically use the Responses API path: -## The Challenge +| Pattern | Examples | Documentation | +|---------|----------|---------------| +| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | -The traditional approach to vulnerability remediation is manual and time-consuming: +**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). -1. Scan repositories for vulnerabilities -2. Review each vulnerability and its impact -3. Research the fix (usually a version upgrade) -4. Update dependency files -5. Test the changes -6. Create pull requests -7. 
Get reviews and merge -This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. Security debt accumulates faster than teams can address it. +## Provider Integration -**What if we could automate this entire process using AI agents?** +### LiteLLM Abstraction -## Automated Vulnerability Remediation with OpenHands +Software Agent SDK uses LiteLLM for provider abstraction: -The [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + SDK["Software Agent SDK"] + LiteLLM["LiteLLM"] + + subgraph Providers["100+ Providers"] + OpenAI["OpenAI"] + Anthropic["Anthropic"] + Google["Google"] + Azure["Azure"] + Others["..."] + end + + SDK --> LiteLLM + LiteLLM --> OpenAI + LiteLLM --> Anthropic + LiteLLM --> Google + LiteLLM --> Azure + LiteLLM --> Others + + style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -OpenHands assists with vulnerability remediation by: +**Benefits:** +- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. 
+- **Unified API:** Same interface regardless of provider +- **Format Translation:** Provider-specific request/response formatting +- **Error Handling:** Normalized error codes and messages -- **Identifying vulnerabilities**: Analyzing code for common security issues -- **Understanding impact**: Explaining the risk and exploitation potential -- **Implementing fixes**: Generating secure code to address vulnerabilities -- **Validating remediation**: Verifying fixes are effective and complete +### LLM Providers -## Two Approaches to Vulnerability Fixing +Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. +The pages linked below live under the OpenHands app section but apply +verbatim to SDK applications because both layers wrap the same +`openhands.sdk.llm.LLM` interface. -### 1. Point to a GitHub Repository +| Provider / scenario | Documentation | +| --- | --- | +| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | +| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | +| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | +| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | +| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | +| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | +| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | +| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | +| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | +| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | -Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents 
automatically create pull requests with fixes—all with minimal human intervention. +When you follow any of those guides while building with the SDK, create an +`LLM` object using the documented parameters (for example, API keys, base URLs, +or custom headers) and pass it into your agent or registry. The OpenHands UI +surfacing is simply a convenience layer on top of the same configuration model. -### 2. Upload Security Scanner Reports -Enable users to upload reports from security scanners such as Snyk (as well as other third-party security scanners) where OpenHands agents automatically detect the report format, identify the issues, and apply fixes. +## Telemetry and Cost Tracking -This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. +### Telemetry Collection -## Architecture Overview +LLM requests automatically collect metrics: -A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Request["LLM Request"] + + subgraph Metrics + Tokens["Token Counts
Input/Output"] + Cost["Cost
USD"] + Latency["Latency
ms"] + end + + Events["Event Log"] + + Request --> Tokens + Request --> Cost + Request --> Latency + + Tokens --> Events + Cost --> Events + Latency --> Events + + style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -The key architectural components include: +**Tracked Metrics:** +- **Token Usage:** Input tokens, output tokens, total +- **Cost:** Per-request cost using configured rates +- **Latency:** Request duration in milliseconds +- **Errors:** Failure types and retry counts -- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client) -- **WebSocket interface**: Enables real-time status updates on agent actions and operations -- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider -- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud +### Cost Configuration -This architecture allows the frontend to remain lightweight while heavy lifting happens in the agent's execution environment. +Configure per-token costs for custom models: -## Example: Vulnerability Fixer Application +```python +llm = LLM( + model="custom/my-model", + input_cost_per_token=0.00001, # $0.01 per 1K tokens + output_cost_per_token=0.00003, # $0.03 per 1K tokens +) +``` -An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow: +**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) -1. User points to a repository or uploads a security scan report -2. Agent analyzes the vulnerabilities -3. Agent creates fixes and pull requests automatically -4. 
User reviews and merges the changes +**Custom Costs:** Override for: +- Internal models +- Custom pricing agreements +- Cost estimation for budgeting -## Security Scanning Integration +## Component Relationships -Use OpenHands to analyze security scanner output: +### How LLM Integrates +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + LLM["LLM"] + Agent["Agent"] + Conversation["Conversation"] + Events["Events"] + Security["Security Analyzer"] + Condenser["Context Condenser"] + + Agent -->|Uses| LLM + LLM -->|Records| Events + Security -.->|Optional| LLM + Condenser -.->|Optional| LLM + Conversation -->|Provides context| Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -We ran a security scan and found these issues. Analyze each one: -1. SQL Injection in src/api/users.py:45 -2. XSS in src/templates/profile.html:23 -3. Hardcoded credential in src/config/database.py:12 -4. 
Path traversal in src/handlers/files.py:67 +**Relationship Characteristics:** +- **Agent → LLM**: Agent uses LLM for reasoning and tool calls +- **LLM → Events**: LLM requests/responses recorded as events +- **Security → LLM**: Optional security analyzer can use separate LLM +- **Condenser → LLM**: Optional context condenser can use separate LLM +- **Configuration**: LLM configured independently, passed to agent +- **Telemetry**: LLM metrics flow through event system to UI/logging -For each vulnerability: -- Explain what the vulnerability is -- Show how it could be exploited -- Rate the severity (Critical/High/Medium/Low) -- Suggest a fix -``` +## See Also -## Common Vulnerability Patterns +- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions +- **[Events](/sdk/arch/events)** - LLM request/response event types +- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis +- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration -OpenHands can detect these common vulnerability patterns: +### MCP Integration +Source: https://docs.openhands.dev/sdk/arch/mcp.md -| Vulnerability | Pattern | Example | -|--------------|---------|---------| -| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | -| XSS | Unescaped user input in HTML | `
${user_comment}
` | -| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | -| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | -| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | +The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. -## Automated Remediation +**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) -### Applying Security Patches +## Core Responsibilities -Fix identified vulnerabilities: +The MCP Integration system has four primary responsibilities: - - - ``` - Fix the SQL injection vulnerability in src/api/users.py: - - Current code: - query = f"SELECT * FROM users WHERE id = {user_id}" - cursor.execute(query) +1. **MCP Client Management** - Connect to and communicate with MCP servers +2. **Tool Discovery** - Enumerate available tools from MCP servers +3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions +4. **Execution Bridge** - Execute MCP tool calls from agent actions + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Client["MCP Client"] + Sync["MCPClient
Sync/Async bridge"] + Async["AsyncMCPClient
FastMCP base"] + end - Requirements: - 1. Use parameterized queries - 2. Add input validation - 3. Maintain the same functionality - 4. Add a test case for the fix - ``` + subgraph Bridge["Tool Bridge"] + Def["MCPToolDefinition
Schema conversion"] + Exec["MCPToolExecutor
Execution handler"] + end - **Fixed code:** - ```python - # Using parameterized query - query = "SELECT * FROM users WHERE id = %s" - cursor.execute(query, (user_id,)) - ``` -
- - ``` - Fix the XSS vulnerability in src/templates/profile.html: + subgraph Integration["Agent Integration"] + Action["MCPToolAction
Dynamic model"] + Obs["MCPToolObservation
Result wrapper"] + end - Current code: -
${user.bio}
+ subgraph External["External"] + Server["MCP Server
stdio/HTTP"] + Tools["External Tools"] + end - Requirements: - 1. Properly escape user content - 2. Consider Content Security Policy - 3. Handle rich text if needed - 4. Test with malicious input - ``` + Sync --> Async + Async --> Server - **Fixed code:** - ```html - -
{{ user.bio | escape }}
- ``` -
- - ``` - Fix the command injection in src/utils/network.py: + Server --> Def + Def --> Exec - Current code: - def ping_host(hostname): - os.system(f"ping -c 1 {hostname}") + Exec --> Action + Action --> Server + Server --> Obs - Requirements: - 1. Use safe subprocess calls - 2. Validate input format - 3. Avoid shell=True - 4. Handle errors properly - ``` + Server -.->|Spawns| Tools - **Fixed code:** - ```python - import subprocess - import re + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - def ping_host(hostname): - # Validate hostname format - if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): - raise ValueError("Invalid hostname") - - # Use subprocess without shell - result = subprocess.run( - ["ping", "-c", "1", hostname], - capture_output=True, - text=True - ) - return result.returncode == 0 - ``` - -
- -### Code-Level Vulnerability Fixes - -Fix application-level security issues: - + class Sync,Async primary + class Def,Exec secondary + class Action,Obs tertiary ``` -Fix the broken access control in our API: -Issue: Users can access other users' data by changing the ID in the URL. +### Key Components -Current code: -@app.get("/api/users/{user_id}/documents") -def get_documents(user_id: int): - return db.get_documents(user_id) +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | -Requirements: -1. Add authorization check -2. Verify requesting user matches or is admin -3. Return 403 for unauthorized access -4. Log access attempts -5. 
Add tests for authorization -``` +## MCP Client -**Fixed code:** +### Sync/Async Bridge -```python -@app.get("/api/users/{user_id}/documents") -def get_documents(user_id: int, current_user: User = Depends(get_current_user)): - # Check authorization - if current_user.id != user_id and not current_user.is_admin: - logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") - raise HTTPException(status_code=403, detail="Not authorized") +The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Sync["Sync Code
Agent execution"] + Bridge["call_async_from_sync()"] + Executor["AsyncExecutor
Background loop"] + Async["Async MCP Call"] + Server["MCP Server"] + Result["Result"] - return db.get_documents(user_id) + Sync --> Bridge + Bridge --> Executor + Executor --> Async + Async --> Server + Server --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -## Security Testing +**Bridge Pattern:** +- **Problem:** MCP protocol is async, but agent tools run synchronously +- **Solution:** Background event loop that executes async code from sync contexts +- **Benefit:** Agents use MCP tools without async/await in tool definitions -Test your fixes thoroughly: +**Client Features:** +- **Lifecycle Management:** `__enter__`/`__exit__` for context manager +- **Timeout Support:** Configurable timeouts for MCP operations +- **Error Handling:** Wraps MCP errors in observations +- **Connection Pooling:** Reuses connections across tool calls -``` -Create security tests for the SQL injection fix: +### MCP Server Configuration -1. Test with normal input -2. Test with SQL injection payloads: - - ' OR '1'='1 - - '; DROP TABLE users; -- - - UNION SELECT * FROM passwords -3. Test with special characters -4. Test with null/empty input -5. 
Verify error handling doesn't leak information +MCP servers are configured using the FastMCP format: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } +} ``` -## Automated Remediation Pipeline +**Configuration Fields:** +- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) +- **args:** Arguments to pass to command +- **env:** Environment variables (optional) -Create an end-to-end automated pipeline: +## Tool Discovery and Conversion -``` -Create an automated vulnerability remediation pipeline: +### Discovery Flow -1. Parse Snyk/Dependabot/CodeQL alerts -2. Categorize by severity and type -3. For each vulnerability: - - Create a branch - - Apply the fix - - Run tests - - Create a PR with: - - Description of vulnerability - - Fix applied - - Test results -4. Request review from security team -5. Auto-merge low-risk fixes after tests pass +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Config"] + Spawn["Spawn Server"] + List["List Tools"] + + subgraph Convert["Convert Each Tool"] + Schema["MCP Schema"] + Action["Generate Action Model"] + Def["Create ToolDefinition"] + end + + Register["Register in ToolRegistry"] + + Config --> Spawn + Spawn --> List + List --> Schema + + Schema --> Action + Action --> Def + Def --> Register + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -## Building Your Own Vulnerability Fixer - -The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. 
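The schema-adaptation step of MCP tool discovery — turning a tool's JSON `inputSchema` into typed action fields — can be sketched in plain Python. This is a simplified stand-in for the SDK's dynamic Pydantic model generation; the type map and helper name here are illustrative, not the SDK's implementation:

```python
# Simplified stand-in for MCP schema conversion: map a JSON-Schema
# `inputSchema` to {field_name: (python_type, is_required)} field specs.
JSON_SCHEMA_TYPES = {
    "string": str,
    "number": float,
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def fields_from_input_schema(input_schema: dict) -> dict:
    """Return {field_name: (python_type, is_required)} for an MCP tool."""
    required = set(input_schema.get("required", []))
    return {
        name: (JSON_SCHEMA_TYPES.get(prop.get("type"), str), name in required)
        for name, prop in input_schema.get("properties", {}).items()
    }


fields = fields_from_input_schema({
    "type": "object",
    "properties": {"url": {"type": "string"}, "timeout": {"type": "number"}},
    "required": ["url"],
})
# fields == {"url": (str, True), "timeout": (float, False)}
```

A real implementation would feed specs like these into something such as `pydantic.create_model()` to produce the dynamic action model: required fields become mandatory model fields, and the rest default to `None`.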
+**Discovery Steps:** -To build your own vulnerability remediation agent: +1. **Spawn Server:** Launch MCP server via stdio +2. **List Tools:** Call `tools/list` MCP endpoint +3. **Parse Schemas:** Extract tool names, descriptions, parameters +4. **Generate Models:** Dynamically create Pydantic models for actions +5. **Create Definitions:** Wrap in `ToolDefinition` objects +6. **Register:** Add to agent's tool registry -1. Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent -2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) -3. Configure the agent to create pull requests automatically -4. Set up human review workflows for critical fixes +### Schema Conversion -As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. +MCP tool schemas are converted to SDK tool definitions: -## Related Resources +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP Tool Schema
JSON Schema"] + Parse["Parse Parameters"] + Model["Dynamic Pydantic Model
MCPToolAction"] + Def["ToolDefinition
SDK format"]
+    
+    MCP --> Parse
+    Parse --> Model
+    Model --> Def
+    
+    style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
+    style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
+```

-- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example
-- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents
-- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies
-- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts
+**Conversion Rules:**

-### Windows Without WSL
-Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl.md
+| MCP Schema | SDK Action Model |
+|------------|------------------|
+| **name** | Class name (PascalCase) |
+| **description** | Docstring |
+| **inputSchema** | Pydantic fields |
+| **required** | Required field (no default) |
+| **type** | Python type hints |

-<Warning>
-  This way of running OpenHands is not officially supported. It is maintained by the community and may not work.
-</Warning>
+**Example:**

-# Running OpenHands GUI on Windows Without WSL
+```python
+# MCP Schema
+{
+  "name": "fetch_url",
+  "description": "Fetch content from URL",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "url": {"type": "string"},
+      "timeout": {"type": "number"}
+    },
+    "required": ["url"]
+  }
+}

-This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker.
+# Generated Action Model
+class FetchUrl(MCPToolAction):
+    """Fetch content from URL"""
+    url: str
+    timeout: float | None = None
+```

-## Prerequisites
+## Tool Execution

-1. **Windows 10/11** - A modern Windows operating system
-2. **PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors)
-3. 
**.NET Core Runtime** - Required for the PowerShell integration via pythonnet -4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) -5. **Git** - For cloning the repository and version control -6. **Node.js and npm** - For running the frontend +### Execution Flow -## Step 1: Install Required Software +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Agent["Agent generates action"] + Action["MCPToolAction"] + Executor["MCPToolExecutor"] + + Convert["Convert to MCP format"] + Call["MCP call_tool"] + Server["MCP Server"] + + Result["MCP Result"] + Obs["MCPToolObservation"] + Return["Return to Agent"] + + Agent --> Action + Action --> Executor + Executor --> Convert + Convert --> Call + Call --> Server + Server --> Result + Result --> Obs + Obs --> Return + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -1. **Install Python 3.12 or 3.13** - - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) - - During installation, check "Add Python to PATH" - - Verify installation by opening PowerShell and running: - ```powershell - python --version - ``` +**Execution Steps:** -2. **Install PowerShell 7** - - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) - - Choose the MSI installer appropriate for your system (x64 for most modern computers) - - Run the installer with default options - - Verify installation by opening a new terminal and running: - ```powershell - pwsh --version - ``` - - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors +1. **Action Creation:** LLM generates tool call, parsed into `MCPToolAction` +2. 
**Executor Lookup:** Find `MCPToolExecutor` for tool name +3. **Format Conversion:** Convert action fields to MCP arguments +4. **MCP Call:** Execute `call_tool` via MCP client +5. **Result Parsing:** Parse MCP result (text, images, resources) +6. **Observation Creation:** Wrap in `MCPToolObservation` +7. **Error Handling:** Catch exceptions, return error observations -3. **Install .NET Core Runtime** - - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) - - Choose the latest .NET Core Runtime (not SDK) - - Verify installation by opening PowerShell and running: - ```powershell - dotnet --info - ``` - - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation. +### MCPToolExecutor -4. **Install Git** - - Download Git from [git-scm.com](https://git-scm.com/download/win) - - Use default installation options - - Verify installation: - ```powershell - git --version - ``` +Executors bridge SDK actions to MCP calls: -5. **Install Node.js and npm** - - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) - - During installation, accept the default options which will install npm as well - - Verify installation: - ```powershell - node --version - npm --version - ``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Executor["MCPToolExecutor"] + Client["MCP Client"] + Name["tool_name"] + + Executor -->|Uses| Client + Executor -->|Knows| Name + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -6. 
**Install Poetry** - - Open PowerShell as Administrator and run: - ```powershell - (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - - ``` - - Add Poetry to your PATH: - ```powershell - $env:Path += ";$env:APPDATA\Python\Scripts" - ``` - - Verify installation: - ```powershell - poetry --version - ``` +**Executor Responsibilities:** +- **Client Management:** Hold reference to MCP client +- **Tool Identification:** Know which MCP tool to call +- **Argument Conversion:** Transform action fields to MCP format +- **Result Handling:** Parse MCP responses +- **Error Recovery:** Handle connection errors, timeouts, server failures -## Step 2: Clone and Set Up OpenHands +## MCP Tool Lifecycle -1. **Clone the Repository** - ```powershell - git clone https://github.com/OpenHands/OpenHands.git - cd OpenHands - ``` +### From Configuration to Execution -2. **Install Dependencies** - ```powershell - poetry install - ``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Load["Load MCP Config"] + Start["Start Conversation"] + Spawn["Spawn MCP Servers"] + Discover["Discover Tools"] + Register["Register Tools"] + + Ready["Agent Ready"] + + Step["Agent Step"] + LLM["LLM Tool Call"] + Execute["Execute MCP Tool"] + Result["Return Observation"] + + End["End Conversation"] + Cleanup["Close MCP Clients"] + + Load --> Start + Start --> Spawn + Spawn --> Discover + Discover --> Register + Register --> Ready + + Ready --> Step + Step --> LLM + LLM --> Execute + Execute --> Result + Result --> Step + + Step --> End + End --> Cleanup + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` - This will install all required dependencies, including: - - pythonnet - Required for Windows PowerShell integration - - All other OpenHands dependencies +**Lifecycle 
Phases:** -## Step 3: Run OpenHands +| Phase | Operations | Components | +|-------|-----------|------------| +| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | +| **Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | +| **Execution** | Handle tool calls | Agent, MCPToolAction | +| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | -1. **Build the Frontend** - ```powershell - cd frontend - npm install - npm run build - cd .. - ``` +## MCP Annotations - This will build the frontend files that the backend will serve. +MCP tools can include metadata hints for agents: -2. **Start the Backend** - ```powershell - # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell - pwsh - $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" - ``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Tool["MCP Tool"] + + subgraph Annotations + ReadOnly["readOnlyHint"] + Destructive["destructiveHint"] + Progress["progressEnabled"] + end + + Security["Security Analysis"] + + Tool --> ReadOnly + Tool --> Destructive + Tool --> Progress + + ReadOnly --> Security + Destructive --> Security + + style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` - This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. +**Annotation Types:** - > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. 
+| Annotation | Meaning | Use Case | +|------------|---------|----------| +| **readOnlyHint** | Tool doesn't modify state | Lower security risk | +| **destructiveHint** | Tool modifies/deletes data | Require confirmation | +| **progressEnabled** | Tool reports progress | Show progress UI | - > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. +These annotations feed into the security analyzer for risk assessment. -3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** - ```powershell - cd frontend - npm run dev - ``` +## Component Relationships -4. **Access the OpenHands GUI** +### How MCP Integrates - Open your browser and navigate to: - ``` - http://localhost:3000 - ``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP System"] + Skills["Skills"] + Tools["Tool Registry"] + Agent["Agent"] + Security["Security"] + + Skills -->|Configures| MCP + MCP -->|Registers| Tools + Agent -->|Uses| Tools + MCP -->|Provides hints| Security + + style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` - > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` +**Relationship Characteristics:** +- **Skills → MCP**: Repository skills can embed MCP configurations +- **MCP → Tools**: MCP tools registered alongside native tools +- **Agent → Tools**: Agents use MCP tools like any other tool +- **MCP → Security**: Annotations inform security risk assessment +- **Transparent Integration**: Agent doesn't distinguish MCP from native tools -## Installing and Running the CLI +## Design Rationale -To install and run the OpenHands CLI on Windows without WSL, 
follow these steps: +**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. -### 1. Install uv (Python Package Manager) +**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. -Open PowerShell as Administrator and run: +**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. -```powershell -powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" -``` +**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. -### 2. Install .NET SDK (Required) +**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. -The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: +**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping. -```powershell -winget install Microsoft.DotNet.SDK.8 -``` +## See Also -Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). 
+- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment +- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library -After installation, restart your PowerShell session to ensure the environment variables are updated. +### Overview +Source: https://docs.openhands.dev/sdk/arch/overview.md -### 3. Install and Run OpenHands +The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment. -After installing the prerequisites, install OpenHands with: +Check [this document](/sdk/arch/design) for the core design principles that guided its architecture. -```powershell -uv tool install openhands --python 3.12 -``` +## Relationship with OpenHands Applications -Then run OpenHands: +The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns. +- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies. +- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs. +- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. 
-```powershell -openhands -``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% +graph TB + subgraph Interfaces["OpenHands Interfaces"] + UI[OpenHands GUI
React frontend] + CLI[OpenHands CLI
Command-line interface] + Custom[Your Custom Client
Automations & workflows] + end -To upgrade OpenHands in the future: + SDK[Software Agent SDK
openhands.sdk + tools + workspace] + + subgraph External["External Services"] + LLM[LLM Providers
OpenAI, Anthropic, etc.] + Runtime[Runtime Services
Docker, Remote API, etc.] + end -```powershell -uv tool upgrade openhands --python 3.12 + UI --> SDK + CLI --> SDK + Custom --> SDK + + SDK --> LLM + SDK --> Runtime + + classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class UI,CLI,Custom interface + class SDK sdk + class LLM,Runtime external ``` -### Troubleshooting CLI Issues -#### CoreCLR Error +## Four-Package Architecture -If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this: +The agent-sdk is organized into four distinct Python packages: -1. Install the .NET SDK as described in step 2 above -2. Verify that your system PATH includes the .NET SDK directories -3. Restart your PowerShell session completely after installing the .NET SDK -4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell +| Package | What It Does | When You Need It | +|---------|-------------|------------------| +| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | +| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | +| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | +| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | -To verify your .NET installation, run: +### Two Deployment Modes -```powershell -dotnet --info -``` +The SDK supports two deployment architectures depending on your needs: -This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. 
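Whichever mode you choose, a quick stdlib probe can show which of the four packages are importable in the current environment (module paths as listed in the table above). The `available` helper is a convenience sketch, not an SDK utility.

```python
# Check which of the four documented module paths are importable.
# `available` is a hypothetical convenience helper, not an SDK API.
from importlib.util import find_spec

def available(module: str) -> bool:
    try:
        return find_spec(module) is not None
    except ModuleNotFoundError:  # parent package not installed at all
        return False

for module in ("openhands.sdk", "openhands.tools",
               "openhands.workspace", "openhands.agent_server"):
    print(f"{module}: {'ok' if available(module) else 'missing'}")
```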
+#### Mode 1: Local Development -If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). +**Installation:** Just install `openhands-sdk` + `openhands-tools` -## Limitations on Windows +```bash +pip install openhands-sdk openhands-tools +``` -When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: +**Architecture:** -1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools + + SDK -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 +``` -2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. +- `LocalWorkspace` included in SDK (no extra install) +- Everything runs in one process +- Perfect for prototyping and simple use cases +- Quick setup, no Docker required -3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. +#### Mode 2: Production / Sandboxed -4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. +**Installation:** Install all 4 packages -## Troubleshooting +```bash +pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server +``` -### "System.Management.Automation" Not Found Error +**Architecture:** -If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% +flowchart LR + + WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk + + subgraph WS[" "] + direction LR + Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws + Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws + end + + Server["openhands.agent_server
FastAPI + WebSocket"]:::server + Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · …"]:::tools + + WSBase -.->|extended by| Docker + WSBase -.->|extended by| Remote + Docker -->|spawns container with| Server + Remote -->|connects via HTTP to| Server + Server -->|runs| Agent + Agent -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 + classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 + classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + + style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none +``` -> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. +- `RemoteWorkspace` auto-spawns agent-server in containers +- Sandboxed execution for security +- Multi-user deployments +- Distributed systems (e.g., Kubernetes) support -To resolve this issue: + +**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). + -1. **Install the latest version of PowerShell 7** from the official Microsoft repository: - - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) - - Download and install the latest MSI package for your system architecture (x64 for most systems) - - During installation, ensure you select the following options: - - "Add PowerShell to PATH environment variable" - - "Register Windows PowerShell 7 as the default shell" - - "Enable PowerShell remoting" - - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default - -2. 
**Restart your terminal or command prompt** to ensure the new PowerShell is available +### SDK Package (`openhands.sdk`) -3. **Verify the installation** by running: - ```powershell - pwsh --version - ``` +**Purpose:** Core components and base classes for OpenHands agent. - You should see output indicating PowerShell 7.x.x +**Key Components:** +- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop +- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle +- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry +- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration +- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) +- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) +- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation +- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management +- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution -4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: - ```powershell - pwsh - cd path\to\openhands - $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" - ``` +**Design:** Stateless, immutable components with type-safe Pydantic models. - > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell". +**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. -5. 
**If the issue persists**, ensure that you have the .NET Runtime installed: - - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) - - Choose ".NET Runtime" (not SDK) version 6.0 or later - - After installation, verify it's properly installed by running: - ```powershell - dotnet --info - ``` - - Restart your computer after installation - - Try running OpenHands again +**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) -6. **Ensure that the .NET Framework is properly installed** on your system: - - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off - - Make sure ".NET Framework 4.8 Advanced Services" is enabled - - Click OK and restart if prompted +### Tools Package (`openhands.tools`) -This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration. -## OpenHands Cloud + +**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. + -### Bitbucket Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md +**Purpose:** Pre-built tools following consistent patterns. -## Prerequisites +**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud). + +For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. 
+ -## Adding Bitbucket Repository Access -Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories. +### Workspace Package (`openhands.workspace`) -## Working With Bitbucket Repos in Openhands Cloud +**Purpose:** Workspace implementations extending SDK base classes. -After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and -branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! +**Key Components:** Docker Workspace, Remote API Workspace, and more. -![Connect Repo](/openhands/static/img/connect-repo.png) +**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. -## IP Whitelisting +**Use Cases:** Sandboxed execution, multi-user deployments, production environments. -If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow -OpenHands to access your repositories: + +For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). + -### Core App IP -``` -34.68.58.200 -``` +### Agent Server Package (`openhands.agent_server`) -### Runtime IPs -``` -34.10.175.217 -34.136.162.246 -34.45.0.142 -34.28.69.126 -35.224.240.213 -34.70.174.52 -34.42.4.87 -35.222.133.153 -34.29.175.97 -34.60.55.59 -``` +**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. -## Next Steps +**Features:** +- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode +- Service management with isolated per-user sessions +- API key authentication and health checking -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. 
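Because the agent server exposes plain HTTP, a client can assemble requests with the standard library alone. The `/health` path and `X-Session-API-Key` header below are illustrative assumptions; check the server's generated OpenAPI docs for the actual routes and authentication scheme.

```python
# Build (but do not send) an authenticated health-check request using stdlib
# urllib. The path and header name are assumptions for illustration only.
import urllib.request

def health_request(base_url: str, api_key: str) -> urllib.request.Request:
    req = urllib.request.Request(f"{base_url.rstrip('/')}/health")
    req.add_header("X-Session-API-Key", api_key)
    return req

req = health_request("http://localhost:8000", "my-secret-key")
print(req.full_url)  # http://localhost:8000/health
# urllib.request.urlopen(req) would perform the actual call.
```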
+**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). -### Cloud API -Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md +**Use Cases:** Multi-user web apps, SaaS products, distributed systems. -For the available API endpoints, refer to the -[OpenHands API Reference](https://docs.openhands.dev/api-reference). + +For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). + -## Obtaining an API Key +## How Components Work Together -To use the OpenHands Cloud API, you'll need to generate an API key: +### Basic Execution Flow (Local) -1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. -2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. -3. Click `Create API Key`. -4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. -5. Copy the generated API key and store it securely. It will only be shown once. +When you send a message to an agent, here's what happens: -## API Usage Example (V1) +```mermaid +sequenceDiagram + participant You + participant Conversation + participant Agent + participant LLM + participant Tool + + You->>Conversation: "Create hello.txt" + Conversation->>Agent: Process message + Agent->>LLM: What should I do? + LLM-->>Agent: Use BashTool("touch hello.txt") + Agent->>Tool: Execute action + Note over Tool: Runs in same environment
as Agent (local/container/remote) + Tool-->>Agent: Observation + Agent->>LLM: Got result, continue? + LLM-->>Agent: Done + Agent-->>Conversation: Update state + Conversation-->>You: "File created!" +``` -### Starting a New Conversation +**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. -To start a new conversation with OpenHands to perform a task, -make a POST request to the V1 app-conversations endpoint. +### Deployment Flexibility - - - ```bash - curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_message": { - "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] - }, - "selected_repository": "yourusername/your-repo" - }' - ``` - - - ```python - import requests +The same agent code runs in different environments by swapping workspace configuration: - api_key = "YOUR_API_KEY" - url = "https://app.all-hands.dev/api/v1/app-conversations" +```mermaid +graph TB + subgraph "Your Code (Unchanged)" + Code["Agent + Tools + LLM"] + end + + subgraph "Deployment Options" + Local["Local
Direct execution"] + Docker["Docker
Containerized"] + Remote["Remote
Multi-user server"] + end + + Code -->|LocalWorkspace| Local + Code -->|DockerWorkspace| Docker + Code -->|RemoteAPIWorkspace| Remote + + style Code fill:#e1f5fe + style Local fill:#e8f5e8 + style Docker fill:#e8f5e8 + style Remote fill:#e8f5e8 +``` - headers = { - "Authorization": f"Bearer {api_key}", - "Content-Type": "application/json" - } +## Next Steps - data = { - "initial_message": { - "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] - }, - "selected_repository": "yourusername/your-repo" - } +### Get Started +- [Getting Started](/sdk/getting-started) – Build your first agent +- [Hello World](/sdk/guides/hello-world) – Minimal example - response = requests.post(url, headers=headers, json=data) - result = response.json() +### Explore Components - # The response contains a start task with the conversation ID - conversation_id = result.get("app_conversation_id") or result.get("id") - print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") - print(f"Status: {result['status']}") - ``` -
- - ```typescript - const apiKey = "YOUR_API_KEY"; - const url = "https://app.all-hands.dev/api/v1/app-conversations"; +**SDK Package:** +- [Agent](/sdk/arch/agent) – Core reasoning-action loop +- [Conversation](/sdk/arch/conversation) – State management and lifecycle +- [LLM](/sdk/arch/llm) – Language model integration +- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern +- [Events](/sdk/arch/events) – Typed event framework +- [Workspace](/sdk/arch/workspace) – Base workspace architecture - const headers = { - "Authorization": `Bearer ${apiKey}`, - "Content-Type": "application/json" - }; +**Tools Package:** +- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details - const data = { - initial_message: { - content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." }] - }, - selected_repository: "yourusername/your-repo" - }; +**Workspace Package:** +- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details - async function startConversation() { - try { - const response = await fetch(url, { - method: "POST", - headers: headers, - body: JSON.stringify(data) - }); +**Agent Server:** +- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details - const result = await response.json(); +### Deploy +- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service +- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server - // The response contains a start task with the conversation ID - const conversationId 
= result.app_conversation_id || result.id; - console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); - console.log(`Status: ${result.status}`); +### Source Code +- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework +- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools +- [`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples - return result; - } catch (error) { - console.error("Error starting conversation:", error); - } - } +### SDK Package +Source: https://docs.openhands.dev/sdk/arch/sdk.md - startConversation(); - ``` - -
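The start task returned above is not immediately a usable conversation — the `status` field has to reach `READY` first. As a minimal sketch of how a client might interpret that response (the helper names here are invented; only the field names and status values come from the documented response shape):

```python
from typing import Optional

# Terminal states of a start task, per the documented status list.
TERMINAL_STATUSES = {"READY", "ERROR"}


def is_finished(result: dict) -> bool:
    """True once the start task has reached a terminal status."""
    return result.get("status") in TERMINAL_STATUSES


def conversation_link(result: dict) -> Optional[str]:
    """Return the conversation URL once READY, else None.

    Falls back to the start-task ``id`` when ``app_conversation_id``
    is absent, mirroring the examples above.
    """
    if result.get("status") != "READY":
        return None
    conversation_id = result.get("app_conversation_id") or result.get("id")
    return f"https://app.all-hands.dev/conversations/{conversation_id}"
```

A client would call these on each update from the streaming endpoint (or on each re-fetch of the start task) until `is_finished` returns true.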
+The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. -#### Response +**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) -The API will return a JSON object with details about the conversation start task: +## Purpose -```json -{ - "id": "550e8400-e29b-41d4-a716-446655440000", - "status": "WORKING", - "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", - "sandbox_id": "sandbox-abc123", - "created_at": "2025-01-15T10:30:00Z" -} +The SDK package handles: +- **Agent reasoning loop**: How agents process messages and make decisions +- **State management**: Conversation lifecycle and persistence +- **LLM integration**: Provider-agnostic language model access +- **Tool system**: Typed actions and observations +- **Workspace abstraction**: Where code executes +- **Extensibility**: Skills, condensers, MCP, security + +## Core Components + +```mermaid +graph TB + Conv[Conversation
Lifecycle Manager] --> Agent[Agent<br>Reasoning Loop]
+
+    Agent --> LLM[LLM<br>Language Model]
+    Agent --> Tools[Tool System<br>Capabilities]
+    Agent --> Micro[Skills<br>Behavior Modules]
+    Agent --> Cond[Condenser<br>Memory Manager]
+
+    Tools --> Workspace[Workspace<br>Execution]
+
+    Conv --> Events[Events<br>Communication]
+    Tools --> MCP[MCP<br>External Tools]
+    Workspace --> Security[Security
Validation] + + style Conv fill:#e1f5fe + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fff3e0 + style Workspace fill:#fce4ec ``` -The `status` field indicates the current state of the conversation startup process: -- `WORKING` - Initial processing -- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready -- `PREPARING_REPOSITORY` - Cloning and setting up the repository -- `READY` - Conversation is ready to use -- `ERROR` - An error occurred during startup +### 1. Conversation - State & Lifecycle -You may receive an authentication error if: +**What it does**: Manages the entire conversation lifecycle and state. -- You provided an invalid API key. -- You provided the wrong repository name. -- You don't have access to the repository. +**Key responsibilities**: +- Maintains conversation state (immutable) +- Handles message flow between user and agent +- Manages turn-taking and async execution +- Persists and restores conversation state +- Emits events for monitoring -### Streaming Conversation Start (Optional) +**Design decisions**: +- **Immutable state**: Each operation returns a new Conversation instance +- **Serializable**: Can be saved to disk or database and restored +- **Async-first**: Built for streaming and concurrent execution -For real-time updates during conversation startup, you can use the streaming endpoint: +**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. 
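The immutable, serializable state described above is what makes persistence and pause/resume straightforward. A toy sketch of the idea using stdlib dataclasses — this is an illustration of the pattern, not the SDK's actual `Conversation` API:

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConversationState:
    """Toy immutable conversation state (illustrative, not the SDK class)."""

    messages: tuple = ()

    def add_message(self, role: str, text: str) -> "ConversationState":
        # Returns a NEW snapshot; the original is untouched.
        return replace(self, messages=self.messages + ((role, text),))

    def to_json(self) -> str:
        # Serializable: can be saved to disk or a database...
        return json.dumps({"messages": [list(m) for m in self.messages]})

    @classmethod
    def from_json(cls, raw: str) -> "ConversationState":
        # ...and restored later to resume the conversation.
        data = json.loads(raw)
        return cls(messages=tuple(tuple(m) for m in data["messages"]))
```

Because every operation returns a fresh snapshot, saving after each turn, undo/redo, and time-travel debugging all reduce to keeping old snapshots around.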
-```bash -curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_message": { - "content": [{"type": "text", "text": "Your task description here"}] - }, - "selected_repository": "yourusername/your-repo" - }' -``` +**Example use cases**: +- Saving conversation to database after each turn +- Implementing undo/redo functionality +- Building multi-session chatbots +- Time-travel debugging -#### Streaming Response +**Learn more**: +- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) +- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) +- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) -The endpoint streams a JSON array incrementally. Each element represents a status update: +--- -```json -[ - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, - {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} -] -``` +### 2. Agent - The Reasoning Loop -Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. +**What it does**: The core reasoning engine that processes messages and decides what to do. 
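The reasoning loop this section describes can be caricatured in a few lines, with the LLM and tool execution stubbed out as plain callables. Everything here is illustrative — the real Agent delegates these steps to the LLM and Tool System components:

```python
def run_loop(llm, execute_tool, user_message, max_steps=10):
    """Toy reasoning-action loop (illustrative names, not SDK API)."""
    history = [("user", user_message)]          # steps 1-2: receive + add
    for _ in range(max_steps):
        decision = llm(history)                 # step 3: consult LLM
        if decision["type"] == "tool_call":     # step 4: validate/execute tool
            observation = execute_tool(decision["name"], decision["args"])
            history.append(("observation", observation))  # step 5: loop
        else:                                   # step 6: final response
            return decision["text"]
    raise RuntimeError("max steps exceeded")
```

The stateless design shows up here too: the loop operates on a history it is handed, rather than on state held inside an agent object.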
-## Rate Limits +**Key responsibilities**: +- Receives messages and current state +- Consults LLM to reason about next action +- Validates and executes tool calls +- Processes observations and loops until completion +- Integrates with skills for specialized behavior -If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. -If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). +**Design decisions**: +- **Stateless**: Agent doesn't hold state, operates on Conversation +- **Extensible**: Behavior can be modified via skills +- **Provider-agnostic**: Works with any LLM through unified interface ---- +**The reasoning loop**: +1. Receive message from Conversation +2. Add message to context +3. Consult LLM with full conversation history +4. If LLM returns tool call → validate and execute tool +5. If tool returns observation → add to context, go to step 3 +6. If LLM returns response → done, return to user -## Migrating from V0 to V1 API +**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. - - The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. - Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. 
- +**Example use cases**: +- Planning agents that break tasks into steps +- Code review agents with specific checks +- Agents with domain-specific reasoning patterns -### Key Differences +**Learn more**: +- Guide: [Custom Agents](/sdk/guides/agent-custom) +- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) +- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) -| Feature | V0 API | V1 API | -|---------|--------|--------| -| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | -| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | -| Repository field | `repository` | `selected_repository` | -| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | +--- -### Migration Steps +### 3. LLM - Language Model Integration -1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` +**What it does**: Provides a provider-agnostic interface to language models. -2. **Update the request body**: - - Change `repository` to `selected_repository` - - Change `initial_user_msg` (string) to `initial_message` (object with content array): - ```json - // V0 format - { "initial_user_msg": "Your message here" } +**Key responsibilities**: +- Abstracts different LLM providers (OpenAI, Anthropic, etc.) +- Handles message formatting and conversion +- Manages streaming responses +- Supports tool calling and reasoning modes +- Handles retries and error recovery - // V1 format - { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } - ``` +**Design decisions**: +- **Provider-agnostic**: Same API works with any provider +- **Streaming-first**: Built for real-time responses +- **Type-safe**: Pydantic models for all messages +- **Extensible**: Easy to add new providers -3. **Update response handling**: The V1 API returns a start task object. 
The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. +**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: +- Cost optimization (switch to cheaper models) +- Testing with different models +- Avoiding vendor lock-in +- Supporting customer choice ---- +**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. -## Legacy API (V0) - Deprecated +**Example use cases**: +- Routing requests to different models based on complexity +- Implementing custom caching strategies +- Adding observability hooks - - The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. - New integrations should use the V1 API documented above. - +**Learn more**: +- Guide: [LLM Registry](/sdk/guides/llm-registry) +- Guide: [LLM Routing](/sdk/guides/llm-routing) +- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) +- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) -### Starting a New Conversation (V0) +--- - - - ```bash - curl -X POST "https://app.all-hands.dev/api/conversations" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - "repository": "yourusername/your-repo" - }' - ``` - - - ```python - import requests +### 4. Tool System - Typed Capabilities - api_key = "YOUR_API_KEY" - url = "https://app.all-hands.dev/api/conversations" +**What it does**: Defines what agents can do through a typed action/observation pattern. 
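The action/observation pattern can be sketched with plain dataclasses (the SDK itself uses Pydantic models for the same three roles); the `Greet*` names below are invented purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class GreetAction:
    """Input schema: what the tool accepts."""

    name: str


@dataclass
class GreetObservation:
    """Output schema: what the tool returns."""

    message: str
    success: bool


class GreetExecutor:
    """Logic that transforms Action -> Observation."""

    def __call__(self, action: GreetAction) -> GreetObservation:
        if not action.name:
            return GreetObservation(message="name is required", success=False)
        return GreetObservation(message=f"Hello, {action.name}!", success=True)
```

Because both sides of the call are typed, the executor is testable in isolation, and a schema for LLM tool calling can be generated mechanically from the action type.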
- headers = { - "Authorization": f"Bearer {api_key}", - "Content-Type": "application/json" - } +**Key responsibilities**: +- Defines tool schemas (inputs and outputs) +- Validates actions before execution +- Executes tools and returns typed observations +- Generates JSON schemas for LLM tool calling +- Registers tools with the agent - data = { - "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - "repository": "yourusername/your-repo" - } +**Design decisions**: +- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs +- **Schema generation**: Pydantic models auto-generate JSON schemas +- **Executor pattern**: Separation of tool definition and execution +- **Composable**: Tools can call other tools - response = requests.post(url, headers=headers, json=data) - conversation = response.json() +**The three components**: +1. **Action**: Input schema (what the tool accepts) +2. **Observation**: Output schema (what the tool returns) +3. **ToolExecutor**: Logic that transforms Action → Observation - print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") - print(f"Status: {conversation['status']}") - ``` - - - ```typescript - const apiKey = "YOUR_API_KEY"; - const url = "https://app.all-hands.dev/api/conversations"; +**Why this pattern?** +- Type safety catches errors early +- LLMs get accurate schemas for tool calling +- Tools are testable in isolation +- Easy to compose tools - const headers = { - "Authorization": `Bearer ${apiKey}`, - "Content-Type": "application/json" - }; +**When to customize**: When you need domain-specific capabilities not covered by built-in tools. 
- const data = { - initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", - repository: "yourusername/your-repo" - }; +**Example use cases**: +- Database query tools +- API integration tools +- Custom file format parsers +- Domain-specific calculators - async function startConversation() { - try { - const response = await fetch(url, { - method: "POST", - headers: headers, - body: JSON.stringify(data) - }); +**Learn more**: +- Guide: [Custom Tools](/sdk/guides/custom-tools) +- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) - const conversation = await response.json(); +--- - console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); - console.log(`Status: ${conversation.status}`); +### 5. Workspace - Execution Abstraction - return conversation; - } catch (error) { - console.error("Error starting conversation:", error); - } - } +**What it does**: Abstracts *where* code executes (local, Docker, remote). - startConversation(); - ``` - - +**Key responsibilities**: +- Provides unified interface for code execution +- Handles file operations across environments +- Manages working directories +- Supports different isolation levels -#### Response (V0) +**Design decisions**: +- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package +- **Environment-agnostic**: Code works the same locally or remotely +- **Lazy initialization**: Workspace setup happens on first use -```json -{ - "status": "ok", - "conversation_id": "abc1234" -} -``` +**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. -### Cloud UI -Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md +**When to use directly**: Rarely - usually configured when creating an agent. 
Use advanced workspaces for production. -## Landing Page +**Learn more**: +- Architecture: [Workspace Architecture](/sdk/arch/workspace) +- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) +- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) -The landing page is where you can: +--- -- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), - [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or - [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. -- Launch an empty conversation using `New Conversation`. -- See `Suggested Tasks` for repositories that OpenHands has access to. -- See your `Recent Conversations`. +### 6. Events - Component Communication -## Settings +**What it does**: Enables observability and debugging through event emissions. -Settings are divided across tabs, with each tab focusing on a specific area of configuration. +**Key responsibilities**: +- Defines event types (messages, actions, observations, errors) +- Emitted by Conversation, Agent, Tools +- Enables logging, debugging, and monitoring +- Supports custom event handlers -- `User` - - Change your email address. -- `Integrations` - - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. - - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). -- `Application` - - Set your preferred language, notifications and other preferences. - - Toggle task suggestions on GitHub. - - Toggle Solvability Analysis. - - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation). 
- - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings). -- `LLM` - - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings). -- `Billing` - - Add credits for using the OpenHands provider. -- `Secrets` - - [Manage secrets](/openhands/usage/settings/secrets-settings). -- `API Keys` - - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api). -- `MCP` - - [Setup an MCP server](/openhands/usage/settings/mcp-settings) +**Design decisions**: +- **Immutable**: Events are snapshots, not mutable objects +- **Serializable**: Can be logged, stored, replayed +- **Type-safe**: Pydantic models for all events -## Key Features +**Why events?** They provide a timeline of what happened during agent execution. Essential for: +- Debugging agent behavior +- Understanding decision-making +- Building observability dashboards +- Implementing custom logging -For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features) -section of the documentation. +**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. -## Next Steps +**Learn more**: +- Guide: [Metrics and Observability](/sdk/guides/metrics) +- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) -- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). -- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +--- -### GitHub Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation.md +### 7. 
Condenser - Memory Management -## Prerequisites +**What it does**: Compresses conversation history when it gets too long. -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud). +**Key responsibilities**: +- Monitors conversation length +- Summarizes older messages +- Preserves important context +- Keeps conversation within token limits -## Adding GitHub Repository Access +**Design decisions**: +- **Pluggable**: Different condensing strategies +- **Automatic**: Triggered when context gets large +- **Preserves semantics**: Important information retained -You can grant OpenHands access to specific GitHub repositories: +**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. -1. Click on `+ Add GitHub Repos` in the repository selection dropdown. -2. Select your organization and choose the specific repositories to grant OpenHands access to. - - - OpenHands requests short-lived tokens (8-hour expiration) with these permissions: - - Actions: Read and write - - Commit statuses: Read and write - - Contents: Read and write - - Issues: Read and write - - Metadata: Read-only - - Pull requests: Read and write - - Webhooks: Read and write - - Workflows: Read and write - - Repository access for a user is granted based on: - - Permission granted for the repository - - User's GitHub permissions (owner/collaborator) - +**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. -3. Click `Install & Authorize`. 
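As a minimal illustration of the memory management described in the Condenser section, a keep-last-N strategy might look like the following — a hypothetical helper, not the SDK's condenser interface:

```python
def keep_last_n(messages, n, summarize=None):
    """Condense a message list to a summary placeholder plus the last n turns.

    ``summarize`` is any callable mapping the dropped messages to a single
    stand-in entry; the default just records how many were dropped.
    """
    if summarize is None:
        summarize = lambda dropped: f"[{len(dropped)} earlier messages summarized]"
    if len(messages) <= n:
        return list(messages)
    dropped, kept = messages[:-n], messages[-n:]
    # Older context is collapsed; recent turns survive verbatim.
    return [summarize(dropped)] + list(kept)
```

A real strategy would pass an LLM-backed `summarize` so that important task context is preserved rather than merely counted.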
+**Example strategies**: +- Summarize old messages +- Keep only last N turns +- Preserve task-related messages -## Modifying Repository Access +**Learn more**: +- Guide: [Context Condenser](/sdk/guides/context-condenser) +- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) -You can modify GitHub repository access at any time by: -- Selecting `+ Add GitHub Repos` in the repository selection dropdown or -- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories` +--- -## Working With GitHub Repos in Openhands Cloud +### 8. MCP - Model Context Protocol -Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the -`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click -on `Launch` to start the conversation! +**What it does**: Integrates external tool servers via Model Context Protocol. -![Connect Repo](/openhands/static/img/connect-repo.png) +**Key responsibilities**: +- Connects to MCP-compatible tool servers +- Translates MCP tools to SDK tool format +- Manages server lifecycle +- Handles server communication -## Working on GitHub Issues and Pull Requests Using Openhands +**Design decisions**: +- **Standard protocol**: Uses MCP specification +- **Transparent integration**: MCP tools look like regular tools to agents +- **Process management**: Handles server startup/shutdown -To allow OpenHands to work directly from GitHub directly, you must -[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is -given, you can use OpenHands by labeling the issue or by tagging `@openhands`. +**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. 
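The "translate MCP tools to SDK tool format" step amounts to mapping each entry of an MCP server's `tools/list` result (`name`, `description`, `inputSchema`, per the MCP spec) into the agent's internal tool record. The `ToolSpec` shape below is made up for illustration; the SDK's real adapter lives in `openhands/sdk/mcp`:

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Hypothetical internal tool record handed to the LLM for tool calling."""

    name: str
    description: str
    parameters: dict  # JSON Schema describing the tool's inputs


def from_mcp_listing(mcp_tools):
    """Map MCP tools/list entries into ToolSpec records."""
    return [
        ToolSpec(
            name=t["name"],
            description=t.get("description", ""),
            parameters=t.get("inputSchema", {"type": "object"}),
        )
        for t in mcp_tools
    ]
```

After this translation, MCP-provided tools are indistinguishable from locally defined ones as far as the agent is concerned.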
-### Working with Issues +**When to use**: When you need tools that: +- Already have MCP servers (fetch, filesystem, etc.) +- Are too complex to rewrite as SDK tools +- Need to run in separate processes +- Are provided by third parties -On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: -1. Comment on the issue to let you know it is working on it. - - You can click on the link to track the progress on OpenHands Cloud. -2. Open a pull request if it determines that the issue has been successfully resolved. -3. Comment on the issue with a summary of the performed tasks and a link to the PR. +**Learn more**: +- Guide: [MCP Integration](/sdk/guides/mcp) +- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) +- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) -### Working with Pull Requests +--- -To get OpenHands to work on pull requests, mention `@openhands` in the comments to: -- Ask questions -- Request updates -- Get code explanations +### 9. Skills (formerly Microagents) - Behavior Modules - -The `@openhands` mention functionality in pull requests only works if the pull request is both -*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate -permissions to access both repositories. - +**What it does**: Specialized modules that modify agent behavior for specific tasks. +**Key responsibilities**: +- Provide domain-specific instructions +- Modify system prompts +- Guide agent decision-making +- Compose to create specialized agents -## Next Steps +**Design decisions**: +- **Composable**: Multiple skills can work together +- **Declarative**: Defined as configuration, not code +- **Reusable**: Share skills across agents -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). 
-- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. -### GitLab Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md +**Example skills**: +- GitHub operations (issue creation, PRs) +- Code review guidelines +- Documentation style enforcement +- Project-specific conventions -## Prerequisites +**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. -- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud). - -## Adding GitLab Repository Access - -Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories. - -## Working With GitLab Repos in Openhands Cloud - -After signing in with a Gitlab account, use the `Open Repository` section to select the appropriate repository and -branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! - -![Connect Repo](/openhands/static/img/connect-repo.png) +**Learn more**: +- Guide: [Agent Skills & Context](/sdk/guides/skill) +- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) -## Using Tokens with Reduced Scopes +--- -OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent. -To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`, -which will override the default token assigned to the agent. While the high-permission API token is still requested -and used for other components of the application (e.g. opening merge requests), the agent will not have access to it. +### 10. 
Security - Validation & Sandboxing -## Working on GitLab Issues and Merge Requests Using Openhands +**What it does**: Validates inputs and enforces security constraints. - -This feature works for personal projects and is available for group projects with a -[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks). +**Key responsibilities**: +- Input validation +- Command sanitization +- Path traversal prevention +- Resource limits -A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into -OpenHands Cloud. +**Design decisions**: +- **Defense in depth**: Multiple validation layers +- **Fail-safe**: Rejects suspicious inputs by default +- **Configurable**: Adjust security levels as needed - +**Why needed?** Agents execute arbitrary code and file operations. Security prevents: +- Malicious prompts escaping sandboxes +- Path traversal attacks +- Resource exhaustion +- Unintended system access -Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly. +**When to customize**: When you need domain-specific validation rules or want to adjust security policies. -### Working with Issues +**Learn more**: +- Guide: [Security and Secrets](/sdk/guides/security) +- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) -On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: +--- -1. Comment on the issue to let you know it is working on it. - - You can click on the link to track the progress on OpenHands Cloud. -2. Open a merge request if it determines that the issue has been successfully resolved. -3. Comment on the issue with a summary of the performed tasks and a link to the PR. 
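The path-traversal prevention mentioned in the Security section boils down to a classic check: resolve every candidate path and reject anything that lands outside the sandbox root. A stdlib-only sketch (the SDK's real validation lives in `openhands/sdk/security`):

```python
import os


def is_inside_sandbox(root: str, candidate: str) -> bool:
    """Fail-safe path check: True only if ``candidate`` resolves under ``root``."""
    root = os.path.realpath(root)
    # Resolve relative components ("..", symlinks) before comparing.
    resolved = os.path.realpath(os.path.join(root, candidate))
    return resolved == root or resolved.startswith(root + os.sep)
```

Note the defense-in-depth stance: the comparison runs on the fully resolved path, so `../` sequences and absolute paths are both rejected by default.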
+## How Components Work Together -### Working with Merge Requests +### Example: User asks agent to create a file -To get OpenHands to work on merge requests, mention `@openhands` in the comments to: +``` +1. User → Conversation: "Create a file called hello.txt with 'Hello World'" -- Ask questions -- Request updates -- Get code explanations +2. Conversation → Agent: New message event -## Managing GitLab Webhooks +3. Agent → LLM: Full conversation history + available tools -The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page. +4. LLM → Agent: Tool call for FileEditorTool.create() -### Accessing Webhook Management +5. Agent → Tool System: Validate FileEditorAction -The webhook management table is available on the Integrations page when: +6. Tool System → Tool Executor: Execute action -- You are signed in to OpenHands Cloud with a GitLab account -- Your GitLab token is connected +7. Tool Executor → Workspace: Create file (local/docker/remote) -To access it: +8. Workspace → Tool Executor: Success -1. Navigate to the `Settings > Integrations` page -2. Find the GitLab section -3. If your GitLab token is connected, you'll see the webhook management table below the connection status +9. Tool Executor → Tool System: FileEditorObservation (success=true) -### Viewing Webhook Status +10. Tool System → Agent: Observation -The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands. +11. 
Agent → LLM: Updated history with observation
+
+12. LLM → Agent: "File created successfully"
+
+13. Agent → Conversation: Done, final response
+
+14. Conversation → User: "File created successfully"
+```
+
+Throughout this flow:
+- **Events** are emitted for observability
+- **Condenser** may trigger if history gets long
+- **Skills** influence LLM's decision-making
+- **Security** validates file paths and operations
+- **MCP** could provide additional tools if configured
+
+## Design Patterns
+
+### Immutability
+
+All core objects are immutable. Operations return new instances:
+
+```python
+conversation = Conversation(...)
+new_conversation = conversation.add_message(message) +# conversation is unchanged, new_conversation has the message +``` -- The webhook management table only displays resources that are accessible with your connected GitLab token -- Webhook installation requires Admin or Owner permissions on the GitLab project or group +**Why?** Makes debugging easier, enables time-travel, ensures serializability. -## Next Steps +### Composition Over Inheritance -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. +Agents are composed from: +- LLM provider +- Tool list +- Skill list +- Condenser strategy +- Security policy -### Getting Started -Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md +You don't subclass Agent - you configure it. -## Accessing OpenHands Cloud +**Why?** More flexible, easier to test, enables runtime configuration. -OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud, -visit [app.all-hands.dev](https://app.all-hands.dev). +### Type Safety -You'll be prompted to connect with your GitHub, GitLab or Bitbucket account: +Everything uses Pydantic models: +- Messages, actions, observations are typed +- Validation happens automatically +- Schemas generate from types -1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`. -2. Review the permissions requested by OpenHands and authorize the application. - - OpenHands will require certain permissions from your account. To read more about these permissions, - you can click the `Learn more` link on the authorization page. -3. Review and accept the `terms of service` and select `Continue`. +**Why?** Catches errors early, provides IDE support, self-documenting. ## Next Steps -Once you've connected your account, you can: - -- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). 
-- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). -- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). -- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). -- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). - -### Jira Data Center Integration (Coming soon...) -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md - -# Jira Data Center Integration - -## Platform Configuration - -### Step 1: Create Service Account - -1. **Access User Management** - - Log in to Jira Data Center as administrator - - Go to **Administration** > **User Management** +### For Usage Examples -2. **Create User** - - Click **Create User** - - Username: `openhands-agent` - - Full Name: `OpenHands Agent` - - Email: `openhands@yourcompany.com` (replace with your preferred service account email) - - Password: Set a secure password - - Click **Create** +- [Getting Started](/sdk/getting-started) - Build your first agent +- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities +- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers +- [Conversation Management](/sdk/guides/convo-persistence) - State handling -3. **Assign Permissions** - - Add user to appropriate groups - - Ensure access to relevant projects - - Grant necessary project permissions +### For Related Architecture -### Step 2: Generate API Token +- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations +- [Workspace Architecture](/sdk/arch/workspace) - Execution environments +- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution -1. 
**Personal Access Tokens** - Log in as the service account - Go to **Profile** > **Personal Access Tokens** - Click **Create token** - Name: `OpenHands Cloud Integration` - Expiry: Set appropriate expiration (recommend 1 year) - Click **Create** - **Important**: Copy and store the token securely +### For Implementation Details

-### Step 3: Configure Webhook +- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code +- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code +- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples

-1. **Create Webhook** - Go to **Administration** > **System** > **WebHooks** - Click **Create a WebHook** - **Name**: `OpenHands Cloud Integration` - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` - Set a suitable webhook secret - **Issue related events**: Select the following: - Issue updated - Comment created - **JQL Filter**: Leave empty (or customize as needed) - Click **Create** - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +### Security +Source: https://docs.openhands.dev/sdk/arch/security.md

--- +The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

-## Workspace Integration +**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

-### Step 1: Log in to OpenHands Cloud +## Core Responsibilities

-1. 
**Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. +The Security system has four primary responsibilities: -### Step 2: Configure Jira Data Center Integration +1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions +2. **Confirmation Policy** - Determine when user approval is required based on risk +3. **Action Validation** - Enforce security policies before execution +4. **Audit Trail** - Record security decisions in event history -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Jira Data Center** section +## Architecture -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The personal access token from Step 2 above - - Ensure **Active** toggle is enabled - - -Workspace name is the host name of your Jira Data Center instance. - -Eg: http://jira.all-hands.dev/projects/OH/issues/OH-77 +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["SecurityAnalyzerBase
Abstract analyzer"] + end + + subgraph Implementations["Concrete Analyzers"] + LLM["LLMSecurityAnalyzer
Inline risk prediction"] + NoOp["NoOpSecurityAnalyzer
No analysis"] + end + + subgraph Risk["Risk Levels"] + Low["LOW
Safe operations"] + Medium["MEDIUM
Moderate risk"] + High["HIGH
Dangerous ops"] + Unknown["UNKNOWN
Unanalyzed"] + end + + subgraph Policy["Confirmation Policy"] + Check["should_require_confirmation()"] + Mode["Confirmation Mode"] + Decision["Require / Allow"] + end + + Base --> LLM + Base --> NoOp + + Implementations --> Low + Implementations --> Medium + Implementations --> High + Implementations --> Unknown + + Low --> Check + Medium --> Check + High --> Check + Unknown --> Check + + Check --> Mode + Mode --> Decision + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + + class Base primary + class LLM secondary + class High danger + class Check tertiary +``` -Here the workspace name is **jira.all-hands.dev**. -
+### Key Components -3. **Complete OAuth Flow** - - You'll be redirected to Jira Data Center to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided - - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | +| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | +| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | +| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | +| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | -### Managing Your Integration +## Risk Levels -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view +Security analyzers return one of four risk levels: -**Unlink Workspace:** -- In the edit view, click **Unlink** next to the workspace name -- This will deactivate your workspace 
link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + Action["ActionEvent"] + Analyze["Security Analyzer"] + + subgraph Levels["Risk Levels"] + Low["LOW
Read-only, safe"] + Medium["MEDIUM
Modify files"] + High["HIGH
Delete, execute"] + Unknown["UNKNOWN
Not analyzed"] + end + + Action --> Analyze + Analyze --> Low + Analyze --> Medium + Analyze --> High + Analyze --> Unknown + + style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px + style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px + style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px +``` -### Screenshots +### Risk Level Definitions - - -![workspace-link.png](/openhands/static/img/jira-dc-user-link.png) - +| Level | Characteristics | Examples | +|-------|----------------|----------| +| **LOW** | Read-only, no state changes | File reading, directory listing, search | +| **MEDIUM** | Modifies user data | File editing, creating files, API calls | +| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | +| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | - -![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png) - +## Security Analyzers - -![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png) - +### LLMSecurityAnalyzer - -![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png) - - +Leverages the LLM's inline risk assessment during action generation: -### Jira Cloud Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Schema["Tool Schema
+ security_risk param"] + LLM["LLM generates action
with security_risk"] + ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] + Extract["Extract security_risk
from arguments"] + ActionEvent["ActionEvent
with security_risk set"] + Analyzer["LLMSecurityAnalyzer
returns security_risk"] + + Schema --> LLM + LLM --> ToolCall + ToolCall --> Extract + Extract --> ActionEvent + ActionEvent --> Analyzer + + style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -# Jira Cloud Integration +**Analysis Process:** -## Platform Configuration +1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema +2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments +3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments +4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` +5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level +6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step -### Step 1: Create Service Account +**Example Tool Call:** +```json +{ + "name": "execute_bash", + "arguments": { + "command": "rm -rf /tmp/cache", + "security_risk": "HIGH" + } +} +``` -1. **Navigate to User Management** - - Go to [Atlassian Admin](https://admin.atlassian.com/) - - Select your organization - - Go to **Directory** > **Users** +The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. -2. **Create OpenHands Service Account** - - Click **Service accounts** - - Click **Create a service account** - - Name: `OpenHands Agent` - - Click **Next** - - Select **User** role for Jira app - - Click **Create** +**Configuration:** +- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent +- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools +- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation -### Step 2: Generate API Token +### NoOpSecurityAnalyzer -1. 
**Access Service Account Configuration** - - Locate the created service account from above step and click on it - - Click **Create API token** - - Set the expiry to 365 days (maximum allowed value) - - Click **Next** - - In **Select token scopes** screen, filter by following values - - App: Jira - - Scope type: Classic - - Scope actions: Write, Read - - Select `read:me`, `read:jira-work`, and `write:jira-work` scopes - - Click **Next** - - Review and create API token - - **Important**: Copy and securely store the token immediately +Passthrough analyzer that skips analysis: -### Step 3: Configure Webhook +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Action["ActionEvent"] + NoOp["NoOpSecurityAnalyzer"] + Unknown["SecurityRisk.UNKNOWN"] + + Action --> NoOp --> Unknown + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -1. **Navigate to Webhook Settings** - - Go to **Jira Settings** > **System** > **WebHooks** - - Click **Create a WebHook** +**Use Case:** Development, trusted environments, or when confirmation mode handles all actions -2. **Configure Webhook** - - **Name**: `OpenHands Cloud Integration` - - **Status**: Enabled - - **URL**: `https://app.all-hands.dev/integration/jira/events` - - **Issue related events**: Select the following: - - Issue updated - - Comment created - - **JQL Filter**: Leave empty (or customize as needed) - - Click **Create** - - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +## Confirmation Policy ---- +The confirmation policy determines when user approval is required. There are three policy implementations: -## Workspace Integration +**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) -### Step 1: Log in to OpenHands Cloud +### Policy Types -1. 
**Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. +| Policy | Behavior | Use Case | +|--------|----------|----------| +| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | +| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | +| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | -### Step 2: Configure Jira Integration +### ConfirmRisky (Default Policy) -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Jira Cloud** section +The most flexible policy with configurable thresholds: -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The API token from Step 2 above - - Ensure **Active** toggle is enabled - - -Workspace name is the host name when accessing a resource in Jira Cloud. 
- -Eg: https://all-hands.atlassian.net/browse/OH-55 +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Risk["SecurityRisk"] + CheckUnknown{"Risk ==
UNKNOWN?"} + UseConfirmUnknown{"confirm_unknown
setting?"} + CheckThreshold{"risk.is_riskier
(threshold)?"} + + Confirm["Require Confirmation"] + Allow["Allow Execution"] + + Risk --> CheckUnknown + CheckUnknown -->|Yes| UseConfirmUnknown + CheckUnknown -->|No| CheckThreshold + + UseConfirmUnknown -->|True| Confirm + UseConfirmUnknown -->|False| Allow + + CheckThreshold -->|Yes| Confirm + CheckThreshold -->|No| Allow + + style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px +``` -Here the workspace name is **all-hands**. -
+**Configuration:** +- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required + - Cannot be set to `UNKNOWN` + - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` +- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation -3. **Complete OAuth Flow** - - You'll be redirected to Jira Cloud to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. - - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI +### Confirmation Rules by Policy -### Managing Your Integration +#### ConfirmRisky with threshold=HIGH (Default) -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view +| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | +|------------|----------------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | ✅ Allow | ✅ Allow | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | -**Unlink Workspace:** -- In the edit view, click **Unlink** next to the workspace name -- This will deactivate your workspace link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. 
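+The `ConfirmRisky` rules above (threshold comparison plus the `confirm_unknown` flag) can be condensed into a short sketch. This is a self-contained approximation of the documented behavior, not the SDK's actual implementation (see `confirmation_policy.py` for that); in particular, the integer ordering of the enum below is an assumption made purely for illustration.

```python
from enum import IntEnum


class SecurityRisk(IntEnum):
    # Illustrative assumption: higher value = riskier. UNKNOWN is
    # special-cased by the policy, so its numeric value never matters.
    UNKNOWN = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

    def is_riskier(self, threshold: "SecurityRisk") -> bool:
        # Reflexive, as documented: HIGH.is_riskier(HIGH) is True.
        return self >= threshold


class ConfirmRisky:
    """Sketch of the documented policy semantics."""

    def __init__(self, threshold: SecurityRisk = SecurityRisk.HIGH,
                 confirm_unknown: bool = True) -> None:
        if threshold is SecurityRisk.UNKNOWN:
            # Documented rule: threshold cannot be UNKNOWN.
            raise ValueError("threshold cannot be UNKNOWN")
        self.threshold = threshold
        self.confirm_unknown = confirm_unknown

    def should_require_confirmation(self, risk: SecurityRisk) -> bool:
        if risk is SecurityRisk.UNKNOWN:
            return self.confirm_unknown
        return risk.is_riskier(self.threshold)


# Default policy: threshold=HIGH, confirm_unknown=True
policy = ConfirmRisky()
assert not policy.should_require_confirmation(SecurityRisk.MEDIUM)  # allow
assert policy.should_require_confirmation(SecurityRisk.HIGH)        # confirm
assert policy.should_require_confirmation(SecurityRisk.UNKNOWN)     # confirm
```

Passing `threshold=SecurityRisk.MEDIUM` instead makes both MEDIUM and HIGH require confirmation, matching the threshold=MEDIUM table, while `confirm_unknown=False` lets UNKNOWN actions through in every variant.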
+#### ConfirmRisky with threshold=MEDIUM -### Screenshots +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | - - -![workspace-link.png](/openhands/static/img/jira-user-link.png) - +#### ConfirmRisky with threshold=LOW - -![workspace-link.png](/openhands/static/img/jira-admin-configure.png) - +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | - -![workspace-link.png](/openhands/static/img/jira-user-unlink.png) - +**Key Rules:** +- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` +- **UNKNOWN handling** is configurable via `confirm_unknown` flag +- **Threshold cannot be UNKNOWN** - validated at policy creation time - -![workspace-link.png](/openhands/static/img/jira-admin-edit.png) - - -### Linear Integration (Coming soon...) 
-Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md +## Component Relationships -# Linear Integration +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Security["Security Analyzer"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + MCP["MCP Tools"] + + Agent -->|Validates actions| Security + Security -->|Checks| Tools + Security -->|Uses hints| MCP + Conversation -->|Pauses for confirmation| Agent + + style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -## Platform Configuration +**Relationship Characteristics:** +- **Agent → Security**: Validates actions before execution +- **Security → Tools**: Examines tool characteristics (annotations) +- **Security → MCP**: Uses MCP hints for risk assessment +- **Conversation → Agent**: Pauses for user confirmation when required +- **Optional Component**: Security analyzer can be disabled for trusted environments -### Step 1: Create Service Account +## See Also -1. **Access Team Settings** - - Log in to Linear as a team admin - - Go to **Settings** > **Members** +- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers +- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints +- **[Security Guide](/sdk/guides/security)** - Configuring security policies -2. **Invite Service Account** - - Click **Invite members** - - Email: `openhands@yourcompany.com` (replace with your preferred service account email) - - Role: **Member** (with appropriate team access) - - Send invitation +### Skill +Source: https://docs.openhands.dev/sdk/arch/skill.md -3. 
**Complete Setup** - - Accept invitation from the service account email - - Complete profile setup - - Ensure access to relevant teams/workspaces +The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. -### Step 2: Generate API Key +**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) -1. **Access API Settings** - - Log in as the service account - - Go to **Settings** > **Security & access** +## Core Responsibilities -2. **Create Personal API Key** - - Click **Create new key** - - Name: `OpenHands Cloud Integration` - - Scopes: Select the following: - - `Read` - Read access to issues and comments - - `Create comments` - Ability to create or update comments - - Select the teams you want to provide access to, or allow access for all teams you have permissions for - - Click **Create** - - **Important**: Copy and store the API key securely +The Skill system has four primary responsibilities: -### Step 3: Configure Webhook +1. **Context Injection** - Add specialized prompts to agent context based on triggers +2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) +3. **MCP Integration** - Load MCP tools associated with repository skills +4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats -1. **Access Webhook Settings** - - Go to **Settings** > **API** > **Webhooks** - - Click **New webhook** +## Architecture -2. 
**Configure Webhook** - - **Label**: `OpenHands Cloud Integration` - - **URL**: `https://app.all-hands.dev/integration/linear/events` - - **Resource types**: Select: - - `Comment` - For comment events - - `Issue` - For issue updates (label changes) - - Select the teams you want to provide access to, or allow access for all public teams - - Click **Create webhook** - - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Types["Skill Types"] + Repo["Repository Skill
trigger: None"] + Knowledge["Knowledge Skill
trigger: KeywordTrigger"] + Task["Task Skill
trigger: TaskTrigger"] + end + + subgraph Triggers["Trigger Evaluation"] + Always["Always Active
Repository guidelines"] + Keyword["Keyword Match
String matching on user messages"] + TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] + end + + subgraph Content["Skill Content"] + Markdown["Markdown with Frontmatter"] + MCPTools["MCP Tools Config
Repo skills only"] + Inputs["Input Metadata
Task skills only"] + end + + subgraph Integration["Agent Integration"] + Context["Agent Context"] + Prompt["System Prompt"] + end + + Repo --> Always + Knowledge --> Keyword + Task --> TaskMatch + + Always --> Markdown + Keyword --> Markdown + TaskMatch --> Markdown + + Repo -.->|Optional| MCPTools + Task -.->|Requires| Inputs + + Markdown --> Context + MCPTools --> Context + Context --> Prompt + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Repo,Knowledge,Task primary + class Always,Keyword,TaskMatch secondary + class Context tertiary +``` ---- +### Key Components -## Workspace Integration +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | +| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | +| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | +| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | +| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | -### Step 1: Log in to OpenHands Cloud +## Skill Types -1. 
**Navigate and Authenticate** - - Go to [OpenHands Cloud](https://app.all-hands.dev/) - - Sign in with your Git provider (GitHub, GitLab, or BitBucket) - - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. +### Repository Skills -### Step 2: Configure Linear Integration +Always-active, repository-specific guidelines. -1. **Access Integration Settings** - - Navigate to **Settings** > **Integrations** - - Locate **Linear** section +**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. -2. **Configure Workspace** - - Click **Configure** button - - Enter your workspace name and click **Connect** - - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: - - **Webhook Secret**: The webhook secret from Step 3 above - - **Service Account Email**: The service account email from Step 1 above - - **Service Account API Key**: The API key from Step 2 above - - Ensure **Active** toggle is enabled +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + File["AGENTS.md"] + Parse["Parse Frontmatter"] + Skill["Skill(trigger=None)"] + Context["Always in Context"] + + File --> Parse + Parse --> Skill + Skill --> Context + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` - -Workspace name is the identifier after the host name when accessing a resource in Linear. 
+**Characteristics:** +- **Trigger:** `None` (always active) +- **Purpose:** Project conventions, coding standards, architecture rules +- **MCP Tools:** Can include MCP tool configuration +- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) -Eg: https://linear.app/allhands/issue/OH-37 +**Example Files (permanent context):** +- `AGENTS.md` - General agent instructions +- `GEMINI.md` - Gemini-specific instructions +- `CLAUDE.md` - Claude-specific instructions -Here the workspace name is **allhands**. - +**Other supported formats:** +- `.cursorrules` - Cursor IDE guidelines +- `agents.md` / `agent.md` - General agent instructions -3. **Complete OAuth Flow** - - You'll be redirected to Linear to complete OAuth verification - - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided - - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI +### Knowledge Skills -### Managing Your Integration +Keyword-triggered skills for specialized domains: -**Edit Configuration:** -- Click the **Edit** button next to your configured platform -- Update any necessary credentials or settings -- Click **Update** to apply changes -- You will need to repeat the OAuth flow as before -- **Important:** Only the original user who created the integration can see the edit view +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Check["Check Keywords"] + Match{"Match?"} + Activate["Activate Skill"] + Skip["Skip Skill"] + Context["Add to Context"] + + User --> Check + Check --> Match + Match -->|Yes| Activate + Match -->|No| Skip + Activate --> Context + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -**Unlink Workspace:** -- In the edit view, click 
**Unlink** next to the workspace name -- This will deactivate your workspace link -- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. +**Characteristics:** +- **Trigger:** `KeywordTrigger` with regex patterns +- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") +- **Activation:** Keywords detected in user messages +- **Location:** System or user-defined knowledge base -### Screenshots +**Trigger Example:** +```yaml +--- +name: kubernetes +trigger: + type: keyword + keywords: ["kubernetes", "k8s", "kubectl"] +--- +``` - - -![workspace-link.png](/openhands/static/img/linear-user-link.png) - +### Task Skills - -![workspace-link.png](/openhands/static/img/linear-admin-configure.png) - +Keyword-triggered skills with structured inputs for guided workflows: - -![workspace-link.png](/openhands/static/img/linear-admin-edit.png) - +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Match{"Keyword
Match?"} + Inputs["Collect User Inputs"] + Template["Apply Template"] + Context["Add to Context"] + Skip["Skip Skill"] + + User --> Match + Match -->|Yes| Inputs + Match -->|No| Skip + Inputs --> Template + Template --> Context + + style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` - -![workspace-link.png](/openhands/static/img/linear-admin-edit.png) - -
+**Characteristics:** +- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) +- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) +- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) +- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) +- **Location:** System-defined or custom task templates -### Project Management Tool Integrations (Coming soon...) -Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md +**Trigger Example:** +```yaml +--- +name: bug_fix +triggers: ["/bug_fix", "fix bug", "bug report"] +inputs: + - name: bug_description + description: "Describe the bug" + required: true +--- +``` -# Project Management Tool Integrations +**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. -## Overview +## Trigger Evaluation -OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by: -- Adding `@openhands` in ticket comments -- Adding the `openhands` label to tickets +Skills are evaluated at different points in the agent lifecycle: -## Prerequisites +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent Step Start"] + + Repo["Check Repository Skills
trigger: None"] + AddRepo["Always Add to Context"] + + Message["Check User Message"] + Keyword["Match Keyword Triggers"] + AddKeyword["Add Matched Skills"] + + TaskType["Check Task Type"] + TaskMatch["Match Task Triggers"] + AddTask["Add Task Skill"] + + Build["Build Agent Context"] + + Start --> Repo + Repo --> AddRepo + + Start --> Message + Message --> Keyword + Keyword --> AddKeyword + + Start --> TaskType + TaskType --> TaskMatch + TaskMatch --> AddTask + + AddRepo --> Build + AddKeyword --> Build + AddTask --> Build + + style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -Integration requires two levels of setup: -1. **Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) -2. **Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace +**Evaluation Rules:** -### Platform-Specific Setup Guides: -- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) -- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) -- [Linear Integration (Coming soon...)](./linear-integration.md) +| Trigger Type | Evaluation Point | Activation Condition | +|--------------|------------------|----------------------| +| **None** | Every step | Always active | +| **KeywordTrigger** | On user message | Keyword/string match in message | +| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) | -## Usage +**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters. 
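Concretely, the shared matching step can be sketched like this. This is a hypothetical helper written for illustration only; the SDK's actual trigger classes may implement matching differently:

```python
import re

def matches_trigger(keywords: list[str], message: str) -> bool:
    """Hypothetical sketch of the shared keyword-matching step.

    Both KeywordTrigger and TaskTrigger activate a skill when any
    configured keyword appears in the user's message.
    """
    return any(re.search(re.escape(kw), message, re.IGNORECASE) for kw in keywords)

# KeywordTrigger-style knowledge skill:
assert matches_trigger(["kubernetes", "k8s", "kubectl"], "Why is my k8s pod pending?")
# TaskTrigger-style task skill (same matching logic; inputs are collected separately):
assert matches_trigger(["/bug_fix", "fix bug"], "Please fix bug in the parser")
# No keyword present -> skill stays inactive:
assert not matches_trigger(["kubernetes"], "Deploy the app with Docker")
```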
-Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: +## MCP Tool Integration -### Method 1: Comment Mention -Add a comment to any issue with `@openhands` followed by your task description: -``` -@openhands Please implement the user authentication feature described in this ticket +Repository skills can include MCP tool configurations: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skill["Repository Skill"] + MCPConfig["mcp_tools Config"] + Client["MCP Client"] + Tools["Tool Registry"] + + Skill -->|Contains| MCPConfig + MCPConfig -->|Spawns| Client + Client -->|Registers| Tools + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### Method 2: Label-based Delegation -Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. +**MCP Configuration Format:** -### Git Repository Detection +Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): -The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: +```yaml +--- +name: repo_skill +mcp_tools: + mcpServers: + filesystem: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] +--- +``` -#### Specifying the Target Repository +**Workflow:** +1. **Load Skill:** Parse markdown file with frontmatter +2. **Extract MCP Config:** Read `mcp_tools` field +3. **Spawn MCP Servers:** Create MCP clients for each server +4. **Register Tools:** Add MCP tools to agent's tool registry +5. 
**Inject Context:** Add skill content to agent prompt -**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. +## Skill File Format -**Supported Repository Formats:** -- Full HTTPS URL: `https://github.com/owner/repository.git` -- GitHub URL without .git: `https://github.com/owner/repository` -- Owner/repository format: `owner/repository` - -#### Platform-Specific Behavior - -**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. Manual specification is not required in this configuration. - -**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection. - -## Troubleshooting +Skills are defined in markdown files with YAML frontmatter: -### Platform Configuration Issues -- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated) -- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. If your current API token is expired, make sure to update it in the respective integration settings -- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions +```markdown +--- +name: skill_name +trigger: + type: keyword + keywords: ["pattern1", "pattern2"] +--- -### Workspace Integration Issues -- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. 
Contact your platform administrator that you want to integrate with (eg: Jira, Linear) -- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first -- **OAuth flow fails**: Make sure that you're authorizing with the correct account with proper workspace access +# Skill Content -### General Issues -- **Agent not responding**: Check webhook logs in your platform settings and verify service account status -- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access -- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on -- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed +This is the instruction text that will be added to the agent's context. +``` -### Getting Help -For additional support, contact OpenHands Cloud support with: -- Your integration platform (Linear, Jira Cloud, or Jira Data Center) -- Workspace name -- Error logs from webhook/integration attempts -- Screenshots of configuration settings (without sensitive credentials) +**Frontmatter Fields:** -### Slack Integration -Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md +| Field | Required | Description | +|-------|----------|-------------| +| **name** | Yes | Unique skill identifier | +| **trigger** | Yes* | Activation trigger (`null` for always active) | +| **mcp_tools** | No | MCP server configuration (repo skills only) | +| **inputs** | No | User input metadata (task skills only) | - +*Repository skills use `trigger: null` (or omit trigger field) - -OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete. -While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to -validate critical information independently. 
- +## Component Relationships -## Prerequisites +### How Skills Integrate -- Access to OpenHands Cloud. +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skills["Skill System"] + Context["Agent Context"] + Agent["Agent"] + MCP["MCP Client"] + + Skills -->|Injects content| Context + Skills -.->|Spawns tools| MCP + Context -->|System prompt| Agent + MCP -->|Tool| Agent + + style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` -## Installation Steps +**Relationship Characteristics:** +- **Skills → Agent Context**: Active skills contribute their content to system prompt +- **Skills → MCP**: Repository skills can spawn MCP servers and register tools +- **Context → Agent**: Combined skill content becomes part of agent's instructions +- **Skills Lifecycle**: Loaded at conversation start, evaluated each step - - +## See Also - **This step is for Slack admins/owners** +- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context +- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management +- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications - 1. Make sure you have permissions to install Apps to your workspace. - 2. Click the button below to install OpenHands Slack App Add to Slack - 3. In the top right corner, select the workspace to install the OpenHands Slack app. - 4. Review permissions and click allow. +### Tool System & MCP +Source: https://docs.openhands.dev/sdk/arch/tool-system.md - +The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. 
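As a rough, self-contained sketch of that Action-Observation shape (stand-in dataclasses here, not the SDK's real Pydantic classes):

```python
from dataclasses import dataclass

# Stand-in classes that only mirror the roles of the SDK's Action,
# Observation, ToolExecutor, and ToolDefinition; not the real API.

@dataclass
class EchoAction:          # input schema: what the LLM provides
    text: str

@dataclass
class EchoObservation:     # output schema: what the tool returns
    text: str

class EchoExecutor:        # stateless business logic
    def __call__(self, action: EchoAction) -> EchoObservation:
        return EchoObservation(text=action.text.upper())

@dataclass
class SimpleTool:          # ties schema and executor together
    name: str
    executor: EchoExecutor

    def __call__(self, action: EchoAction) -> EchoObservation:
        return self.executor(action)

tool = SimpleTool(name="echo", executor=EchoExecutor())
obs = tool(EchoAction(text="hello"))
assert obs.text == "HELLO"
```

The real `ToolDefinition` layers Pydantic validation on the Action and auto-generates an LLM-compatible schema from it, which is what the sections below describe.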
- +**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) - **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.** +## Core Responsibilities - Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this: - 1. Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud. - 2. Click `Install OpenHands Slack App`. - 3. In the top right corner, select the workspace to install the OpenHands Slack app. - 4. Review permissions and click allow. +The Tool System has four primary responsibilities: - Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App. +1. **Type Safety** - Enforce action/observation schemas via Pydantic models +2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas +3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs +4. **Tool Registry** - Discover and resolve tools by name or pattern - +## Tool System - +### Architecture Overview +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Definition["Tool Definition"] + Action["Action
Input schema"] + Observation["Observation
Output schema"] + Executor["Executor
Business logic"] + end + + subgraph Framework["Tool Framework"] + Base["ToolBase
Abstract base"] + Impl["Tool Implementation
Concrete tool"] + Registry["Tool Registry
Spec → Tool"] + end -## Working With the Slack App + Agent["Agent"] + LLM["LLM"] + ToolSpec["Tool Spec
name + params"] -To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel. + Base -.->|Extends| Impl + + ToolSpec -->|resolve_tool| Registry + Registry -->|Create instances| Impl + Impl -->|Available in| Agent + Impl -->|Generate schema| LLM + LLM -->|Generate tool call| Agent + Agent -->|Parse & validate| Action + Agent -->|Execute via Tool.\_\_call\_\_| Executor + Executor -->|Return| Observation + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Action,Observation,Executor secondary + class Registry tertiary +``` -Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands. +### Key Components -To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. -You must be the user who started the conversation. 
+| Component | Purpose | Design | +|-----------|---------|--------| +| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | +| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | +| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | +| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | +| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | +| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | +| **[`Tool` (spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | +| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | -## Example conversation +### Action-Observation Pattern -### Start a new conversation, and select repo +The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. -Conversation is started by mentioning `@openhands`. 
+```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Agent["Agent Layer"] + ToolCall["MessageToolCall
from LLM"] + ParseJSON["Parse JSON
arguments"] + CreateAction["tool.action_from_arguments()
Pydantic validation"] + WrapAction["ActionEvent
wraps Action"] + WrapObs["ObservationEvent
wraps Observation"] + Error["AgentErrorEvent"] + end + + subgraph ToolSystem["Tool System"] + ActionType["Action
Pydantic model"] + ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] + Execute["ToolExecutor
business logic"] + ObsType["Observation
Pydantic model"] + end + + ToolCall --> ParseJSON + ParseJSON -->|Valid JSON| CreateAction + ParseJSON -->|Invalid JSON| Error + CreateAction -->|Valid| ActionType + CreateAction -->|Invalid| Error + ActionType --> WrapAction + ActionType --> ToolCall2 + ToolCall2 --> Execute + Execute --> ObsType + ObsType --> WrapObs + + style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px + style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px +``` -![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png) +**Tool System Boundary:** +- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance +- **Output**: `Observation` instance with structured result +- **No knowledge of**: Events, LLM messages, conversation state -### See agent response and send follow up messages +### Tool Definition -Initial request is followed up by mentioning `@openhands` in a thread reply. +Tools are defined using two patterns depending on complexity: -![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png) +#### Pattern 1: Direct Instantiation (Simple Tools) -## Pro tip +For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): -You can mention a repo name when starting a new conversation in the following formats +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
stateless logic"] + Tool["ToolDefinition(...,
executor=Executor())"] + + Action --> Tool + Obs --> Tool + Exec --> Tool + + style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` -1. "My-Repo" repo (e.g `@openhands in the openhands repo ...`) -2. "OpenHands/OpenHands" (e.g `@openhands in OpenHands/OpenHands ...`) +**Components:** +1. **Action** - Pydantic model with `visualize` property for display +2. **Observation** - Pydantic model with `to_llm_content` property for LLM +3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` +4. **ToolDefinition** - Direct instantiation with executor instance -The repo match is case insensitive. If a repo name match is made, it will kick off the conversation. -If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list. +#### Pattern 2: Subclass with Factory (Stateful Tools) -![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png) +For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): -## OpenHands CLI - -### OpenHands Cloud -Source: https://docs.openhands.dev/openhands/usage/cli/cloud.md +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
with \_\_init\_\_ and state"] + Subclass["class MyTool(ToolDefinition)
with create() method"] + Instance["Return [MyTool(...,
executor=instance)]"] + + Action --> Subclass + Obs --> Subclass + Exec --> Subclass + Subclass --> Instance + + style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -## Overview +**Components:** +1. **Action/Observation** - Same as Pattern 1 +2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup +3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method +4. **Factory Method** - Returns sequence of configured tool instances -The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Pattern1["Pattern 1: Direct Instantiation"] + P1A["Define Action/Observation
with visualize/to_llm_content"] + P1E["Define ToolExecutor
with \_\_call\_\_()"] + P1T["ToolDefinition(...,
executor=Executor())"] + end + + subgraph Pattern2["Pattern 2: Subclass with Factory"] + P2A["Define Action/Observation
with visualize/to_llm_content"] + P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] + P2C["class MyTool(ToolDefinition)
@classmethod create()"] + P2I["Return [MyTool(...,
executor=instance)]"] + end + + P1A --> P1E + P1E --> P1T + + P2A --> P2E + P2E --> P2C + P2C --> P2I + + style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` -- Authenticate with your OpenHands Cloud account -- Create new cloud conversations -- Use cloud resources without the web interface +**Key Design Elements:** -## Authentication +| Component | Purpose | Requirements | +|-----------|---------|--------------| +| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text | +| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list | +| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method | +| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) | -### Login +**When to Use Each Pattern:** -Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: +| Pattern | Use Case | Examples | +|---------|----------|----------| +| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities | +| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` | -```bash -openhands login -``` +### Tool Annotations -This opens a browser window for authentication. After successful login, your credentials are stored locally. 
+Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs: -#### Custom Server URL +| Field | Meaning | Examples | +|-------|---------|----------| +| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) | +| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) | +| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) | +| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) | -For self-hosted or enterprise deployments: +**Key Behaviors:** +- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with `readOnlyHint=False` +- Annotations help LLMs reason about tool safety and side effects -```bash -openhands login --server-url https://your-openhands-server.com -``` +### Tool Registry -You can also set the server URL via environment variable: +The registry enables **dynamic tool discovery** and instantiation from tool specifications: -```bash -export OPENHANDS_CLOUD_URL=https://your-openhands-server.com -openhands login +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + ToolSpec["Tool Spec
name + params"] + + subgraph Registry["Tool Registry"] + Resolver["Resolver
name → factory"] + Factory["Factory
create(params)"] + end + + Instance["Tool Instance
with executor"] + Agent["Agent"] + + ToolSpec -->|"resolve_tool(spec)"| Resolver + Resolver -->|Lookup factory| Factory + Factory -->|"create(**params)"| Instance + Instance -->|Used by| Agent + + style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -### Logout - -Log out from OpenHands Cloud: +**Resolution Workflow:** -```bash -# Log out from all servers -openhands logout +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution -# Log out from a specific server -openhands logout --server-url https://app.all-hands.dev -``` +**Registration Types:** -## Creating Cloud Conversations +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | -Create a new conversation in OpenHands Cloud: +### File Organization -```bash -# With a task -openhands cloud -t "Review the codebase and suggest improvements" +Tools follow a consistent file structure for maintainability: -# From a file -openhands cloud -f task.txt +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, 
MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities ``` -### Options - -| Option | Description | -|--------|-------------| -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | +**File Responsibilities:** -### Examples +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | +| `__init__.py` | Tool exports | Package interface | -```bash -# Create a cloud conversation with a task -openhands cloud -t "Fix the authentication bug in login.py" +**Benefits:** +- **Separation of Concerns** - Public API separate from implementation +- **Avoid Circular Imports** - Import `impl` only inside `create()` method +- **Consistency** - All tools follow same structure for discoverability -# Create from a task file -openhands cloud -f requirements.txt +**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation -# Use a custom server -openhands cloud --server-url https://custom.server.com -t "Add unit tests" -# Combine with environment variable -export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev -openhands cloud -t "Refactor the database module" -``` +## MCP Integration -## Workflow +The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. 
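For example, an `mcp_config` value might look like the following sketch. The `mcpServers` shape follows the FastMCP configuration format referenced above; the server name and command here are illustrative, and the commented-out `Agent(...)` call is not the exact SDK signature:

```python
# Illustrative mcp_config value for an agent's mcp_config field.
# The "mcpServers" shape follows the FastMCP configuration format;
# the server name and command below are examples, not requirements.
mcp_config = {
    "mcpServers": {
        "fetch": {
            "command": "uvx",
            "args": ["mcp-server-fetch"],
        }
    }
}

# During agent initialization, each configured server is spawned and its
# tools are discovered and added to the agent's tools_map automatically:
# agent = Agent(llm=llm, tools=[...], mcp_config=mcp_config)
```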
-A typical workflow with OpenHands Cloud: +**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) -1. **Login once**: - ```bash - openhands login - ``` +### Architecture Overview -2. **Create conversations as needed**: - ```bash - openhands cloud -t "Your task here" - ``` +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph External["External MCP Server"] + Server["MCP Server
stdio/HTTP"] + ExtTools["External Tools"] + end + + subgraph Bridge["MCP Integration Layer"] + MCPClient["MCPClient
Sync/Async bridge"] + Convert["Schema Conversion
MCP → MCPToolDefinition"] + MCPExec["MCPToolExecutor
Bridges to MCP calls"] + end + + subgraph Agent["Agent System"] + ToolsMap["tools_map
str → ToolDefinition"]
        AgentLogic["Agent Execution"]
    end

    Server -.->|Spawns| ExtTools
    MCPClient --> Server
    Server --> Convert
    Convert -->|create_mcp_tools| MCPExec
    MCPExec -->|Added during
agent.initialize| ToolsMap + ToolsMap --> AgentLogic + AgentLogic -->|Tool call| MCPExec + MCPExec --> MCPClient + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class MCPClient primary + class Convert,MCPExec secondary + class Server,ExtTools external +``` -3. **Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server +### Key Components -## Environment Variables +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | MCP server connection | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Tool wrapper | Wraps MCP tools as SDK `ToolDefinition` with dynamic validation | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP tool calls via MCPClient | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Generic action wrapper | Simple `dict[str, Any]` wrapper for MCP tool arguments | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results as observations with content blocks | +| **[`_create_mcp_action_type()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Dynamic schema | Runtime Pydantic model generated from MCP `inputSchema` for validation | -| Variable | Description | -|----------|-------------| -| `OPENHANDS_CLOUD_URL` | Default server URL 
for cloud operations | +### Sync/Async Bridge -## Cloud vs Local +MCP protocol is asynchronous, but SDK tools execute synchronously. The bridge pattern in [client.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py) solves this: -| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | -|---------|---------------------------|---------------------| -| Compute | Cloud-hosted | Your machine | -| Persistence | Cloud storage | Local files | -| Collaboration | Share via link | Local only | -| Setup | Just login | Configure LLM & runtime | -| Cost | Subscription/usage-based | Your LLM API costs | - - -Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. Use the local CLI for privacy, offline work, or custom configurations. - - -## See Also - -- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation -- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide -- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access - -### Command Reference -Source: https://docs.openhands.dev/openhands/usage/cli/command-reference.md - -## Basic Usage - -```bash -openhands [OPTIONS] [COMMAND] +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Sync["Sync Tool Execution"] + Bridge["call_async_from_sync()"] + Loop["Background Event Loop"] + Async["Async MCP Call"] + Result["Return Result"] + + Sync --> Bridge + Bridge --> Loop + Loop --> Async + Async --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Loop fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -## Global Options - -| Option | Description | -|--------|-------------| -| `-v, --version` | Show version number and exit | -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--resume [ID]` | Resume a 
conversation. If no ID provided, lists recent conversations | -| `--last` | Resume the most recent conversation (use with `--resume`) | -| `--exp` | Use textual-based UI (now default, kept for compatibility) | -| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | -| `--json` | Enable JSONL output (requires `--headless`) | -| `--always-approve` | Auto-approve all actions without confirmation | -| `--llm-approve` | Use LLM-based security analyzer for action approval | -| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | -| `--exit-without-confirmation` | Exit without showing confirmation dialog | - -## Subcommands - -### serve - -Launch the OpenHands GUI server using Docker. +**Bridge Features:** +- **Background Event Loop** - Executes async code from sync contexts +- **Timeout Support** - Configurable timeouts for MCP operations +- **Error Handling** - Wraps MCP errors in observations +- **Connection Pooling** - Reuses connections across tool calls -```bash -openhands serve [OPTIONS] -``` +### Tool Discovery Flow -| Option | Description | -|--------|-------------| -| `--mount-cwd` | Mount the current working directory into the container | -| `--gpu` | Enable GPU support via nvidia-docker | +**Source:** [`create_mcp_tools()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/utils.py) | [`agent._initialize()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py) -**Examples:** -```bash -openhands serve -openhands serve --mount-cwd -openhands serve --gpu -openhands serve --mount-cwd --gpu +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Server Config
command + args"]
    Spawn["Spawn Server Process<br>MCPClient"]
    List["List Available Tools<br>client.list_tools()"]

    subgraph Convert["For Each MCP Tool"]
        Store["Store MCP metadata<br>name, description, inputSchema"]
        CreateExec["Create MCPToolExecutor<br>bound to tool + client"]
        Def["Create MCPToolDefinition<br>generic MCPToolAction type"]
    end

    Register["Add to Agent's tools_map<br>bypasses ToolRegistry"]
    Ready["Tools Available
Dynamic models created on-demand"] + + Config --> Spawn + Spawn --> List + List --> Store + Store --> CreateExec + CreateExec --> Def + Def --> Register + Register --> Ready + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Def fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### web - -Launch the CLI as a web application accessible via browser. +**Discovery Steps:** +1. **Spawn Server** - Launch MCP server via stdio protocol (using `MCPClient`) +2. **List Tools** - Call MCP `tools/list` endpoint to retrieve available tools +3. **Parse Schemas** - Extract tool names, descriptions, and `inputSchema` from MCP response +4. **Create Definitions** - For each tool, call `MCPToolDefinition.create()` which: + - Creates an `MCPToolExecutor` instance bound to the tool name and client + - Wraps the MCP tool metadata in `MCPToolDefinition` + - Uses generic `MCPToolAction` as the action type (NOT dynamic models yet) +5. **Add to Agent** - All `MCPToolDefinition` instances are added to agent's `tools_map` during `initialize()` (bypasses ToolRegistry) +6. 
**Lazy Validation** - Dynamic Pydantic models are generated lazily when: + - `action_from_arguments()` is called (argument validation) + - `to_openai_tool()` is called (schema export to LLM) -```bash -openhands web [OPTIONS] -``` +**Schema Handling:** -| Option | Default | Description | -|--------|---------|-------------| -| `--host` | `0.0.0.0` | Host to bind the web server to | -| `--port` | `12000` | Port to bind the web server to | -| `--debug` | `false` | Enable debug mode | +| MCP Schema | SDK Integration | When Used | +|------------|----------------|-----------| +| `name` | Tool name (stored in `MCPToolDefinition`) | Discovery, execution | +| `description` | Tool description for LLM | Discovery, LLM prompt | +| `inputSchema` | Stored in `mcp_tool.inputSchema` | Lazy model generation | +| `inputSchema` fields | Converted to Pydantic fields via `Schema.from_mcp_schema()` | Validation, schema export | +| `annotations` | Mapped to `ToolAnnotations` | Security analysis, LLM hints | -**Examples:** -```bash -openhands web -openhands web --port 8080 -openhands web --host 127.0.0.1 --port 3000 -openhands web --debug -``` +### MCP Server Configuration -### cloud +MCP servers are configured via the `mcp_config` field on the `Agent` class. Configuration follows [FastMCP config format](https://gofastmcp.com/clients/client#configuration-format): -Create a new conversation in OpenHands Cloud. 
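The lazy validation described above — checking tool-call arguments against an MCP `inputSchema` — can be sketched in plain Python. This is an illustrative stand-in, not the SDK's implementation: `validate_args` is a hypothetical helper, and it only covers a simplified subset of JSON Schema (the SDK instead generates full Pydantic models via `Schema.from_mcp_schema()`):

```python
def validate_args(schema: dict, args: dict) -> dict:
    """Validate tool-call arguments against a simplified MCP inputSchema."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    # Required fields must be present
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    # Each provided argument must be declared and match its declared type
    for name, value in args.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name}")
        expected = type_map.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {spec['type']}")
    return args

# Example: the kind of inputSchema an mcp-server-fetch style tool advertises
fetch_schema = {"properties": {"url": {"type": "string"}}, "required": ["url"]}
ok = validate_args(fetch_schema, {"url": "https://example.com"})
```

A real implementation must also handle nested objects, arrays, and enums, which is why the SDK builds full Pydantic models rather than ad-hoc checks like this.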
+```python +from openhands.sdk import Agent -```bash -openhands cloud [OPTIONS] +agent = Agent( + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } + } +) ``` -| Option | Description | -|--------|-------------| -| `-t, --task TEXT` | Initial task to seed the conversation | -| `-f, --file PATH` | Path to a file whose contents seed the conversation | -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | +## Component Relationships -**Examples:** -```bash -openhands cloud -t "Fix the bug" -openhands cloud -f task.txt -openhands cloud --server-url https://custom.server.com -t "Task" +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Sources["Tool Sources"] + Native["Native Tools"] + MCP["MCP Tools"] + end + + Registry["Tool Registry
resolve_tool"] + ToolsMap["Agent.tools_map
Merged tool dict"] + + subgraph AgentSystem["Agent System"] + Agent["Agent Logic"] + LLM["LLM"] + end + + Security["Security Analyzer"] + Conversation["Conversation State"] + + Native -->|register_tool| Registry + Registry --> ToolsMap + MCP -->|create_mcp_tools| ToolsMap + ToolsMap -->|Provide schemas| LLM + Agent -->|Execute tools| ToolsMap + ToolsMap -.->|Action risk| Security + ToolsMap -.->|Read state| Conversation + + style ToolsMap fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### acp +**Relationship Characteristics:** +- **Native → Registry → tools_map**: Native tools resolved via `ToolRegistry` +- **MCP → tools_map**: MCP tools bypass registry, added directly during `initialize()` +- **tools_map → LLM**: Generate schemas describing all available capabilities +- **Agent → tools_map**: Execute actions, receive observations +- **tools_map → Conversation**: Read state for context-aware execution +- **tools_map → Security**: Tool annotations inform risk assessment -Start the Agent Client Protocol server for IDE integrations. 
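The sync/async bridge described earlier can be sketched with a background event loop. This is an illustrative pattern only — `call_async_from_sync` and `fake_mcp_call` below are simplified stand-ins, not the SDK's actual implementation in client.py:

```python
import asyncio
import threading

# A background event loop lets synchronous tool code drive async MCP calls.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def call_async_from_sync(coro, timeout: float = 30.0):
    """Schedule a coroutine on the background loop and block for its result."""
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    return future.result(timeout=timeout)

async def fake_mcp_call(tool_name: str, arguments: dict) -> dict:
    await asyncio.sleep(0)  # stands in for the MCP network round-trip
    return {"tool": tool_name, "ok": True}

# Synchronous caller, asynchronous callee:
result = call_async_from_sync(fake_mcp_call("fetch", {"url": "https://example.com"}))
```

Reusing one long-lived loop (rather than calling `asyncio.run()` per invocation) is what allows connection pooling across tool calls.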
+## See Also -```bash -openhands acp [OPTIONS] -``` +- **[Agent Architecture](/sdk/arch/agent)** - How agents select and execute tools +- **[Events](/sdk/arch/events)** - ActionEvent and ObservationEvent structures +- **[Security Analyzer](/sdk/arch/security)** - Action risk assessment +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library -| Option | Description | -|--------|-------------| -| `--resume [ID]` | Resume a conversation by ID | -| `--last` | Resume the most recent conversation | -| `--always-approve` | Auto-approve all actions | -| `--llm-approve` | Use LLM-based security analyzer | -| `--streaming` | Enable token-by-token streaming | +### Workspace +Source: https://docs.openhands.dev/sdk/arch/workspace.md -**Examples:** -```bash -openhands acp -openhands acp --llm-approve -openhands acp --resume abc123def456 -openhands acp --resume --last -``` +The **Workspace** component abstracts execution environments for agent operations. It provides a unified interface for command execution and file operations across local processes, containers, and remote servers. -### mcp +**Source:** [`openhands/sdk/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) -Manage Model Context Protocol server configurations. +## Core Responsibilities -```bash -openhands mcp [OPTIONS] -``` +The Workspace system has four primary responsibilities: -#### mcp add +1. **Execution Abstraction** - Unified interface for command execution across environments +2. **File Operations** - Upload, download, and manipulate files in workspace +3. **Resource Management** - Context manager protocol for setup/teardown +4. **Environment Isolation** - Separate agent execution from host system -Add a new MCP server. 
+## Architecture -```bash -openhands mcp add --transport [OPTIONS] [-- args...] -``` - -| Option | Description | -|--------|-------------| -| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | -| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | -| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | -| `--auth` | Authentication method (e.g., `oauth`) | -| `--enabled` | Enable immediately (default) | -| `--disabled` | Add in disabled state | - -**Examples:** -```bash -openhands mcp add my-api --transport http https://api.example.com/mcp -openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com -openhands mcp add local --transport stdio python -- -m my_server -openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 60}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["BaseWorkspace
Abstract base class"]
    end

    subgraph Implementations["Concrete Implementations"]
        Local["LocalWorkspace<br>Direct subprocess"]
        Remote["RemoteWorkspace
HTTP API calls"] + end + + subgraph Operations["Core Operations"] + Command["execute_command()"] + Upload["file_upload()"] + Download["file_download()"] + Context["__enter__ / __exit__"] + end + + subgraph Targets["Execution Targets"] + Process["Local Process"] + Container["Docker Container"] + Server["Remote Server"] + end + + Base --> Local + Base --> Remote + + Base -.->|Defines| Operations + + Local --> Process + Remote --> Container + Remote --> Server + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Local,Remote secondary + class Command,Upload tertiary ``` -#### mcp list +### Key Components -List all configured MCP servers. +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`BaseWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)** | Abstract interface | Defines execution and file operation contracts | +| **[`LocalWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/local.py)** | Local execution | Subprocess-based command execution | +| **[`RemoteWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/remote/base.py)** | Remote execution | HTTP API-based execution via agent-server | +| **[`CommandResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | Execution output | Structured result with stdout, stderr, exit_code | +| **[`FileOperationResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | File op outcome | Success status and metadata | -```bash -openhands mcp list -``` +## Workspace Types -#### mcp get +### Local vs Remote Execution -Get details for a specific 
MCP server. -```bash -openhands mcp get -``` +| Aspect | LocalWorkspace | RemoteWorkspace | +|--------|----------------|-----------------| +| **Execution** | Direct subprocess | HTTP → agent-server | +| **Isolation** | Process-level | Container/VM-level | +| **Performance** | Fast (no network) | Network overhead | +| **Security** | Host system access | Sandboxed | +| **Use Case** | Development, CLI | Production, web apps | -#### mcp remove +## Core Operations -Remove an MCP server configuration. +### Command Execution -```bash -openhands mcp remove +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + Tool["Tool invokes
execute_command()"]

    Decision{"Workspace<br>type?"}

    LocalExec["subprocess.run()<br>Direct execution"]
    RemoteExec["POST /command<br>HTTP API"]

    Result["CommandResult
stdout, stderr, exit_code"] + + Tool --> Decision + Decision -->|Local| LocalExec + Decision -->|Remote| RemoteExec + + LocalExec --> Result + RemoteExec --> Result + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style LocalExec fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style RemoteExec fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -#### mcp enable +**Command Result Structure:** -Enable an MCP server. +| Field | Type | Description | +|-------|------|-------------| +| **stdout** | str | Standard output stream | +| **stderr** | str | Standard error stream | +| **exit_code** | int | Process exit code (0 = success) | +| **timeout** | bool | Whether command timed out | +| **duration** | float | Execution time in seconds | -```bash -openhands mcp enable -``` +### File Operations -#### mcp disable +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | -Disable an MCP server. +## Resource Management -```bash -openhands mcp disable -``` +Workspaces use **context manager** for safe resource handling: -### login +**Lifecycle Hooks:** -Authenticate with OpenHands Cloud. 
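The local execution path above can be approximated with `subprocess`. This sketch mirrors the `CommandResult` fields from the table; it is an illustrative approximation of `LocalWorkspace` behavior, not its actual code:

```python
import subprocess
import time
from dataclasses import dataclass

@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int
    timeout: bool
    duration: float

def execute_command(command: str, timeout: float = 30.0) -> CommandResult:
    """Run a shell command and capture a structured result."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return CommandResult(
            proc.stdout, proc.stderr, proc.returncode, False, time.monotonic() - start
        )
    except subprocess.TimeoutExpired:
        # Timed-out commands report timeout=True instead of raising
        return CommandResult("", "", -1, True, time.monotonic() - start)

result = execute_command("echo hello")
```

A remote workspace exposes the same `CommandResult` shape but produces it from an HTTP response instead of a local process.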
+| Phase | LocalWorkspace | RemoteWorkspace | +|-------|----------------|-----------------| +| **Enter** | Create working directory | Connect to agent-server, verify | +| **Use** | Execute commands | Proxy commands via HTTP | +| **Exit** | No cleanup (persistent) | Disconnect, optionally stop container | -```bash -openhands login [OPTIONS] -``` +## Remote Workspace Extensions -| Option | Description | -|--------|-------------| -| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | +The SDK provides remote workspace implementations in `openhands-workspace` package: -**Examples:** -```bash -openhands login -openhands login --server-url https://enterprise.openhands.dev +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 50}} }%% +flowchart TB + Base["RemoteWorkspace
SDK base class"]

    Docker["DockerWorkspace<br>Auto-spawn containers"]
    API["RemoteAPIWorkspace<br>Connect to existing server"]

    Base -.->|Extended by| Docker
    Base -.->|Extended by| API

    Docker -->|Creates| Container["Docker Container
with agent-server"] + API -->|Connects| Server["Remote Agent Server"] + + style Base fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Docker fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style API fill:#fff4df,stroke:#b7791f,stroke-width:2px ``` -### logout +**Implementation Comparison:** -Log out from OpenHands Cloud. +| Type | Setup | Isolation | Use Case | +|------|-------|-----------|----------| +| **LocalWorkspace** | Immediate | Process | Development, trusted code | +| **DockerWorkspace** | Spawn container | Container | Multi-user, untrusted code | +| **RemoteAPIWorkspace** | Connect to URL | Remote server | Distributed systems, cloud | -```bash -openhands logout [OPTIONS] -``` +**Source:** +- **DockerWorkspace**: [`openhands-workspace/openhands/workspace/docker`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/docker) +- **RemoteAPIWorkspace**: [`openhands-workspace/openhands/workspace/remote_api`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/remote_api) -| Option | Description | -|--------|-------------| -| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | +## Component Relationships -**Examples:** -```bash -openhands logout -openhands logout --server-url https://app.all-hands.dev +### How Workspace Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Workspace["Workspace"] + Conversation["Conversation"] + AgentServer["Agent Server"] + + Conversation -->|Configures| Workspace + Workspace -.->|Remote type| AgentServer + + style Workspace fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conversation fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px ``` -## Interactive Commands +**Relationship Characteristics:** +- **Conversation → Workspace**: Conversation factory uses workspace type to select LocalConversation or RemoteConversation +- **Workspace → Agent 
Server**: RemoteWorkspace delegates operations to agent-server API +- **Tools Independence**: Tools run in the same environment as workspace -Commands available inside the CLI (prefix with `/`): +## See Also -| Command | Description | -|---------|-------------| -| `/help` | Display available commands | -| `/new` | Start a new conversation | -| `/history` | Toggle conversation history | -| `/confirm` | Configure confirmation settings | -| `/condense` | Condense conversation history | -| `/skills` | View loaded skills, hooks, and MCPs | -| `/feedback` | Send anonymous feedback about CLI | -| `/exit` | Exit the application | +- **[Conversation Architecture](/sdk/arch/conversation)** - How workspace type determines conversation implementation +- **[Agent Server](/sdk/arch/agent-server)** - Remote execution API +- **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution -## Command Palette +### FAQ +Source: https://docs.openhands.dev/sdk/faq.md -Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: +## How do I use AWS Bedrock with the SDK? -| Option | Description | -|--------|-------------| -| **History** | Toggle conversation history panel | -| **Keys** | Show keyboard shortcuts | -| **MCP** | View MCP server configurations | -| **Maximize** | Maximize/restore window | -| **Plan** | View agent plan | -| **Quit** | Quit the application | -| **Screenshot** | Take a screenshot | -| **Settings** | Configure LLM model, API keys, and other settings | -| **Theme** | Toggle color theme | +**Yes, the OpenHands SDK supports AWS Bedrock through LiteLLM.** -## Changing Your Model +Since LiteLLM requires `boto3` for Bedrock requests, you need to install it alongside the SDK. -### Via Settings UI + -1. Press `Ctrl+P` to open the command palette -2. Select **Settings** -3. Choose your LLM provider and model -4. 
Save changes (no restart required) +### Step 1: Install boto3 -### Via Configuration File +Install the SDK with boto3: -Edit `~/.openhands/agent_settings.json` and change the `model` field: +```bash +# Using pip +pip install openhands-sdk boto3 -```json -{ - "llm": { - "model": "claude-sonnet-4-5-20250929", - "api_key": "...", - "base_url": "..." - } -} +# Using uv +uv pip install openhands-sdk boto3 + +# Or when installing as a CLI tool +uv tool install openhands --with boto3 ``` -### Via Environment Variables +### Step 2: Configure Authentication -Temporarily override your model without changing saved configuration: +You have two authentication options: + +**Option A: API Key Authentication (Recommended)** + +Use the `AWS_BEARER_TOKEN_BEDROCK` environment variable: ```bash -export LLM_MODEL="gpt-4o" -export LLM_API_KEY="your-api-key" -openhands --override-with-envs +export AWS_BEARER_TOKEN_BEDROCK="your-bedrock-api-key" ``` -Changes made with `--override-with-envs` are not persisted. - -## Environment Variables +**Option B: AWS Credentials** -| Variable | Description | -|----------|-------------| -| `LLM_API_KEY` | API key for your LLM provider | -| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | -| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | -| `OPENHANDS_CLOUD_URL` | Default cloud server URL | -| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | +Use traditional AWS credentials: -## Exit Codes +```bash +export AWS_ACCESS_KEY_ID="your-access-key" +export AWS_SECRET_ACCESS_KEY="your-secret-key" +export AWS_REGION_NAME="us-west-2" +``` -| Code | Meaning | -|------|---------| -| `0` | Success | -| `1` | Error or task failed | -| `2` | Invalid arguments | +### Step 3: Configure the Model -## Configuration Files +Use the `bedrock/` prefix for your model name: -| File | Purpose | -|------|---------| -| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | -| 
`~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | -| `~/.openhands/mcp.json` | MCP server configurations | -| `~/.openhands/conversations/` | Conversation history | +```python +from openhands.sdk import LLM, Agent -## See Also +llm = LLM( + model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", + # api_key is read from AWS_BEARER_TOKEN_BEDROCK automatically +) +``` -- [Installation](/openhands/usage/cli/installation) - Install the CLI -- [Quick Start](/openhands/usage/cli/quick-start) - Get started -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +For cross-region inference profiles, include the region prefix: -### Critic (Experimental) -Source: https://docs.openhands.dev/openhands/usage/cli/critic.md +```python +llm = LLM( + model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0", # US region + # or + model="bedrock/apac.anthropic.claude-sonnet-4-20250514-v1:0", # APAC region +) +``` - -**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. - + -## Overview +For more details on Bedrock configuration options, see the [LiteLLM Bedrock documentation](https://docs.litellm.ai/docs/providers/bedrock). -If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. +## Does the agent SDK support parallel tool calling? -For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). +**Yes, the OpenHands SDK supports parallel tool calling by default.** +The SDK automatically handles parallel tool calls when the underlying LLM (like Claude or GPT-4) returns multiple tool calls in a single response. This allows agents to execute multiple independent actions before the next LLM call. -## What is the Critic? 
+ +When the LLM generates multiple tool calls in parallel, the SDK groups them using a shared `llm_response_id`: -The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides: +```python +ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1) +ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2) +# Combined into: Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2]) +``` -- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success -- **Real-time feedback**: Scores computed during agent execution, not just at completion +Multiple `ActionEvent`s with the same `llm_response_id` are grouped together and combined into a single LLM message with multiple `tool_calls`. Only the first event's thought/reasoning is included. The parallel tool calling implementation can be found in the [Events Architecture](/sdk/arch/events#event-types) for detailed explanation of how parallel function calling works, the [`prepare_llm_messages` in utils.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/utils.py) which groups ActionEvents by `llm_response_id` when converting events to LLM messages, the [agent step method](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py#L200-L300) where actions are created with shared `llm_response_id`, and the [`ActionEvent` class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py) which includes the `llm_response_id` field. 
For more details, see the **[Events Architecture](/sdk/arch/events)** for a deep dive into the event system and parallel function calling, the **[Tool System](/sdk/arch/tool-system)** for understanding how tools work with the agent, and the **[Agent Architecture](/sdk/arch/agent)** for how agents process and execute actions. + - +## Does the agent SDK support image content? -![Critic output in CLI](./screenshots/critic-cli-output.png) +**Yes, the OpenHands SDK fully supports image content for vision-capable LLMs.** -## Pricing +The SDK supports both HTTP/HTTPS URLs and base64-encoded images through the `ImageContent` class. -The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. + -## Disabling the Critic +### Check Vision Support -If you prefer not to use the critic feature, you can disable it in your settings: +Before sending images, verify your LLM supports vision: -1. Open the command palette with `Ctrl+P` -2. Select **Settings** -3. Navigate to the **CLI Settings** tab -4. Toggle off **Enable Critic (Experimental)** +```python +from openhands.sdk import LLM +from pydantic import SecretStr -![Critic settings in CLI](./screenshots/critic-cli-settings.png) +llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + usage_id="my-agent" +) -### GUI Server -Source: https://docs.openhands.dev/openhands/usage/cli/gui-server.md +# Check if vision is active +assert llm.vision_is_active(), "Model does not support vision" +``` -## Overview +### Using HTTP URLs -The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. 
+```python +from openhands.sdk import ImageContent, Message, TextContent -```bash -openhands serve +message = Message( + role="user", + content=[ + TextContent(text="What do you see in this image?"), + ImageContent(image_urls=["https://example.com/image.png"]), + ], +) ``` - -This requires Docker to be installed and running on your system. - - -## Prerequisites +### Using Base64 Images -- [Docker](https://docs.docker.com/get-docker/) installed and running -- Sufficient disk space for Docker images (~2GB) +Base64 images are supported using data URLs: -## Basic Usage +```python +import base64 +from openhands.sdk import ImageContent, Message, TextContent -```bash -# Launch the GUI server -openhands serve +# Read and encode an image file +with open("my_image.png", "rb") as f: + image_base64 = base64.b64encode(f.read()).decode("utf-8") -# The server will be available at http://localhost:3000 +# Create message with base64 image +message = Message( + role="user", + content=[ + TextContent(text="Describe this image"), + ImageContent(image_urls=[f"data:image/png;base64,{image_base64}"]), + ], +) ``` -The command will: -1. Check Docker requirements -2. Pull the required Docker images -3. Start the OpenHands GUI server -4. Display the URL to access the interface - -## Options - -| Option | Description | -|--------|-------------| -| `--mount-cwd` | Mount the current working directory into the container | -| `--gpu` | Enable GPU support via nvidia-docker | +### Supported Image Formats -## Mounting Your Workspace +The data URL format is: `data:;base64,` -To give OpenHands access to your local files: +Supported MIME types: +- `image/png` +- `image/jpeg` +- `image/gif` +- `image/webp` +- `image/bmp` -```bash -# Mount current directory -openhands serve --mount-cwd -``` +### Built-in Image Support -This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. 
+Several SDK tools automatically handle images: - -Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. - +- **FileEditorTool**: When viewing image files (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`), they're automatically converted to base64 and sent to the LLM +- **BrowserUseTool**: Screenshots are captured and sent as base64 images +- **MCP Tools**: Image content from MCP tool results is automatically converted to base64 data URLs -## GPU Support +### Disabling Vision -For tasks that benefit from GPU acceleration: +To disable vision for cost reduction (even on vision-capable models): -```bash -openhands serve --gpu +```python +llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + usage_id="my-agent", + disable_vision=True, # Images will be filtered out +) ``` -This requires: -- NVIDIA GPU -- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed -- Docker configured for GPU support + -## Examples +For a complete example, see the [image input example](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) in the SDK repository. -```bash -# Basic GUI server -openhands serve +## How do I handle MessageEvent in one-off tasks? -# Mount current project and enable GPU -cd /path/to/your/project -openhands serve --mount-cwd --gpu -``` +**The SDK provides utilities to automatically respond to agent messages when running tasks end-to-end.** -## How It Works +When running one-off tasks, some models may send a `MessageEvent` (proposing an action or asking for confirmation) instead of directly using tools. This causes `conversation.run()` to return, even though the agent hasn't finished the task. -The `openhands serve` command: + -1. **Pulls Docker images**: Downloads the OpenHands runtime and application images -2. 
**Starts containers**: Runs the OpenHands server in a Docker container -3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` -4. **Shares settings**: Uses your `~/.openhands` directory for configuration +When an agent sends a message (via `MessageEvent`) instead of using the `finish` tool, the conversation ends because it's waiting for user input. In automated pipelines, there's no human to respond, so the task appears incomplete. -## Stopping the Server +**Key event types:** +- `ActionEvent`: Agent uses a tool (terminal, file editor, etc.) +- `MessageEvent`: Agent sends a text message (waiting for user response) +- `FinishAction`: Agent explicitly signals task completion -Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. +The solution is to automatically send a "fake user response" when the agent sends a message, prompting it to continue. -## Comparison: GUI Server vs Web Interface + -| Feature | `openhands serve` | `openhands web` | -|---------|-------------------|-----------------| -| Interface | Full web GUI | Terminal UI in browser | -| Dependencies | Docker required | None | -| Resources | Full container (~2GB) | Lightweight | -| Features | All GUI features | CLI features only | -| Best for | Rich GUI experience | Quick terminal access | + -## Troubleshooting +The [`run_conversation_with_fake_user_response`](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) function wraps your conversation and automatically handles agent messages: -### Docker Not Running +```python +from openhands.sdk.conversation.state import ConversationExecutionStatus +from openhands.sdk.event import ActionEvent, MessageEvent +from openhands.sdk.tool.builtins.finish import FinishAction -``` -❌ Docker daemon is not running. -Please start Docker and try again. 
+def run_conversation_with_fake_user_response(conversation, max_responses: int = 10): + """Run conversation, auto-responding to agent messages until finish or limit.""" + for _ in range(max_responses): + conversation.run() + if conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + break + events = list(conversation.state.events) + # Check if agent used finish tool + if any(isinstance(e, ActionEvent) and isinstance(e.action, FinishAction) for e in reversed(events)): + break + # Check if agent sent a message (needs response) + if not any(isinstance(e, MessageEvent) and e.source == "agent" for e in reversed(events)): + break + # Send continuation prompt + conversation.send_message( + "Please continue. Use the finish tool when done. DO NOT ask for human help." + ) ``` -**Solution**: Start Docker Desktop or the Docker daemon. + -### Permission Denied + -``` -Got permission denied while trying to connect to the Docker daemon socket -``` +```python +from openhands.sdk import Agent, Conversation, LLM +from openhands.workspace import DockerWorkspace +from openhands.tools.preset.default import get_default_tools -**Solution**: Add your user to the docker group: -```bash -sudo usermod -aG docker $USER -# Then log out and back in +llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key="...") +agent = Agent(llm=llm, tools=get_default_tools()) +workspace = DockerWorkspace() +conversation = Conversation(agent=agent, workspace=workspace, max_iteration_per_run=100) + +conversation.send_message("Fix the bug in src/utils.py") +run_conversation_with_fake_user_response(conversation, max_responses=10) +# Results available in conversation.state.events ``` -### Port Already in Use + -If port 3000 is already in use, stop the conflicting service or use a different setup. Currently, the port is not configurable via CLI. + +**Pro tip:** Add a hint to your task prompt: +> "If you're 100% done with the task, use the finish action. 
Otherwise, keep going until you're finished." -## See Also +This encourages the agent to use the finish tool rather than asking for confirmation. + -- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide -- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access -- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details +For the full implementation used in OpenHands benchmarks, see the [fake_user_response.py](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) module. -### Headless Mode -Source: https://docs.openhands.dev/openhands/usage/cli/headless.md +## More questions? -## Overview +If you have additional questions: -Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: -- CI/CD pipelines -- Automated scripting -- Integration with other tools -- Batch processing +- **[Join our Slack Community](https://openhands.dev/joinslack)** - Ask questions and get help from the community +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs, request features, or start a discussion + +### Getting Started +Source: https://docs.openhands.dev/sdk/getting-started.md + +The OpenHands SDK is a modular framework for building AI agents that interact with code, files, and system commands. Agents can execute bash commands, edit files, browse the web, and more. + +## Prerequisites + +Install the **[uv package manager](https://docs.astral.sh/uv/)** (version 0.8.13+): ```bash -openhands --headless -t "Your task here" +curl -LsSf https://astral.sh/uv/install.sh | sh ``` -## Requirements +## Installation -- Must specify a task with `--task` or `--file` +### Step 1: Acquire an LLM API Key - -**Headless mode always runs in `always-approve` mode.** The agent will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. 
- +The SDK requires an LLM API key from any [LiteLLM-supported provider](https://docs.litellm.ai/docs/providers). See our [recommended models](/openhands/usage/llms/llms) for best results. -## Basic Usage + + + Bring your own API key from providers like: + - [Anthropic](https://console.anthropic.com/) + - [OpenAI](https://platform.openai.com/) + - [Other LiteLLM-supported providers](https://docs.litellm.ai/docs/providers) -```bash -# Run a task in headless mode -openhands --headless -t "Write a Python script that prints hello world" + Example: + ```bash + export LLM_API_KEY="your-api-key" + uv run python examples/01_standalone_sdk/01_hello_world.py + ``` + -# Load task from a file -openhands --headless -f task.txt -``` + + Sign up for [OpenHands Cloud](https://app.all-hands.dev) and get an LLM API key from the [API keys page](https://app.all-hands.dev/settings/api-keys). This gives you access to models verified to work well with OpenHands, with no markup. -## JSON Output Mode + Example: + ```bash + export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" + uv run python examples/01_standalone_sdk/01_hello_world.py + ``` -The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur: + [Learn more →](/openhands/usage/llms/openhands-llms) + -```bash -openhands --headless --json -t "Create a simple Flask app" -``` + + If you have a ChatGPT Plus or Pro subscription, you can use `LLM.subscription_login()` to authenticate with your ChatGPT account and access Codex models without consuming API credits. 
-Each line is a JSON object representing an agent event: + ```python + from openhands.sdk import LLM -```json -{"type": "action", "action": "write", "path": "app.py", ...} -{"type": "observation", "content": "File created successfully", ...} -{"type": "action", "action": "run", "command": "python app.py", ...} -``` + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") + ``` -### Use Cases for JSON Output + [Learn more →](/sdk/guides/llm-subscriptions) + + -- **CI/CD pipelines**: Parse events to determine success/failure -- **Automated processing**: Feed output to other tools -- **Logging**: Capture structured logs for analysis -- **Integration**: Connect OpenHands with other systems +> Tip: Model name prefixes depend on your provider +> +> - If you bring your own provider key (Anthropic/OpenAI/etc.), use that provider's model name, e.g. `anthropic/claude-sonnet-4-5-20250929` +OpenHands supports [dozens of models](https://docs.openhands.dev/sdk/arch/llm#llm-providers), you can choose the model you want to try. +> - If you use OpenHands Cloud, use `openhands/`-prefixed models, e.g. `openhands/claude-sonnet-4-5-20250929` +> +> Many examples in the docs read the model from the `LLM_MODEL` environment variable. 
You can set it like: +> +> ```bash +> export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" # for OpenHands Provider +> ``` -### Example: Capture Output to File +**Set Your API Key:** ```bash -openhands --headless --json -t "Add unit tests" > output.jsonl +export LLM_API_KEY=your-api-key-here ``` -## See Also - -- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage -- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options +### Step 2: Install the SDK -### JetBrains IDEs -Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md + + + ```bash + pip install openhands-sdk # Core SDK (openhands.sdk) + pip install openhands-tools # Built-in tools (openhands.tools) + # Optional: required for sandboxed workspaces in Docker or remote servers + pip install openhands-workspace # Workspace backends (openhands.workspace) + pip install openhands-agent-server # Remote agent server (openhands.agent_server) + ``` + -[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. + + ```bash + # Clone the repository + git clone https://github.com/OpenHands/software-agent-sdk.git + cd software-agent-sdk -## Supported IDEs + # Install dependencies and setup development environment + make build + ``` + + -This guide applies to all JetBrains IDEs: -- IntelliJ IDEA -- PyCharm -- WebStorm -- GoLand -- Rider -- CLion -- PhpStorm -- RubyMine -- DataGrip -- And other JetBrains IDEs +### Step 3: Run Your First Agent -## Prerequisites +Here's a complete example that creates an agent and asks it to perform a simple task: -Before configuring JetBrains IDEs: +```python icon="python" expandable examples/01_standalone_sdk/01_hello_world.py +import os -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` -3. **JetBrains IDE version 25.3 or later** -4. 
**JetBrains AI Assistant enabled** in your IDE +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool - -JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE. - -## Configuration +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) -### Step 1: Create the ACP Configuration File +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) -Create or edit the file `$HOME/.jetbrains/acp.json`: - - - - ```bash - mkdir -p ~/.jetbrains - nano ~/.jetbrains/acp.json - ``` - - - Create the file at `C:\Users\\.jetbrains\acp.json` - - +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) -### Step 2: Add the Configuration +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` -Add the following JSON: +Run the example: -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": {} - } - } -} +```bash +# Using a direct provider key (Anthropic/OpenAI/etc.) +uv run python examples/01_standalone_sdk/01_hello_world.py ``` -### Step 3: Use OpenHands in Your IDE - -Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE. +```bash +# Using OpenHands Cloud +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +uv run python examples/01_standalone_sdk/01_hello_world.py +``` -## Advanced Configuration +You should see the agent understand your request, explore the project, and create a file with facts about it. 
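Once the run finishes, the full trace of what the agent did is kept on the conversation state. As a quick sketch (continuing from the hello-world script above, and reusing the same event APIs that appear in later examples on this page):

```python
# Continuing from the script above: `conversation` and `llm` are already defined.
from openhands.sdk import LLMConvertibleEvent

for event in conversation.state.events:
    # Any event that contributes to the LLM context can be rendered as a message
    if isinstance(event, LLMConvertibleEvent):
        print(type(event).__name__, str(event.to_llm_message())[:120])

# Token usage and spend are accumulated on the LLM object
print(f"Accumulated cost: {llm.metrics.accumulated_cost}")
```

This is handy for debugging prompts or for logging agent runs in automated pipelines.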
-### LLM-Approve Mode +## Core Concepts -For automatic LLM-based approval: +**Agent**: An AI-powered entity that can reason, plan, and execute actions using tools. -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp", "--llm-approve"], - "env": {} - } - } -} -``` +**Tools**: Capabilities like executing bash commands, editing files, or browsing the web. -### Auto-Approve Mode +**Workspace**: The execution environment where agents operate (local, Docker, or remote). -For automatic approval of all actions (use with caution): +**Conversation**: Manages the interaction lifecycle between you and the agent. -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp", "--always-approve"], - "env": {} - } - } -} -``` +## Basic Workflow -### Resume a Conversation +1. **Configure LLM**: Choose model and provide API key +2. **Create Agent**: Use preset or custom configuration +3. **Add Tools**: Enable capabilities (bash, file editing, etc.) +4. **Start Conversation**: Create conversation context +5. **Send Message**: Provide task description +6. **Run Agent**: Agent executes until task completes or stops +7. 
**Get Result**: Review agent's output and actions -Resume a specific conversation: -```json -{ - "agent_servers": { - "OpenHands (Resume)": { - "command": "openhands", - "args": ["acp", "--resume", "abc123def456"], - "env": {} - } - } -} -``` +## Try More Examples -Resume the latest conversation: +The repository includes 24+ examples demonstrating various capabilities: -```json -{ - "agent_servers": { - "OpenHands (Latest)": { - "command": "openhands", - "args": ["acp", "--resume", "--last"], - "env": {} - } - } -} -``` +```bash +# Simple hello world +uv run python examples/01_standalone_sdk/01_hello_world.py -### Multiple Configurations +# Custom tools +uv run python examples/01_standalone_sdk/02_custom_tools.py -Add multiple configurations for different use cases: +# With skills +uv run python examples/01_standalone_sdk/03_activate_microagent.py -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": {} - }, - "OpenHands (Auto-Approve)": { - "command": "openhands", - "args": ["acp", "--always-approve"], - "env": {} - }, - "OpenHands (Resume Latest)": { - "command": "openhands", - "args": ["acp", "--resume", "--last"], - "env": {} - } - } -} +# See all examples +ls examples/01_standalone_sdk/ ``` -### Environment Variables - -Pass environment variables to the agent: - -```json -{ - "agent_servers": { - "OpenHands": { - "command": "openhands", - "args": ["acp"], - "env": { - "LLM_API_KEY": "your-api-key" - } - } - } -} -``` -## Troubleshooting +## Next Steps -### "Agent not found" or "Command failed" +### Explore Documentation -1. Verify OpenHands CLI is installed: - ```bash - openhands --version - ``` +- **[SDK Architecture](/sdk/arch/sdk)** - Deep dive into components +- **[Tool System](/sdk/arch/tool-system)** - Available tools +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environments +- **[LLM Configuration](/sdk/arch/llm)** - Deep dive into language model configuration -2. 
If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) +### Build Custom Solutions -### "AI Assistant not available" +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools to expand agent capabilities +- **[MCP Integration](/sdk/guides/mcp)** - Connect to external tools via Model Context Protocol +- **[Docker Workspaces](/sdk/guides/agent-server/docker-sandbox)** - Sandbox agent execution in containers -1. Ensure you have JetBrains IDE version 25.3 or later -2. Enable AI Assistant: `Settings > Plugins > AI Assistant` -3. Restart the IDE after enabling +### Get Help -### Agent doesn't respond +- **[Slack Community](https://openhands.dev/joinslack)** - Ask questions and share projects +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features +- **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples -1. Check your LLM settings: - ```bash - openhands - # Use /settings to configure - ``` +### Browser Use +Source: https://docs.openhands.dev/sdk/guides/agent-browser-use.md -2. Test ACP mode in terminal: - ```bash - openhands acp - # Should start without errors - ``` +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### Configuration not applied +> A ready-to-run example is available [here](#ready-to-run-example)! -1. Verify the config file location: `~/.jetbrains/acp.json` -2. Validate JSON syntax (no trailing commas, proper quotes) -3. Restart your JetBrains IDE +The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built +on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, +and extracting content - all through natural language instructions. 
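In code, enabling browsing is a matter of including the tool set when constructing the agent. A minimal sketch (assuming `LLM_API_KEY` is set; the complete runnable script is shown in the ready-to-run example on this page):

```python
import os

from pydantic import SecretStr

from openhands.sdk import LLM, Agent, Conversation
from openhands.sdk.tool import Tool
from openhands.tools.browser_use import BrowserToolSet

llm = LLM(
    usage_id="agent",
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=SecretStr(os.environ["LLM_API_KEY"]),
)

# BrowserToolSet bundles the individual browser tools (navigation, clicking,
# form filling, content extraction) into a single registration
agent = Agent(llm=llm, tools=[Tool(name=BrowserToolSet.name)])
conversation = Conversation(agent=agent, workspace=os.getcwd())
conversation.send_message("Open https://example.com and report the page title.")
conversation.run()
```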
-### Finding Your Conversation ID +## How It Works -To resume conversations, first find the ID: +The [ready-to-run example](#ready-to-run-example) demonstrates combining multiple tools to create a capable web research agent: -```bash -openhands --resume -``` +1. **BrowserToolSet**: Provides automated browser control for web interaction +2. **FileEditorTool**: Allows the agent to read and write files if needed +3. **BashTool**: Enables command-line operations for additional functionality -This displays recent conversations with their IDs: +The agent uses these tools to: +- Navigate to specified URLs +- Interact with web page elements (clicking, scrolling, etc.) +- Extract and analyze content from web pages +- Summarize information from multiple sources -``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py --------------------------------------------------------------------------------- -``` +In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points. -## See Also +## Customization -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually +register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual +tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. 
+This gives you fine-grained control over which browser capabilities are exposed to the agent. -### IDE Integration Overview -Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview.md +## Ready-to-run Example - -IDE integration via ACP is experimental and may have limitations. Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). - + +This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) + - -**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. - +```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +import os -## What is the Agent Client Protocol (ACP)? +from pydantic import SecretStr -The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. - -## Supported IDEs - -| IDE | Support Level | Setup Guide | -|-----|---------------|-------------| -| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | -| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | -| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | -| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. 
| - -## Prerequisites +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -Before using OpenHands with any IDE, you must: -1. **Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) +logger = get_logger(__name__) -2. **Configure your LLM settings** using the `/settings` command: - ```bash - openhands - # Then use /settings to configure - ``` +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=BrowserToolSet.name), +] -## How It Works +# If you need fine-grained browser control, you can manually register individual browser +# tools by creating a BrowserToolExecutor and providing factories that return customized +# Tool instances before constructing the Agent. -```mermaid -graph LR - IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] - CLI -->|API Calls| LLM[LLM Provider] - CLI -->|Commands| Runtime[Sandbox Runtime] -``` +# Agent +agent = Agent(llm=llm, tools=tools) -1. Your IDE launches `openhands acp` as a subprocess -2. Communication happens via JSON-RPC 2.0 over stdio -3. OpenHands uses your configured LLM and runtime settings -4. 
Results are displayed in your IDE's interface +llm_messages = [] # collect raw LLM messages -## The ACP Command -The `openhands acp` command starts OpenHands as an ACP server: +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -```bash -# Basic ACP server -openhands acp -# With LLM-based approval -openhands acp --llm-approve +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -# Resume a conversation -openhands acp --resume +conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" +) +conversation.run() -# Resume the latest conversation -openhands acp --resume --last +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") ``` -### ACP Options - -| Option | Description | -|--------|-------------| -| `--resume [ID]` | Resume a conversation by ID | -| `--last` | Resume the most recent conversation | -| `--always-approve` | Auto-approve all actions | -| `--llm-approve` | Use LLM-based security analyzer | -| `--streaming` | Enable token-by-token streaming | + -## Confirmation Modes +## Next Steps -OpenHands ACP supports three confirmation modes to control how agent actions are approved: +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services -### Always Ask (Default) +### Creating Custom Agent +Source: https://docs.openhands.dev/sdk/guides/agent-custom.md -The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. 
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -```bash -openhands acp # defaults to always-ask mode -``` +This guide demonstrates how to create custom agents tailored for specific use cases. Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows. -### Always Approve + +This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) + -The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. -```bash -openhands acp --always-approve -``` +The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. -### LLM-Based Approval +```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py +#!/usr/bin/env python3 +""" +Planning Agent Workflow Example -The agent uses an LLM-based security analyzer to evaluate each action. Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved. +This example demonstrates a two-stage workflow: +1. Planning Agent: Analyzes the task and creates a detailed implementation plan +2. Execution Agent: Implements the plan with full editing capabilities -```bash -openhands acp --llm-approve -``` +The task: Create a Python web scraper that extracts article titles and URLs +from a news website, handles rate limiting, and saves results to JSON. 
+""" -### Changing Modes During a Session +import os +import tempfile +from pathlib import Path -You can change the confirmation mode during an active session using slash commands: +from pydantic import SecretStr -| Command | Description | -|---------|-------------| -| `/confirm always-ask` | Switch to always-ask mode | -| `/confirm always-approve` | Switch to always-approve mode | -| `/confirm llm-approve` | Switch to LLM-based approval mode | -| `/help` | Show all available slash commands | +from openhands.sdk import LLM, Conversation +from openhands.sdk.llm import content_to_str +from openhands.tools.preset.default import get_default_agent +from openhands.tools.preset.planning import get_planning_agent - -The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session. - -## Choosing an IDE +def get_event_content(event): + """Extract content from an event.""" + if hasattr(event, "llm_message"): + return "".join(content_to_str(event.llm_message.content)) + return str(event) - - - High-performance editor with native ACP support. Best for speed and simplicity. - - - Universal terminal interface. Works with any terminal, consistent experience. - - - Popular editor with community extension. Great for VS Code users. - - - IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users. - - -## Resuming Conversations in IDEs +"""Run the planning agent workflow example.""" -You can resume previous conversations in ACP mode. Since ACP mode doesn't display an interactive list, first find your conversation ID: +# Create a temporary workspace +workspace_dir = Path(tempfile.mkdtemp()) +print(f"Working in: {workspace_dir}") -```bash -openhands --resume -``` +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="agent", +) -This shows your recent conversations: +# Task description +task = """ +Create a Python web scraper with the following requirements: +- Scrape article titles and URLs from a news website +- Handle HTTP errors gracefully with retry logic +- Save results to a JSON file with timestamp +- Use requests and BeautifulSoup for scraping -``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py +Do NOT ask for any clarifying questions. Directly create your implementation plan. +""" - 2. xyz789ghi012 (yesterday) - Add unit tests for the user service --------------------------------------------------------------------------------- -``` +print("=" * 80) +print("PHASE 1: PLANNING") +print("=" * 80) -Then configure your IDE to use `--resume ` or `--resume --last`. See each IDE's documentation for specific configuration. 
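For example, to have a JetBrains IDE always pick up your most recent conversation, the `acp.json` entry (the same shape as the configurations shown earlier in the JetBrains guide) would be:

```json
{
  "agent_servers": {
    "OpenHands (Resume Latest)": {
      "command": "openhands",
      "args": ["acp", "--resume", "--last"],
      "env": {}
    }
  }
}
```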
+# Create Planning Agent with read-only tools +planning_agent = get_planning_agent(llm=llm) -## See Also +# Create conversation for planning +planning_conversation = Conversation( + agent=planning_agent, + workspace=str(workspace_dir), +) -- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal -- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide +# Run planning phase +print("Planning Agent is analyzing the task and creating implementation plan...") +planning_conversation.send_message( + f"Please analyze this web scraping task and create a detailed " + f"implementation plan:\n\n{task}" +) +planning_conversation.run() -### Toad Terminal -Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad.md +print("\n" + "=" * 80) +print("PLANNING COMPLETE") +print("=" * 80) +print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") -[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). +print("\n" + "=" * 80) +print("PHASE 2: EXECUTION") +print("=" * 80) -The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. +# Create Execution Agent with full editing capabilities +execution_agent = get_default_agent(llm=llm, cli_mode=True) -![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) +# Create conversation for execution +execution_conversation = Conversation( + agent=execution_agent, + workspace=str(workspace_dir), +) -## Why Toad? +# Prepare execution prompt with reference to the plan file +execution_prompt = f""" +Please implement the web scraping project according to the implementation plan. 
-Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: +The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md -- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything -- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs -- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP +Please read the plan from PLAN.md and implement all components according to it. -OpenHands is included as a recommended agent in Toad's agent store. +Create all necessary files, implement the functionality, and ensure everything +works together properly. +""" -## Prerequisites +print("Execution Agent is implementing the plan...") +execution_conversation.send_message(execution_prompt) +execution_conversation.run() -Before using Toad with OpenHands: +# Get the last message from the conversation +execution_result = execution_conversation.state.events[-1] -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. 
**LLM settings configured** - Run `openhands` and use `/settings` +print("\n" + "=" * 80) +print("EXECUTION RESULT:") +print("=" * 80) +print(get_event_content(execution_result)) -## Installation +print("\n" + "=" * 80) +print("WORKFLOW COMPLETE") +print("=" * 80) +print(f"Project files created in: {workspace_dir}") -Install Toad using [uv](https://docs.astral.sh/uv/): +# List created files +print("\nCreated files:") +for file_path in workspace_dir.rglob("*"): + if file_path.is_file(): + print(f" - {file_path.relative_to(workspace_dir)}") -```bash -uvx batrachian-toad +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). + -## Setup +## Anatomy of a Custom Agent -### Using the Agent Store +The planning agent demonstrates the two key components for creating specialized agent: -The easiest way to set up OpenHands with Toad: +### 1. Custom Tool Selection -1. Launch Toad: `uvx batrachian-toad` -2. Open Toad's agent store -3. Find **OpenHands** in the list of recommended agents -4. Click **Install** to set up OpenHands -5. Select OpenHands and start a conversation +Choose tools that match your agent's specific role. 
Here's how the planning agent defines its tools: -The install process runs: -```bash -uv tool install openhands --python 3.12 && openhands login -``` +```python icon="python" -### Manual Configuration +def register_planning_tools() -> None: + """Register the planning agent tools.""" + from openhands.tools.glob import GlobTool + from openhands.tools.grep import GrepTool + from openhands.tools.planning_file_editor import PlanningFileEditorTool -You can also launch Toad directly with OpenHands: + register_tool("GlobTool", GlobTool) + logger.debug("Tool: GlobTool registered.") + register_tool("GrepTool", GrepTool) + logger.debug("Tool: GrepTool registered.") + register_tool("PlanningFileEditorTool", PlanningFileEditorTool) + logger.debug("Tool: PlanningFileEditorTool registered.") -```bash -toad acp "openhands acp" -``` -## Usage +def get_planning_tools() -> list[Tool]: + """Get the planning agent tool specifications. -### Basic Usage + Returns: + List of tools optimized for planning and analysis tasks, including + file viewing and PLAN.md editing capabilities for advanced + code discovery and navigation. + """ + register_planning_tools() -```bash -# Launch Toad with OpenHands -toad acp "openhands acp" + return [ + Tool(name="GlobTool"), + Tool(name="GrepTool"), + Tool(name="PlanningFileEditorTool"), + ] ``` -### With Command Line Arguments - -Pass OpenHands CLI flags through Toad: +The planning agent uses: +- **GlobTool**: For discovering files and directories matching patterns +- **GrepTool**: For searching specific content across files +- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only -```bash -# Use LLM-based approval mode -toad acp "openhands acp --llm-approve" +This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions. -# Auto-approve all actions -toad acp "openhands acp --always-approve" -``` +### 2. 
System Prompt Customization -### Resume a Conversation +Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with injected plan structure that enforces: +1. **Objective**: Clear goal statement +2. **Context Summary**: Relevant system components and constraints +3. **Approach Overview**: High-level strategy and rationale +4. **Implementation Steps**: Detailed step-by-step execution plan +5. **Testing and Validation**: Verification methods and success criteria -Resume a specific conversation by ID: +### Complete Implementation Reference -```bash -toad acp "openhands acp --resume abc123def456" -``` +For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py). -Resume the most recent conversation: +## Next Steps -```bash -toad acp "openhands acp --resume --last" -``` +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case +- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management +- **[MCP Integration](/sdk/guides/mcp)** - Add MCP - -Find your conversation IDs by running `openhands --resume` in a regular terminal. - +### Sub-Agent Delegation +Source: https://docs.openhands.dev/sdk/guides/agent-delegation.md -## Advanced Configuration +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### Combined Options +> A ready-to-run example is available [here](#ready-to-run-example)! -```bash -# Resume with LLM approval -toad acp "openhands acp --resume --last --llm-approve" -``` +## Overview -### Environment Variables +Agent delegation allows a main agent to spawn multiple sub-agents and delegate tasks to them for parallel processing. 
Each sub-agent runs independently with its own conversation context and returns results that the main agent can consolidate and process further. -Pass environment variables to OpenHands: +This pattern is useful when: +- Breaking down complex problems into independent subtasks +- Processing multiple related tasks in parallel +- Separating concerns between different specialized sub-agents +- Improving throughput for parallelizable work -```bash -LLM_API_KEY=your-key toad acp "openhands acp" -``` +## How It Works -## Troubleshooting +The delegation system consists of two main operations: -### "openhands" command not found +### 1. Spawning Sub-Agents -Ensure OpenHands is installed: -```bash -uv tool install openhands --python 3.12 -``` +Before delegating work, the agent must first spawn sub-agents with meaningful identifiers: -Verify it's in your PATH: -```bash -which openhands +```python icon="python" wrap +# Agent uses the delegate tool to spawn sub-agents +{ + "command": "spawn", + "ids": ["lodging", "activities"] +} ``` -### Agent doesn't respond +Each spawned sub-agent: +- Gets a unique identifier that the agent specify (e.g., "lodging", "activities") +- Inherits the same LLM configuration as the parent agent +- Operates in the same workspace as the main agent +- Maintains its own independent conversation context -1. Check your LLM settings: `openhands` then `/settings` -2. Verify your API key is valid -3. Check network connectivity to your LLM provider +### 2. Delegating Tasks -### Conversation not persisting +Once sub-agents are spawned, the agent can delegate tasks to them: -Conversations are stored in `~/.openhands/conversations`. Ensure this directory exists and is writable. 
+```python icon="python" wrap +# Agent uses the delegate tool to assign tasks +{ + "command": "delegate", + "tasks": { + "lodging": "Find the best budget-friendly areas to stay in London", + "activities": "List top 5 must-see attractions and hidden gems in London" + } +} +``` -## See Also +The delegate operation: +- Runs all sub-agent tasks in parallel using threads +- Blocks until all sub-agents complete their work +- Returns a single consolidated observation with all results +- Handles errors gracefully and reports them per sub-agent -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +## Setting Up the DelegateTool -### VS Code -Source: https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md + + + ### Register the Tool -[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. + ```python icon="python" wrap + from openhands.sdk.tool import register_tool + from openhands.tools.delegate import DelegateTool - -VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. - + register_tool("DelegateTool", DelegateTool) + ``` + + + ### Add to Agent Tools -## Prerequisites + ```python icon="python" wrap + from openhands.sdk import Tool + from openhands.tools.preset.default import get_default_tools -Before configuring VS Code: + tools = get_default_tools(enable_browser=False) + tools.append(Tool(name="DelegateTool")) -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. 
**LLM settings configured** - Run `openhands` and use `/settings` -3. **VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) + agent = Agent(llm=llm, tools=tools) + ``` + + + ### Configure Maximum Sub-Agents (Optional) -## Installation + The user can limit the maximum number of concurrent sub-agents: -### Step 1: Install the Extension + ```python icon="python" wrap + from openhands.tools.delegate import DelegateTool -1. Open VS Code -2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) -3. Search for **"VSCode ACP"** -4. Click **Install** - -Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). - -### Step 2: Connect to OpenHands + class CustomDelegateTool(DelegateTool): + @classmethod + def create(cls, conv_state, max_children: int = 3): + # Only allow up to 3 sub-agents + return super().create(conv_state, max_children=max_children) -1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) -2. Click **Connect** to start a session -3. Select **OpenHands** from the agent dropdown -4. Start chatting with OpenHands! + register_tool("DelegateTool", CustomDelegateTool) + ``` + + -## How It Works -The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. +## Tool Commands -The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol. +### spawn -## Verification +Initialize sub-agents with meaningful identifiers. -Ensure OpenHands is discoverable: +**Parameters:** +- `command`: `"spawn"` +- `ids`: List of string identifiers (e.g., `["research", "implementation", "testing"]`) -```bash -which openhands -# Should return a path like /Users/you/.local/bin/openhands -``` +**Returns:** +A message indicating the sub-agents were successfully spawned. 
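Because `delegate` can only target identifiers created by an earlier `spawn`, the two payloads pair up naturally. The sketch below is illustrative only (plain Python dictionaries reusing the ids and task strings from the travel-planning example in this guide), showing the invariant that every delegated task must name a spawned sub-agent:

```python
# Tool-call payloads the agent emits, in order (illustrative values
# reusing the travel-planning example from this guide)
spawn_call = {"command": "spawn", "ids": ["lodging", "activities"]}
delegate_call = {
    "command": "delegate",
    "tasks": {
        "lodging": "Find the best budget-friendly areas to stay in London",
        "activities": "List top 5 must-see attractions and hidden gems in London",
    },
}

# Every delegated task must target a previously spawned sub-agent id
assert set(delegate_call["tasks"]) <= set(spawn_call["ids"])
```

If a task names an id that was never spawned, the delegate tool has nothing to route it to, which is why spawning always comes first.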
-If the command is not found, install OpenHands CLI: -```bash -uv tool install openhands --python 3.12 +**Example:** +```python icon="python" wrap +{ + "command": "spawn", + "ids": ["research", "implementation", "testing"] +} ``` -## Advanced Usage +### delegate -### Custom Arguments +Send tasks to specific sub-agents and wait for results. -The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`. +**Parameters:** +- `command`: `"delegate"` +- `tasks`: Dictionary mapping sub-agent IDs to task descriptions -### Resume Conversations +**Returns:** +A consolidated message containing all results from the sub-agents. -To resume a conversation, you may need to: +**Example:** +```python icon="python" wrap +{ + "command": "delegate", + "tasks": { + "research": "Find best practices for async code", + "implementation": "Refactor the MyClass class", + "testing": "Write unit tests for the refactored code" + } +} +``` -1. Find your conversation ID: `openhands --resume` -2. Configure the extension to use custom arguments (if supported) -3. Or use the terminal directly: `openhands acp --resume ` +## Ready-to-run Example -The VSCode ACP extension's feature set depends on the extension maintainer. Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities. +This example is available on GitHub: [examples/01_standalone_sdk/25_agent_delegation.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_agent_delegation.py) -## Troubleshooting - -### OpenHands Not Appearing in Dropdown - -1. Verify OpenHands is installed and in PATH: - ```bash - which openhands - openhands --version - ``` - -2. Restart VS Code after installing OpenHands +```python icon="python" expandable examples/01_standalone_sdk/25_agent_delegation.py +""" +Agent Delegation Example -3. 
Check if the extension recognizes agents: - - Look for any error messages in the extension panel - - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`) +This example demonstrates the agent delegation feature where a main agent +delegates tasks to sub-agents for parallel processing. +Each sub-agent runs independently and returns its results to the main agent, +which then merges both analyses into a single consolidated report. +""" -### Connection Failed +import os -1. Ensure your LLM settings are configured: - ```bash - openhands - # Use /settings to configure - ``` +from pydantic import SecretStr -2. Check that `openhands acp` works in terminal: - ```bash - openhands acp - # Should start without errors (Ctrl+C to exit) - ``` +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Tool, + get_logger, +) +from openhands.sdk.context import Skill +from openhands.sdk.tool import register_tool +from openhands.tools.delegate import ( + DelegateTool, + DelegationVisualizer, + register_agent, +) +from openhands.tools.preset.default import get_default_tools -### Extension Not Working -1. Update to the latest version of the extension -2. Check for VS Code updates -3. Report issues on the [extension's GitHub](https://github.com/omercnet) +ONLY_RUN_SIMPLE_DELEGATION = False -## Limitations +logger = get_logger(__name__) -Since this is a community extension: +# Configure LLM and agent +# You can get an API key from https://app.all-hands.dev/settings/api-keys +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=os.environ.get("LLM_BASE_URL", None), + usage_id="agent", +) -- Feature availability may vary -- Support depends on the extension maintainer -- Not all OpenHands CLI flags may be accessible through the UI +cwd = os.getcwd() -For the most control over OpenHands, consider using: -- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage -- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support +register_tool("DelegateTool", DelegateTool) +tools = get_default_tools(enable_browser=False) +tools.append(Tool(name="DelegateTool")) -## See Also +main_agent = Agent( + llm=llm, + tools=tools, +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [VSCode ACP Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page -- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal +task_message = ( + "Forget about coding. Let's switch to travel planning. " + "Let's plan a trip to London. I have two issues I need to solve: " + "Lodging: what are the best areas to stay at while keeping budget in mind? " + "Activities: what are the top 5 must-see attractions and hidden gems? " + "Please use the delegation tools to handle these two tasks in parallel. " + "Make sure the sub-agents use their own knowledge " + "and dont rely on internet access. " + "They should keep it short. After getting the results, merge both analyses " + "into a single consolidated report.\n\n" +) +conversation.send_message(task_message) +conversation.run() -### Zed IDE -Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed.md +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." 
+) +conversation.run() -[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. +# Report cost for simple delegation example +cost_1 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (simple delegation): {cost_1}") - +print("Simple delegation example done!", "\n" * 20) -## Prerequisites -Before configuring Zed, ensure you have: +# -------- Agent Delegation Second Part: User-Defined Agent Types -------- -1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) -2. **LLM settings configured** - Run `openhands` and use `/settings` -3. **Zed editor** - Download from [zed.dev](https://zed.dev/) +if ONLY_RUN_SIMPLE_DELEGATION: + exit(0) -## Configuration -### Step 1: Open Agent Settings +def create_lodging_planner(llm: LLM) -> Agent: + """Create a lodging planner focused on London stays.""" + skills = [ + Skill( + name="lodging_planning", + content=( + "You specialize in finding great places to stay in London. " + "Provide 3-4 hotel recommendations with neighborhoods, quick " + "pros/cons, " + "and notes on transit convenience. Keep options varied by budget." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Focus only on London lodging recommendations.", + ), + ) -1. Open Zed -2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette -3. Search for `agent: open settings` -![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) +def create_activities_planner(llm: LLM) -> Agent: + """Create an activities planner focused on London itineraries.""" + skills = [ + Skill( + name="activities_planning", + content=( + "You design concise London itineraries. Suggest 2-3 daily " + "highlights, grouped by proximity to minimize travel time. " + "Include food/coffee stops " + "and note required tickets/reservations." 
+ ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Plan practical, time-efficient days in London.", + ), + ) -### Step 2: Add OpenHands as an Agent -1. On the right side, click `+ Add Agent` -2. Select `Add Custom Agent` +# Register user-defined agent types (default agent type is always available) +register_agent( + name="lodging_planner", + factory_func=create_lodging_planner, + description="Finds London lodging options with transit-friendly picks.", +) +register_agent( + name="activities_planner", + factory_func=create_activities_planner, + description="Creates time-efficient London activity itineraries.", +) -![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) +# Make the delegation tool available to the main agent +register_tool("DelegateTool", DelegateTool) -### Step 3: Configure the Agent +main_agent = Agent( + llm=llm, + tools=[Tool(name="DelegateTool")], +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) -Add the following configuration to the `agent_servers` field: +task_message = ( + "Plan a 3-day London trip. " + "1) Spawn two sub-agents: lodging_planner (hotel options) and " + "activities_planner (itinerary). " + "2) Ask lodging_planner for 3-4 central London hotel recommendations with " + "neighborhoods, quick pros/cons, and transit notes by budget. " + "3) Ask activities_planner for a concise 3-day itinerary with nearby stops, " + " food/coffee suggestions, and any ticket/reservation notes. " + "4) Share both sub-agent results and propose a combined plan." 
+) -```json -{ - "agent_servers": { - "OpenHands": { - "command": "uvx", - "args": [ - "openhands", - "acp" - ], - "env": {} - } - } -} -``` +print("=" * 100) +print("Demonstrating London trip delegation (lodging + activities)...") +print("=" * 100) -### Step 4: Save and Use +conversation.send_message(task_message) +conversation.run() -1. Save the settings file -2. You can now use OpenHands within Zed! +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() -![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) +# Report cost for user-defined agent types example +cost_2 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (user-defined agents): {cost_2}") -## Advanced Configuration +print("All done!") -### LLM-Approve Mode +# Full example cost report for CI workflow +print(f"EXAMPLE_COST: {cost_1 + cost_2}") +``` -For automatic LLM-based approval of actions: + -```json -{ - "agent_servers": { - "OpenHands (LLM Approve)": { - "command": "uvx", - "args": [ - "openhands", - "acp", - "--llm-approve" - ], - "env": {} - } - } -} -``` +### Interactive Terminal +Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md -### Resume a Specific Conversation +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -To resume a previous conversation: +> A ready-to-run example is available [here](#ready-to-run-example)! -```json -{ - "agent_servers": { - "OpenHands (Resume)": { - "command": "uvx", - "args": [ - "openhands", - "acp", - "--resume", - "abc123def456" - ], - "env": {} - } - } -} -``` +The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. 
This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results.

## How It Works

```python icon="python" focus={4-7}
cwd = os.getcwd()
register_tool("BashTool", BashTool)
tools = [
    Tool(
        name="BashTool",
        params={"no_change_timeout_seconds": 3},
    )
]
```

The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent.

In the example, the agent:
1. Enters Python's interactive mode by running `python3`
2. Executes Python code to get the current time
3. Exits the Python interpreter

The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session.
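The no-change timeout can be pictured as a small polling loop: keep collecting output, and only hand it back once nothing new has arrived for the configured interval. The sketch below illustrates the idea only; it is not the actual `BashTool` implementation:

```python
import time


def read_until_quiet(poll_output, no_change_timeout_seconds=3.0, tick=0.05):
    """Collect output chunks until none arrive for `no_change_timeout_seconds`.

    `poll_output` is a stand-in for reading the terminal: any callable that
    returns the next chunk of output, or "" when nothing new is available.
    """
    buffer = []
    quiet_for = 0.0
    while quiet_for < no_change_timeout_seconds:
        chunk = poll_output()
        if chunk:
            buffer.append(chunk)
            quiet_for = 0.0  # new output resets the timer
        else:
            quiet_for += tick
        time.sleep(tick)
    return "".join(buffer)


# Simulated REPL output: two chunks arrive, then the terminal goes quiet
chunks = iter([">>> print('hi')\n", "hi\n"])
print(read_until_quiet(lambda: next(chunks, ""), no_change_timeout_seconds=0.2))
```

With a short timeout the agent gets partial output back quickly (useful for prompts that wait for input); a longer timeout favors complete output from slow commands.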
Review the [BashTool](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/definition.py) and [terminal source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/terminal/terminal_session.py) to better understand how the interactive session is configured and managed. -### Accessing Debug Logs +## Ready-to-run Example -If you encounter issues: + +This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) + -1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) -2. Type and select `acp debug log` -3. Review the logs for errors or warnings -4. Restart the conversation to reload connections after configuration changes -### Common Issues +```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +import os -**"openhands" command not found** +from pydantic import SecretStr -Ensure OpenHands is installed and in your PATH: -```bash -which openhands -# Should return a path like /Users/you/.local/bin/openhands -``` +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool -If using `uvx`, ensure uv is installed: -```bash -uv --version -``` -**Agent doesn't start** +logger = get_logger(__name__) -1. Check that your LLM settings are configured: run `openhands` and verify `/settings` -2. Verify the configuration JSON syntax is valid -3. Check the ACP debug logs for detailed errors +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -**Conversation doesn't persist** +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + params={"no_change_timeout_seconds": 3}, + ) +] -Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. +# Agent +agent = Agent(llm=llm, tools=tools) - -After making configuration changes, restart the conversation in Zed to apply them. - +llm_messages = [] # collect raw LLM messages -## See Also -- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs -- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation -- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -### Installation -Source: https://docs.openhands.dev/openhands/usage/cli/installation.md - -**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. - +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -## Installation Methods +conversation.send_message( + "Enter python interactive mode by directly running `python3`, then tell me " + "the current time, and exit python interactive mode." +) +conversation.run() - - - Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. 
- - **Install OpenHands:** - ```bash - uv tool install openhands --python 3.12 - ``` - - **Run OpenHands:** - ```bash - openhands - ``` +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` - **Upgrade OpenHands:** - ```bash - uv tool upgrade openhands --python 3.12 - ``` - - - Install the OpenHands CLI binary with the install script: + - ```bash - curl -fsSL https://install.openhands.dev/install.sh | sh - ``` +## Next Steps - Then run: - ```bash - openhands - ``` +- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases - - Your system may require you to allow permissions to run the executable. +### API-based Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md - - When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple - cannot check it for malicious software." +> A ready-to-run example is available [here](#ready-to-run-example)! - 1. Open `System Settings`. - 2. Go to `Privacy & Security`. - 3. Scroll down to `Security` and click `Allow Anyway`. - 4. Rerun the OpenHands CLI. - ![mac-security](/openhands/static/img/cli-security-mac.png) +The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to a [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. - - - - - 1. Set the following environment variable in your terminal: - - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) +## Key Concepts - 2. 
Ensure you have configured your settings before starting: - - Set up `~/.openhands/settings.json` with your LLM configuration +### APIRemoteWorkspace - 3. Run the following command: +The `APIRemoteWorkspace` connects to a hosted runtime API service: - ```bash - docker run -it \ - --pull=always \ - -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ - -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ - -e SANDBOX_USER_ID=$(id -u) \ - -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v ~/.openhands:/root/.openhands \ - --add-host host.docker.internal:host-gateway \ - --name openhands-cli-$(date +%Y%m%d%H%M%S) \ - python:3.12-slim \ - bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" - ``` +```python icon="python" +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) as workspace: +``` - The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's - permissions. This prevents the agent from creating root-owned files in the mounted workspace. - - +This workspace type: +- Connects to a remote runtime API service +- Automatically provisions sandboxed environments +- Manages container lifecycle through the API +- Handles all infrastructure concerns -## First Run +### Runtime API Authentication -The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved -for future sessions in `~/.openhands/settings.json`. +The example requires a runtime API key for authentication: -The conversation history will be saved in `~/.openhands/conversations`. 
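To check what the CLI has saved so far, you can inspect those paths directly. This is just a convenience snippet based on the locations mentioned above; the files only appear after your first run:

```shell
# LLM settings written by the first-run setup
cat ~/.openhands/settings.json 2>/dev/null || echo "no settings yet - run 'openhands' once"

# Saved conversation history
ls ~/.openhands/conversations 2>/dev/null || echo "no conversations yet"
```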
+```python icon="python" +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) +``` - -If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the -configuration format has changed. - +This key authenticates your requests to the hosted runtime service. -## Next Steps +### Pre-built Image Selection -- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +You can specify which pre-built agent server image to use: -### MCP Servers -Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md +```python icon="python" focus={4} +APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) +``` -## Overview +The runtime API will pull and run the specified image in a sandboxed environment. -[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. +### Workspace Testing -The CLI provides two ways to manage MCP servers: -1. **CLI commands** (`openhands mcp`) - Manage servers from the command line -2. **Interactive command** (`/mcp`) - View server status within a conversation +Just like with `DockerWorkspace`, you can test the workspace before running the agent: - -If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON. - +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' 
&& pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` -## MCP Commands +This verifies connectivity to the remote runtime and ensures the environment is ready. -### List Servers +### Automatic RemoteConversation -View all configured MCP servers: +The conversation uses WebSocket communication with the remote server: -```bash -openhands mcp list +```python icon="python" focus={1, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True +) +assert isinstance(conversation, RemoteConversation) ``` -### Get Server Details +All agent execution happens on the remote runtime infrastructure. -View details for a specific server: +## Ready-to-run Example -```bash -openhands mcp get -``` + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + -### Remove a Server +This example shows how to connect to a hosted runtime API for fully managed agent execution: -Remove a server configuration: +```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +"""Example: APIRemoteWorkspace with Dynamic Build. -```bash -openhands mcp remove -``` +This example demonstrates building an agent-server image on-the-fly from the SDK +codebase and launching it in a remote sandboxed environment via Runtime API. 
-### Enable/Disable Servers
+Usage:
+    uv run examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py

-Control which servers are active:
+Requirements:
+  - LLM_API_KEY: API key for LLM access
+  - RUNTIME_API_KEY: API key for runtime API access
+"""

-```bash
-# Enable a server
-openhands mcp enable 
+import os
+import time

-# Disable a server
-openhands mcp disable 
-```
+from pydantic import SecretStr

-## Adding Servers
+from openhands.sdk import (
+    LLM,
+    Conversation,
+    RemoteConversation,
+    get_logger,
+)
+from openhands.tools.preset.default import get_default_agent
+from openhands.workspace import APIRemoteWorkspace

-### HTTP/SSE Servers
-Add remote servers with HTTP or SSE transport:

+logger = get_logger(__name__)

-```bash
-openhands mcp add  --transport http
-```
-#### With Bearer Token Authentication

+api_key = os.getenv("LLM_API_KEY")
+assert api_key, "LLM_API_KEY required"

-```bash
-openhands mcp add my-api --transport http \
-  --header "Authorization: Bearer your-token" \
-  https://api.example.com/mcp
-```
+llm = LLM(
+    usage_id="agent",
+    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
+    base_url=os.getenv("LLM_BASE_URL"),
+    api_key=SecretStr(api_key),
+)

-#### With API Key Authentication
+runtime_api_key = os.getenv("RUNTIME_API_KEY")
+if not runtime_api_key:
+    logger.error("RUNTIME_API_KEY required")
+    exit(1)

-```bash
-openhands mcp add weather-api --transport http \
-  --header "X-API-Key: your-api-key" \
-  https://weather.api.com
-```
-#### With Multiple Headers

+
+# If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency +# Otherwise, use the latest image from main +server_image_sha = os.getenv("GITHUB_SHA") or "main" +server_image = f"ghcr.io/openhands/agent-server:{server_image_sha[:7]}-python-amd64" +logger.info(f"Using server image: {server_image}") -```bash -openhands mcp add secure-api --transport http \ - --header "Authorization: Bearer token123" \ - --header "X-Client-ID: client456" \ - https://api.example.com -``` +with APIRemoteWorkspace( + runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), + runtime_api_key=runtime_api_key, + server_image=server_image, + image_pull_policy="Always", +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} -#### With OAuth Authentication + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() -```bash -openhands mcp add notion-server --transport http \ - --auth oauth \ - https://mcp.notion.com/mcp -``` - -### Stdio Servers - -Add local servers that communicate via stdio: - -```bash -openhands mcp add --transport stdio -- [args...] -``` - -#### Basic Example - -```bash -openhands mcp add local-server --transport stdio \ - python -- -m my_mcp_server -``` + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") -#### With Environment Variables + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) -```bash -openhands mcp add local-server --transport stdio \ - --env "API_KEY=secret123" \ - --env "DATABASE_URL=postgresql://localhost/mydb" \ - python -- -m my_mcp_server --config config.json -``` + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." 
+ ) + conversation.run() -#### Add in Disabled State + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) -```bash -openhands mcp add my-server --transport stdio --disabled \ - node -- my-server.js + conversation.send_message("Great! Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() ``` -### Command Reference +You can run the example code as-is. -```bash -openhands mcp add --transport [options] [-- args...] +```bash Running the Example +export LLM_API_KEY="your-api-key" +# If using the OpenHands LLM proxy, set its base URL: +export LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" +export RUNTIME_API_KEY="your-runtime-api-key" +# Set the runtime API URL for the remote sandbox +export RUNTIME_API_URL="https://runtime.eval.all-hands.dev" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py ``` -| Option | Description | -|--------|-------------| -| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | -| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | -| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | -| `--auth` | Authentication method (e.g., `oauth`) | -| `--enabled` | Enable immediately (default) | -| `--disabled` | Add in disabled state | - -## Example: Web Search with Tavily - -Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp): +## Next Steps -```bash -openhands mcp add tavily --transport stdio \ - npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=" -``` +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package 
Architecture](/sdk/arch/agent-server)** - Remote execution architecture -## Manual Configuration +### Apptainer Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md -You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### Configuration Format +> A ready-to-run example is available [here](#basic-apptainer-sandbox-example)! -The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format): +The Apptainer sandboxed agent server demonstrates how to run agents in isolated Apptainer containers using ApptainerWorkspace. -```json -{ - "mcpServers": { - "server-name": { - "command": "command-to-run", - "args": ["arg1", "arg2"], - "env": { - "ENV_VAR": "value" - } - } - } -} -``` +Apptainer (formerly Singularity) is a container runtime designed for HPC environments that doesn't require root access, making it ideal for shared computing environments, university clusters, and systems where Docker is not available. 
-### Example Configuration +## When to Use Apptainer -```json -{ - "mcpServers": { - "tavily-remote": { - "command": "npx", - "args": [ - "-y", - "mcp-remote", - "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key" - ] - }, - "local-tools": { - "command": "python", - "args": ["-m", "my_mcp_tools"], - "env": { - "DEBUG": "true" - } - } - } -} -``` +Use Apptainer instead of Docker when: +- Running on HPC clusters or shared computing environments +- Root access is not available +- Docker daemon cannot be installed +- Working in academic or research computing environments +- Security policies restrict Docker usage -## Interactive `/mcp` Command +## Prerequisites -Within an OpenHands conversation, use `/mcp` to view server status: +Before running this example, ensure you have: +- Apptainer installed ([Installation Guide](https://apptainer.org/docs/user/main/quick_start.html)) +- LLM API key set in environment -- **View active servers**: Shows which MCP servers are currently active in the conversation -- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts +## Basic Apptainer Sandbox Example -The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations. +This example is available on GitHub: [examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py) -## Workflow - -1. **Add servers** using `openhands mcp add` -2. **Start a conversation** with `openhands` -3. **Check status** with `/mcp` inside the conversation -4. **Use the tools** provided by your MCP servers - -The agent will automatically have access to tools provided by enabled MCP servers. 
+This example shows how to create an `ApptainerWorkspace` that automatically manages Apptainer containers for agent execution: -## Troubleshooting +```python icon="python" expandable examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py +import os +import platform +import time -### Server Not Appearing +from pydantic import SecretStr -1. Verify the server is enabled: - ```bash - openhands mcp list - ``` +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import ApptainerWorkspace -2. Check the configuration: - ```bash - openhands mcp get - ``` -3. Restart the conversation to load new configurations +logger = get_logger(__name__) -### Server Fails to Start +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -1. Test the command manually: - ```bash - # For stdio servers - python -m my_mcp_server - - # For HTTP servers, check the URL is reachable - curl https://api.example.com/mcp - ``` +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -2. Check environment variables and credentials -3. Review error messages in the CLI output +def detect_platform(): + """Detects the correct platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" -### Configuration File Location -The MCP configuration is stored at: -- **Config file**: `~/.openhands/mcp.json` +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" -## See Also -- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation -- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration -- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference +# 2) Create an Apptainer-based remote workspace that will set up and manage +# the Apptainer container automatically. Use `ApptainerWorkspace` with a +# pre-built agent server image. +# Apptainer (formerly Singularity) doesn't require root access, making it +# ideal for HPC and shared computing environments. +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with ApptainerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) -### Quick Start -Source: https://docs.openhands.dev/openhands/usage/cli/quick-start.md + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} - -**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details. - + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() -## Overview + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' 
&& pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) -The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent: + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") -| Mode | Command | Best For | -|------|---------|----------| -| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development | -| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation | -| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI | -| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI | -| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains | + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") - + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") -## Your First Conversation + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") -**Set up your account** (first time only): + # Report cost (must be before conversation.close()) + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` - - - ```bash - openhands login - ``` - This authenticates with OpenHands Cloud and fetches your settings. - - - The CLI will prompt you to configure your LLM provider and API key on first run. - - + -1. **Start the CLI:** - ```bash - openhands - ``` +## Configuration Options -2. **Enter a task:** - ``` - Create a Python script that prints "Hello, World!" - ``` +The `ApptainerWorkspace` supports several configuration options: -3. **Watch OpenHands work:** - The agent will create the file and show you the results. +### Option 1: Pre-built Image (Recommended) -## Controls +Use a pre-built agent server image for fastest startup: -Once inside the CLI, use these controls: +```python icon="python" focus={2} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, +) as workspace: + # Your code here +``` -| Control | Description | -|---------|-------------| -| `Ctrl+P` | Open command palette (access Settings, MCP status) | -| `Esc` | Pause the running agent | -| `Ctrl+Q` or `/exit` | Exit the CLI | +### Option 2: Build from Base Image -## Starting with a Task +Build from a base image when you need custom dependencies: -You can start the CLI with an initial task: +```python icon="python" focus={2} +with ApptainerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) as workspace: + # Your code here +``` -```bash -# Start with a task -openhands -t "Fix the bug in auth.py" + +Building from a base image requires internet access and may take several minutes on first run. The built image is cached for subsequent runs. 
+ -# Start with a task from a file -openhands -f task.txt -``` +### Option 3: Use Existing SIF File -## Resuming Conversations +If you have a pre-built Apptainer SIF file: -Resume a previous conversation: +```python icon="python" focus={2} +with ApptainerWorkspace( + sif_file="/path/to/your/agent-server.sif", + host_port=8010, +) as workspace: + # Your code here +``` -```bash -# List recent conversations and select one -openhands --resume +## Key Features -# Resume the most recent conversation -openhands --resume --last +### Rootless Container Execution -# Resume a specific conversation by ID -openhands --resume abc123def456 -``` +Apptainer runs completely without root privileges: +- No daemon process required +- User namespace isolation +- Compatible with most HPC security policies -For more details, see [Resume Conversations](/openhands/usage/cli/resume). +### Image Caching -## Next Steps +Apptainer automatically caches container images: +- First run builds/pulls the image +- Subsequent runs reuse cached SIF files +- Cache location: `~/.cache/apptainer/` - - - Learn about the interactive terminal interface - - - Use OpenHands in Zed, VS Code, or JetBrains - - - Automate tasks with scripting - - - Add tools via Model Context Protocol - - +### Port Mapping -### Resume Conversations -Source: https://docs.openhands.dev/openhands/usage/cli/resume.md +The workspace exposes ports for agent services: +```python icon="python" focus={1, 3} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, # Maps to container port 8010 +) as workspace: + # Access agent server at http://localhost:8010 +``` -## Overview +## Differences from Docker -OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off. 
+While the API is similar to DockerWorkspace, there are some differences: -## Listing Previous Conversations +| Feature | Docker | Apptainer | +|---------|--------|-----------| +| Root access required | Yes (daemon) | No | +| Installation | Requires Docker Engine | Single binary | +| Image format | OCI/Docker | SIF | +| Build speed | Fast (layers) | Slower (monolithic) | +| HPC compatibility | Limited | Excellent | +| Networking | Bridge/overlay | Host networking | -To see a list of your recent conversations, run: +## Troubleshooting -```bash -openhands --resume -``` +### Apptainer Not Found -This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message: +If you see `apptainer: command not found`: +1. Install Apptainer following the [official guide](https://apptainer.org/docs/user/main/quick_start.html) +2. Ensure it's in your PATH: `which apptainer` -``` -Recent Conversations: --------------------------------------------------------------------------------- - 1. abc123def456 (2h ago) - Fix the login bug in auth.py +### Permission Errors - 2. xyz789ghi012 (yesterday) - Add unit tests for the user service +Apptainer should work without root. If you see permission errors: +- Check that your user has access to `/tmp` +- Verify Apptainer is properly installed: `apptainer version` +- Ensure the cache directory is writable: `ls -la ~/.cache/apptainer/` - 3. 
mno345pqr678 (3 days ago) - Refactor the database connection module --------------------------------------------------------------------------------- -To resume a conversation, use: openhands --resume -``` +## Next Steps -## Resuming a Specific Conversation +- **[Docker Sandbox](/sdk/guides/agent-server/docker-sandbox)** - Alternative container runtime +- **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing +- **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution -To resume a specific conversation, use the `--resume` flag with the conversation ID: +### OpenHands Cloud Workspace +Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md -```bash -openhands --resume -``` +> A ready-to-run example is available [here](#ready-to-run-example)! -For example: +The `OpenHandsCloudWorkspace` demonstrates how to use the [OpenHands Cloud](https://app.all-hands.dev) to provision and manage sandboxed environments for agent execution. This provides a seamless experience with automatic sandbox provisioning, monitoring, and secure execution without managing your own infrastructure. -```bash -openhands --resume abc123def456 -``` +## Key Concepts -## Resuming the Latest Conversation +### OpenHandsCloudWorkspace -To quickly resume your most recent conversation without looking up the ID, use the `--last` flag: +The `OpenHandsCloudWorkspace` connects to OpenHands Cloud to provision sandboxes: -```bash -openhands --resume --last +```python icon="python" focus={1-2} +with OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, +) as workspace: ``` -This automatically finds and resumes the most recent conversation. 
+This workspace type: +- Connects to OpenHands Cloud API +- Automatically provisions sandboxed environments +- Manages sandbox lifecycle (create, poll status, delete) +- Handles all infrastructure concerns -## How It Works +### Getting Your API Key -When you resume a conversation: +To use OpenHands Cloud, you need an API key: -1. OpenHands loads the full conversation history from disk -2. The agent has access to all previous context, including: - - Your previous messages and requests - - The agent's responses and actions - - Any files that were created or modified -3. You can continue the conversation as if you never left +1. Go to [app.all-hands.dev](https://app.all-hands.dev) +2. Sign in to your account +3. Navigate to Settings → API Keys +4. Create a new API key - -The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost. - +Store this key securely and use it as the `OPENHANDS_CLOUD_API_KEY` environment variable. 
-## Resuming in Different Modes -### Terminal Mode +### Configuration Options -```bash -openhands --resume abc123def456 -openhands --resume --last -``` +The `OpenHandsCloudWorkspace` supports several configuration options: -### ACP Mode (IDEs) +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `cloud_api_url` | `str` | Required | OpenHands Cloud API URL | +| `cloud_api_key` | `str` | Required | API key for authentication | +| `sandbox_spec_id` | `str \| None` | `None` | Custom sandbox specification ID | +| `init_timeout` | `float` | `300.0` | Timeout for sandbox initialization (seconds) | +| `api_timeout` | `float` | `60.0` | Timeout for API requests (seconds) | +| `keep_alive` | `bool` | `False` | Keep sandbox running after cleanup | -```bash -openhands acp --resume abc123def456 -openhands acp --resume --last -``` +### Keep Alive Mode -For IDE-specific configurations, see: -- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) -- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) -- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) +By default, the sandbox is deleted when the workspace is closed. To keep it running: -### With Confirmation Modes +```python icon="python" focus={4} +workspace = OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, + keep_alive=True, +) +``` -Combine `--resume` with confirmation mode flags: +This is useful for debugging or when you want to inspect the sandbox state after execution. -```bash -# Resume with LLM-based approval -openhands --resume abc123def456 --llm-approve +### Workspace Testing -# Resume with auto-approve -openhands --resume --last --always-approve +You can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' 
&& pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") ``` -## Tips +This verifies connectivity to the cloud sandbox and ensures the environment is ready. - -**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. - +## Comparison with Other Workspace Types - -**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. - +| Feature | OpenHandsCloudWorkspace | APIRemoteWorkspace | DockerWorkspace | +|---------|------------------------|-------------------|-----------------| +| Infrastructure | OpenHands Cloud | Runtime API | Local Docker | +| Authentication | API Key | API Key | None | +| Setup Required | None | Runtime API access | Docker installed | +| Custom Images | Via sandbox specs | Direct image specification | Direct image specification | +| Best For | Production use | Custom runtime environments | Local development | -## Storage Location +## Ready-to-run Example -Conversations are stored in: + +This example is available on GitHub: [examples/02_remote_agent_server/07_convo_with_cloud_workspace.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/07_convo_with_cloud_workspace.py) + -``` -~/.openhands/conversations/ -├── abc123def456/ -│ └── conversation.json -├── xyz789ghi012/ -│ └── conversation.json -└── ... -``` +This example shows how to connect to OpenHands Cloud for fully managed agent execution: -## See Also +```python icon="python" expandable examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +"""Example: OpenHandsCloudWorkspace for OpenHands Cloud API. 
-- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage
-- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs
-- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference
+This example demonstrates using OpenHandsCloudWorkspace to provision a sandbox
+via OpenHands Cloud (app.all-hands.dev) and run an agent conversation.

-### Terminal (CLI)
-Source: https://docs.openhands.dev/openhands/usage/cli/terminal.md
+Usage:
+    uv run examples/02_remote_agent_server/07_convo_with_cloud_workspace.py

-## Overview
+Requirements:
+  - LLM_API_KEY: API key for direct LLM provider access (e.g., Anthropic API key)
+  - OPENHANDS_CLOUD_API_KEY: API key for OpenHands Cloud access

-The Command Line Interface (CLI) is the default mode when you run `openhands`. It provides a rich, interactive experience directly in your terminal.
+Note:
+    The LLM configuration is sent to the cloud sandbox, so you need an API key
+    that works directly with the LLM provider (not a local proxy). If using
+    Anthropic, set LLM_API_KEY to your Anthropic API key. 
+""" -```bash -openhands -``` +import os +import time -## Features +from pydantic import SecretStr -- **Real-time interaction**: Type natural language tasks and receive instant feedback -- **Live status monitoring**: Watch the agent's progress as it works -- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import OpenHandsCloudWorkspace -## Command Palette -Press `Ctrl+P` to open the command palette, then select from the dropdown options: +logger = get_logger(__name__) -| Option | Description | -|--------|-------------| -| **Settings** | Open the settings configuration menu | -| **MCP** | View MCP server status | -## Controls +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" -| Control | Action | -|---------|--------| -| `Ctrl+P` | Open command palette | -| `Esc` | Pause the running agent | -| `Ctrl+Q` or `/exit` | Exit the CLI | +# Note: Don't use a local proxy URL here - the cloud sandbox needs direct access +# to the LLM provider. Use None for base_url to let LiteLLM use the default +# provider endpoint, or specify the provider's direct URL. 
+llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL") or None, + api_key=SecretStr(api_key), +) -## Starting with a Task +cloud_api_key = os.getenv("OPENHANDS_CLOUD_API_KEY") +if not cloud_api_key: + logger.error("OPENHANDS_CLOUD_API_KEY required") + exit(1) -Start a conversation with an initial task: +cloud_api_url = os.getenv("OPENHANDS_CLOUD_API_URL", "https://app.all-hands.dev") +logger.info(f"Using OpenHands Cloud API: {cloud_api_url}") -```bash -# Provide a task directly -openhands -t "Create a REST API for user management" +with OpenHandsCloudWorkspace( + cloud_api_url=cloud_api_url, + cloud_api_key=cloud_api_key, +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} -# Load task from a file -openhands -f requirements.txt -``` + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() -## Confirmation Modes + result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") -Control how the agent requests approval for actions: + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) -```bash -# Default: Always ask for confirmation -openhands + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() -# Auto-approve all actions (use with caution) -openhands --always-approve + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) -# Use LLM-based security analyzer -openhands --llm-approve + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() + + logger.info("✅ Conversation completed successfully.") + logger.info(f"Total {len(received_events)} events received during conversation.") ``` -## Resuming Conversations -Resume previous conversations: +```bash Running the Example +export LLM_API_KEY="your-llm-api-key" +export OPENHANDS_CLOUD_API_KEY="your-cloud-api-key" +# Optional: specify a custom sandbox spec +# export OPENHANDS_SANDBOX_SPEC_ID="your-sandbox-spec-id" +cd agent-sdk +uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +``` -```bash -# List recent conversations -openhands --resume +## Next Steps -# Resume the most recent -openhands --resume --last +- **[API-based Sandbox](/sdk/guides/agent-server/api-sandbox)** - Connect to Runtime API service +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run locally with Docker +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details -# Resume a specific conversation -openhands --resume abc123def456 -``` +### Custom Tools with Remote Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md -For more details, see [Resume Conversations](/openhands/usage/cli/resume). +> A ready-to-run example is available [here](#ready-to-run-example)! -## Tips - -Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. - +When using a [remote agent server](/sdk/guides/agent-server/overview), custom tools must be available in the server's Python environment. This guide shows how to build a custom base image with your tools and use `DockerDevWorkspace` to automatically build the agent server on top of it. 
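A custom base image of this kind can be sketched roughly as follows; the directory layout, package name, and base image below are illustrative assumptions, not the exact files used by the example:

```dockerfile
# Hypothetical base image that bakes custom tools into the server environment.
FROM nikolaik/python-nodejs:python3.12-nodejs22

# Copy the package containing your register_tool() modules.
COPY custom_tools /opt/custom_tools/custom_tools

# Make the tools importable inside the agent server.
ENV PYTHONPATH="/opt/custom_tools:${PYTHONPATH}"
```

With an image like this as `base_image`, `DockerDevWorkspace` can layer the agent server on top, and the server can then import your tool modules by name.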
- -Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. - + +For standalone custom tools (without remote agent server), see the [Custom Tools guide](/sdk/guides/custom-tools). + -## See Also +## How It Works -- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI -- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers -- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation +1. **Define custom tool** with `register_tool()` at module level +2. **Create Dockerfile** that copies tools and sets `PYTHONPATH` +3. **Build custom base image** with your tools +4. **Use `DockerDevWorkspace`** with `base_image` parameter - it builds the agent server on top +5. **Import tool module** in client before creating conversation +6. **Server imports modules** dynamically, triggering registration -### Web Interface -Source: https://docs.openhands.dev/openhands/usage/cli/web-interface.md +## Key Files -## Overview +### Custom Tool (`custom_tools/log_data.py`) -The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to: -- Access the CLI remotely -- Share your terminal session -- Use the CLI on devices without a full terminal +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py +"""Log Data Tool - Example custom tool for logging structured data to JSON. -```bash -openhands web -``` +This tool demonstrates how to create a custom tool that logs structured data +to a local JSON file during agent execution. The data can be retrieved and +verified after the agent completes. +""" - -This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser. 
- +import json +from collections.abc import Sequence +from datetime import UTC, datetime +from enum import Enum +from pathlib import Path +from typing import Any -## Basic Usage +from pydantic import Field -```bash -# Start on default port (12000) -openhands web +from openhands.sdk import ( + Action, + ImageContent, + Observation, + TextContent, + ToolDefinition, +) +from openhands.sdk.tool import ToolExecutor, register_tool -# Access at http://localhost:12000 -``` -## Options +# --- Enums and Models --- -| Option | Default | Description | -|--------|---------|-------------| -| `--host` | `0.0.0.0` | Host address to bind to | -| `--port` | `12000` | Port number to use | -| `--debug` | `false` | Enable debug mode | -## Examples +class LogLevel(str, Enum): + """Log level for entries.""" -```bash -# Custom port -openhands web --port 8080 + DEBUG = "debug" + INFO = "info" + WARNING = "warning" + ERROR = "error" -# Bind to localhost only (more secure) -openhands web --host 127.0.0.1 -# Enable debug mode -openhands web --debug +class LogDataAction(Action): + """Action to log structured data to a JSON file.""" -# Full example with custom host and port -openhands web --host 0.0.0.0 --port 3000 -``` + message: str = Field(description="The log message") + level: LogLevel = Field( + default=LogLevel.INFO, + description="Log level (debug, info, warning, error)", + ) + data: dict[str, Any] = Field( + default_factory=dict, + description="Additional structured data to include in the log entry", + ) -## Remote Access -To access the web interface from another machine: +class LogDataObservation(Observation): + """Observation returned after logging data.""" -1. 
Start with `--host 0.0.0.0` to bind to all interfaces: - ```bash - openhands web --host 0.0.0.0 --port 12000 - ``` + success: bool = Field(description="Whether the data was successfully logged") + log_file: str = Field(description="Path to the log file") + entry_count: int = Field(description="Total number of entries in the log file") -2. Access from another machine using the host's IP: - ``` - http://:12000 - ``` + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + """Convert observation to LLM content.""" + if self.success: + return [ + TextContent( + text=( + f"✅ Data logged successfully to {self.log_file}\n" + f"Total entries: {self.entry_count}" + ) + ) + ] + return [TextContent(text="❌ Failed to log data")] - -When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities. - -## Use Cases +# --- Executor --- -### Development on Remote Servers +# Default log file path +DEFAULT_LOG_FILE = "/tmp/agent_data.json" -Access OpenHands on a remote development server through your local browser: -```bash -# On remote server -openhands web --host 0.0.0.0 --port 12000 +class LogDataExecutor(ToolExecutor[LogDataAction, LogDataObservation]): + """Executor that logs structured data to a JSON file.""" -# On local machine, use SSH tunnel -ssh -L 12000:localhost:12000 user@remote-server + def __init__(self, log_file: str = DEFAULT_LOG_FILE): + """Initialize the log data executor. -# Access at http://localhost:12000 -``` + Args: + log_file: Path to the JSON log file + """ + self.log_file = Path(log_file) -### Sharing Sessions + def __call__( + self, + action: LogDataAction, + conversation=None, # noqa: ARG002 + ) -> LogDataObservation: + """Execute the log data action. 
-Run the web interface on a shared server for team access: + Args: + action: The log data action + conversation: Optional conversation context (not used) -```bash -openhands web --host 0.0.0.0 --port 8080 -``` + Returns: + LogDataObservation with the result + """ + # Load existing entries or start fresh + entries: list[dict[str, Any]] = [] + if self.log_file.exists(): + try: + with open(self.log_file) as f: + entries = json.load(f) + except (json.JSONDecodeError, OSError): + entries = [] -## Comparison: Web Interface vs GUI Server + # Create new entry with timestamp + entry = { + "timestamp": datetime.now(UTC).isoformat(), + "level": action.level.value, + "message": action.message, + "data": action.data, + } + entries.append(entry) -| Feature | `openhands web` | `openhands serve` | -|---------|-----------------|-------------------| -| Interface | Terminal UI in browser | Full web GUI | -| Dependencies | None | Docker required | -| Resources | Lightweight | Full container | -| Best for | Quick access | Rich GUI experience | + # Write back to file + self.log_file.parent.mkdir(parents=True, exist_ok=True) + with open(self.log_file, "w") as f: + json.dump(entries, f, indent=2) -## See Also + return LogDataObservation( + success=True, + log_file=str(self.log_file), + entry_count=len(entries), + ) -- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage -- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker -- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options -## OpenHands Software Agent SDK +# --- Tool Definition --- -### Software Agent SDK -Source: https://docs.openhands.dev/sdk.md +_LOG_DATA_DESCRIPTION = """Log structured data to a JSON file. -The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. +Use this tool to record information, findings, or events during your work. 
+Each log entry includes a timestamp and can contain arbitrary structured data. -You can use the OpenHands Software Agent SDK for: +Parameters: +* message: A descriptive message for the log entry +* level: Log level - one of 'debug', 'info', 'warning', 'error' (default: info) +* data: Optional dictionary of additional structured data to include -- One-off tasks, like building a README for your repo -- Routine maintenance tasks, like updating dependencies -- Major tasks that involve multiple agents, like refactors and rewrites +Example usage: +- Log a finding: message="Found potential issue", level="warning", data={"file": "app.py", "line": 42} +- Log progress: message="Completed analysis", level="info", data={"files_checked": 10} +""" # noqa: E501 -You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). -Get started with some examples or keep reading to learn more. +class LogDataTool(ToolDefinition[LogDataAction, LogDataObservation]): + """Tool for logging structured data to a JSON file.""" -## Features + @classmethod + def create(cls, conv_state, **params) -> Sequence[ToolDefinition]: # noqa: ARG003 + """Create LogDataTool instance. - - - A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. - - - Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. - - - A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. - - + Args: + conv_state: Conversation state (not used in this example) + **params: Additional parameters: + - log_file: Path to the JSON log file (default: /tmp/agent_data.json) -## Why OpenHands Software Agent SDK? 
+ Returns: + A sequence containing a single LogDataTool instance + """ + log_file = params.get("log_file", DEFAULT_LOG_FILE) + executor = LogDataExecutor(log_file=log_file) -### Emphasis on coding + return [ + cls( + description=_LOG_DATA_DESCRIPTION, + action_type=LogDataAction, + observation_type=LogDataObservation, + executor=executor, + ) + ] -While other agent SDKs (e.g. [LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. -While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. +# Auto-register the tool when this module is imported +# This is what enables dynamic tool registration in the remote agent server +register_tool("LogDataTool", LogDataTool) +``` -### State-of-the-Art Performance +### Dockerfile -OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. The SDK includes a number of state-of-the-art agentic features developed by our research team, including: +```dockerfile icon="docker" +FROM nikolaik/python-nodejs:python3.12-nodejs22 -- Task planning and decomposition -- Automatic context compression -- Security analysis -- Strong agent-computer interfaces +COPY custom_tools /app/custom_tools +ENV PYTHONPATH="/app:${PYTHONPATH}" +``` -OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. 
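The `ENV PYTHONPATH` line is what lets the server import `custom_tools.log_data` by its qualified name; the registration itself is purely an import side effect. A stdlib sketch of that dynamic-import step (the throwaway package built below stands in for the `custom_tools` directory copied into the image):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mimics custom_tools/ on the server's
# PYTHONPATH. Everything here is illustrative scaffolding.
pkg_root = Path(tempfile.mkdtemp())
pkg = pkg_root / "custom_tools"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "registry.py").write_text("TOOLS: dict[str, str] = {}\n")
# Importing log_data registers the tool as a side effect, mirroring the
# register_tool() call at the bottom of the real log_data.py.
(pkg / "log_data.py").write_text(
    "from custom_tools import registry\n"
    "registry.TOOLS['LogDataTool'] = __name__\n"
)

# ENV PYTHONPATH="/app:${PYTHONPATH}" plays the role of this path insert:
sys.path.insert(0, str(pkg_root))

# The server receives the module's qualified name and simply imports it;
# the import populates the registry.
importlib.import_module("custom_tools.log_data")

from custom_tools.registry import TOOLS  # noqa: E402
```

After the import, `TOOLS` maps the tool name to the module that registered it, which is all the server needs to construct the tool for a conversation.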
+## Troubleshooting -### Free and Open Source +| Issue | Solution | +|-------|----------| +| Tool not found | Ensure `register_tool()` is called at module level, import tool before creating conversation | +| Import errors on server | Check `PYTHONPATH` in Dockerfile, verify all dependencies installed | +| Build failures | Verify file paths in `COPY` commands, ensure Python 3.12+ | -OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. + +**Binary Mode Limitation**: Custom tools only work with **source mode** deployments. When using `DockerDevWorkspace`, set `target="source"` (the default). See [GitHub issue #1531](https://github.com/OpenHands/software-agent-sdk/issues/1531) for details. + -Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic! +## Ready-to-run Example -## Get Started + +This example is available on GitHub: [examples/02_remote_agent_server/06_custom_tool/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/02_remote_agent_server/06_custom_tool) + - - - Install the SDK, run your first agent, and explore the guides. - - +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tool_example.py +"""Example: Using custom tools with remote agent server. -## Learn the SDK +This example demonstrates how to use custom tools with a remote agent server +by building a custom base image that includes the tool implementation. - - - Understand the SDK's architecture: agents, tools, workspaces, and more. - - - Explore the complete SDK API and source code. - - +Prerequisites: + 1. 
Build the custom base image first: + cd examples/02_remote_agent_server/05_custom_tool + ./build_custom_image.sh -## Build with Examples + 2. Set LLM_API_KEY environment variable - - - Build local agents with custom tools and capabilities. - - - Run agents on remote servers with Docker sandboxing. - - - Automate repository tasks with agent-powered workflows. - - +The workflow is: +1. Define a custom tool (LogDataTool for logging structured data to JSON) +2. Create a simple Dockerfile that copies the tool into the base image +3. Build the custom base image +4. Use DockerDevWorkspace with base_image pointing to the custom image +5. DockerDevWorkspace builds the agent server on top of the custom base image +6. The server dynamically registers tools when the client creates a conversation +7. The agent can use the custom tool during execution +8. Verify the logged data by reading the JSON file from the workspace -## Community +This pattern is useful for: +- Collecting structured data during agent runs (logs, metrics, events) +- Implementing custom integrations with external systems +- Adding domain-specific operations to the agent +""" - - - Connect with the OpenHands community on Slack. - - - Contribute to the SDK or report issues on GitHub. - - +import os +import platform +import subprocess +import sys +import time +from pathlib import Path -### openhands.sdk.agent -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md +from pydantic import SecretStr -### class Agent +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + Tool, + get_logger, +) +from openhands.workspace import DockerDevWorkspace -Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) -Main agent implementation for OpenHands. +logger = get_logger(__name__) -The Agent class provides the core functionality for running AI agents that can -interact with tools, process messages, and execute actions. 
It inherits from -AgentBase and implements the agent execution logic. Critic-related functionality -is provided by CriticMixin. +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -#### Example +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -```pycon ->>> from openhands.sdk import LLM, Agent, Tool ->>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) ->>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] ->>> agent = Agent(llm=llm, tools=tools) -``` +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" -#### Properties -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +# Get the directory containing this script +example_dir = Path(__file__).parent.absolute() -#### Methods +# Custom base image tag (contains custom tools, agent server built on top) +CUSTOM_BASE_IMAGE_TAG = "custom-base-image:latest" -#### init_state() +# 2) Check if custom base image exists, build if not +logger.info(f"🔍 Checking for custom base image: {CUSTOM_BASE_IMAGE_TAG}") +result = subprocess.run( + ["docker", "images", "-q", CUSTOM_BASE_IMAGE_TAG], + capture_output=True, + text=True, + check=False, +) -Initialize conversation state. +if not result.stdout.strip(): + logger.info("⚠️ Custom base image not found. 
Building...") + logger.info("📦 Building custom base image with custom tools...") + build_script = example_dir / "build_custom_image.sh" + try: + subprocess.run( + [str(build_script), CUSTOM_BASE_IMAGE_TAG], + cwd=str(example_dir), + check=True, + ) + logger.info("✅ Custom base image built successfully!") + except subprocess.CalledProcessError as e: + logger.error(f"❌ Failed to build custom base image: {e}") + logger.error("Please run ./build_custom_image.sh manually and fix any errors.") + sys.exit(1) +else: + logger.info(f"✅ Custom base image found: {CUSTOM_BASE_IMAGE_TAG}") -Invariants enforced by this method: -- If a SystemPromptEvent is already present, it must be within the first 3 +# 3) Create a DockerDevWorkspace with the custom base image +# DockerDevWorkspace will build the agent server on top of this base image +logger.info("🚀 Building and starting agent server with custom tools...") +logger.info("📦 This may take a few minutes on first run...") - events (index 0 or 1 in practice; index 2 is included in the scan window - to detect a user message appearing before the system prompt). -- A user MessageEvent should not appear before the SystemPromptEvent. +with DockerDevWorkspace( + base_image=CUSTOM_BASE_IMAGE_TAG, + host_port=8011, + platform=detect_platform(), + target="source", # NOTE: "binary" target does not work with custom tools +) as workspace: + logger.info("✅ Custom agent server started!") -These invariants keep event ordering predictable for downstream components -(condenser, UI, etc.) and also prevent accidentally materializing the full -event history during initialization. 
+ # 4) Import custom tools to register them in the client's registry + # This allows the client to send the module qualname to the server + # The server will then import the same module and execute the tool + import custom_tools.log_data # noqa: F401 -#### model_post_init() + # 5) Create agent with custom tools + # Note: We specify the tool here, but it's actually executed on the server + # Get default tools and add our custom tool + from openhands.sdk import Agent + from openhands.tools.preset.default import get_default_condenser, get_default_tools -This function is meant to behave like a BaseModel method to initialise private attributes. + tools = get_default_tools(enable_browser=False) + # Add our custom tool! + tools.append(Tool(name="LogDataTool")) -It takes context as an argument since that’s what pydantic-core passes when calling it. + agent = Agent( + llm=llm, + tools=tools, + system_prompt_kwargs={"cli_mode": True}, + condenser=get_default_condenser( + llm=llm.model_copy(update={"usage_id": "condenser"}) + ), + ) -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. + # 6) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} -#### step() + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() -Taking a step in the conversation. + # 7) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Custom agent server ready!' && python --version" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") -Typically this involves: -1. Making a LLM call -2. Executing the tool -3. 
Updating the conversation state with + # 8) Create conversation with the custom agent + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) - LLM calls (role=”assistant”) and tool results (role=”tool”) + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") -4.1 If conversation is finished, set state.execution_status to FINISHED -4.2 Otherwise, just return, Conversation will kick off the next step + logger.info("📝 Sending task to analyze files and log findings...") + conversation.send_message( + "Please analyze the Python files in the current directory. " + "Use the LogDataTool to log your findings as you work. " + "For example:\n" + "- Log when you start analyzing a file (level: info)\n" + "- Log any interesting patterns you find (level: info)\n" + "- Log any potential issues (level: warning)\n" + "- Include relevant data like file names, line numbers, etc.\n\n" + "Make at least 3 log entries using the LogDataTool." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ Task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") -If the underlying LLM supports streaming, partial deltas are forwarded to -`on_token` before the full response is returned. + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") -NOTE: state will be mutated in-place. 
+ # 9) Read the logged data from the JSON file using file_download API + logger.info("\n📊 Logged Data Summary:") + logger.info("=" * 80) -### class AgentBase + # Download the log file from the workspace using the file download API + import json + import tempfile -Bases: `DiscriminatedUnionMixin`, `ABC` + with tempfile.NamedTemporaryFile( + mode="w", suffix=".json", delete=False + ) as tmp_file: + local_path = tmp_file.name -Abstract base class for OpenHands agents. + download_result = workspace.file_download( + source_path="/tmp/agent_data.json", + destination_path=local_path, + ) -Agents are stateless and should be fully defined by their configuration. -This base class provides the common interface and functionality that all -agent implementations must follow. + if download_result.success: + try: + with open(local_path) as f: + log_entries = json.load(f) + logger.info(f"Found {len(log_entries)} log entries:\n") + for i, entry in enumerate(log_entries, 1): + logger.info(f"Entry {i}:") + logger.info(f" Timestamp: {entry.get('timestamp', 'N/A')}") + logger.info(f" Level: {entry.get('level', 'N/A')}") + logger.info(f" Message: {entry.get('message', 'N/A')}") + if entry.get("data"): + logger.info(f" Data: {json.dumps(entry['data'], indent=4)}") + logger.info("") + except json.JSONDecodeError: + logger.info("Log file exists but couldn't parse JSON") + with open(local_path) as f: + logger.info(f"Raw content: {f.read()}") + finally: + # Clean up the temporary file + Path(local_path).unlink(missing_ok=True) + else: + logger.info("No log file found (agent may not have used the tool)") + if download_result.error: + logger.debug(f"Download error: {download_result.error}") + logger.info("=" * 80) -#### Properties + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") -- `agent_context`: AgentContext | None -- `condenser`: CondenserBase | None -- `critic`: CriticBase | None -- `dynamic_context`: str | None - Get the 
dynamic per-conversation context. - This returns the context that varies between conversations, such as: - - Repository information and skills - - Runtime information (hosts, working directory) - - User-specific secrets and settings - - Conversation instructions - This content should NOT be included in the cached system prompt to enable - cross-conversation cache sharing. Instead, it is sent as a second content - block (without a cache marker) inside the system message. - * Returns: - The dynamic context string, or None if no context is configured. -- `filter_tools_regex`: str | None -- `include_default_tools`: list[str] -- `llm`: LLM -- `mcp_config`: dict[str, Any] -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `name`: str - Returns the name of the Agent. -- `prompt_dir`: str - Returns the directory where this class’s module file is located. -- `security_policy_filename`: str -- `static_system_message`: str - Compute the static portion of the system message. - This returns only the base system prompt template without any dynamic - per-conversation context. This static portion can be cached and reused - across conversations for better prompt caching efficiency. - * Returns: - The rendered system prompt template without dynamic context. -- `system_message`: str - Return the combined system message (static + dynamic). -- `system_prompt_filename`: str -- `system_prompt_kwargs`: dict[str, object] -- `tools`: list[Tool] -- `tools_map`: dictstr, [ToolDefinition] - Get the initialized tools map. - :raises RuntimeError: If the agent has not been initialized. + finally: + logger.info("\n🧹 Cleaning up conversation...") + conversation.close() -#### Methods +logger.info("\n✅ Example completed successfully!") +logger.info("\nThis example demonstrated how to:") +logger.info("1. Create a custom tool that logs structured data to JSON") +logger.info("2. 
Build a simple base image with the custom tool") +logger.info("3. Use DockerDevWorkspace with base_image to build agent server on top") +logger.info("4. Enable dynamic tool registration on the server") +logger.info("5. Use the custom tool during agent execution") +logger.info("6. Read the logged data back from the workspace") +``` -#### get_all_llms() +```bash Running the Example +# Build the custom base image first +cd examples/02_remote_agent_server/06_custom_tool +./build_custom_image.sh -Recursively yield unique base-class LLM objects reachable from self. +# Run the example +export LLM_API_KEY="your-api-key" +uv run python custom_tool_example.py +``` -- Returns actual object references (not copies). -- De-dupes by id(LLM). -- Cycle-safe via a visited set for all traversed objects. -- Only yields objects whose type is exactly LLM (no subclasses). -- Does not handle dataclasses. -#### init_state() +## Next Steps -Initialize the empty conversation state to prepare the agent for user -messages. +- **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers -Typically this involves adding system message +### Docker Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md -NOTE: state will be mutated in-place. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### model_dump_succint() +The docker sandboxed agent server demonstrates how to run agents in isolated Docker containers using `DockerWorkspace`. -Like model_dump, but excludes None fields by default. +This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. -#### model_post_init() +Use `DockerWorkspace` with a pre-built agent server image for the fastest startup. 
When you need to build your own image from a base image, switch to `DockerDevWorkspace`. -This function is meant to behave like a BaseModel method to initialise private attributes. +the Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server. -It takes context as an argument since that’s what pydantic-core passes when calling it. +## 1) Basic Docker Sandbox -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. +> A ready-to-run example is available [here](#ready-to-run-example-docker-sandbox)! -#### abstractmethod step() +### Key Concepts -Taking a step in the conversation. +#### DockerWorkspace Context Manager -Typically this involves: -1. Making a LLM call -2. Executing the tool -3. 
Updating the conversation state with +The `DockerWorkspace` uses a context manager to automatically handle container lifecycle: - LLM calls (role=”assistant”) and tool results (role=”tool”) +```python icon="python" +with DockerWorkspace( + # use pre-built image for faster startup (recommended) + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), +) as workspace: + # Container is running here + # Work with the workspace + pass +# Container is automatically stopped and cleaned up here +``` -4.1 If conversation is finished, set state.execution_status to FINISHED -4.2 Otherwise, just return, Conversation will kick off the next step +The workspace automatically: +- Pulls or builds the Docker image +- Starts the container with an agent server +- Waits for the server to be ready +- Cleans up the container when done -If the underlying LLM supports streaming, partial deltas are forwarded to -`on_token` before the full response is returned. +#### Platform Detection -NOTE: state will be mutated in-place. +The example includes platform detection to ensure the correct Docker image is built and used: -#### Deprecated -Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and -[`dynamic_context`](#class-dynamic_context) for per-conversation content. This separation -enables cross-conversation prompt caching. Will be removed in 1.16.0. +```python icon="python" +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" +``` -#### WARNING -Using this property DISABLES cross-conversation prompt caching because -it combines static and dynamic content into a single string. Use -[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately -to enable caching. 
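The caching rationale behind the static/dynamic split can be pictured with a toy prefix cache: identical static prefixes hit the cache across conversations, while per-conversation context rides along as a separate, uncached block (an illustrative model, not the SDK's or any provider's actual cache):

```python
import hashlib


class ToyPrefixCache:
    """Toy model of provider-side prompt caching keyed on a static prefix."""

    def __init__(self):
        self._cached: set[str] = set()
        self.hits = 0
        self.misses = 0

    def build_system_message(self, static: str, dynamic: str) -> str:
        key = hashlib.sha256(static.encode()).hexdigest()
        if key in self._cached:
            self.hits += 1  # same static prefix across conversations: cache hit
        else:
            self.misses += 1
            self._cached.add(key)
        # dynamic context is appended as a separate, uncached block
        return static + "\n\n" + dynamic


cache = ToyPrefixCache()
static_prompt = "You are a software agent..."
for repo in ("repo-a", "repo-b", "repo-c"):
    cache.build_system_message(static_prompt, f"Working on {repo}")

# Combining static + dynamic into one string (the deprecated system_message
# behavior) would make every conversation's prefix unique: all misses.
```

Three conversations sharing one static prompt produce one miss and two hits; folding the dynamic context into the prefix would produce three misses.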
+This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon). -#### Deprecated -Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. -#### verify() +#### Testing the Workspace -Verify that we can resume this agent from persisted state. +Before creating a conversation, the example tests the workspace connection: -We do not merge configuration between persisted and runtime Agent -instances. Instead, we verify compatibility requirements and then -continue with the runtime-provided Agent. +```python icon="python" +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info( + f"Command '{result.command}' completed" + f"with exit code {result.exit_code}" +) +logger.info(f"Output: {result.stdout}") +``` -Compatibility requirements: -- Agent class/type must match. -- Tools must match exactly (same tool names). +This verifies the workspace is properly initialized and can execute commands. -Tools are part of the system prompt and cannot be changed mid-conversation. -To use different tools, start a new conversation or use conversation forking -(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). - -All other configuration (LLM, agent_context, condenser, etc.) can be -freely changed between sessions. - -* Parameters: - * `persisted` – The agent loaded from persisted state. - * `events` – Unused, kept for API compatibility. -* Returns: - This runtime agent (self) if verification passes. -* Raises: - `ValueError` – If agent class or tools don’t match. 
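The "tools must match exactly" requirement in `verify()` reduces to a comparison of tool-name sets: order does not matter, but any addition or removal is rejected. A sketch of that check (the function below is illustrative, not the SDK's implementation):

```python
def verify_tools(persisted: list[str], runtime: list[str]) -> None:
    """Illustrative version of the tools check on resume -- not SDK code.

    Tool names are compared as sets: order doesn't matter, but any
    addition or removal is rejected, because tools are baked into the
    system prompt the persisted history was produced under.
    """
    if set(persisted) != set(runtime):
        missing = sorted(set(persisted) - set(runtime))
        added = sorted(set(runtime) - set(persisted))
        raise ValueError(
            f"Tool mismatch on resume: missing={missing}, added={added}. "
            "Start a new conversation (or fork) to change tools."
        )


# Same names in a different order: compatible.
verify_tools(
    ["TerminalTool", "FileEditorTool"],
    ["FileEditorTool", "TerminalTool"],
)

# Adding a tool mid-conversation: rejected.
try:
    verify_tools(["TerminalTool"], ["TerminalTool", "BrowserTool"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Everything else (LLM, condenser, agent context) is free to change between sessions precisely because it is not baked into the persisted event history the way the tool set is.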
- -### openhands.sdk.conversation -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md +#### Automatic RemoteConversation -### class BaseConversation +When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation: -Bases: `ABC` +```python icon="python" focus={1, 3, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` -Abstract base class for conversation implementations. +The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming. -This class defines the interface that all conversation implementations must follow. -Conversations manage the interaction between users and agents, handling message -exchange, execution control, and state management. +#### DockerWorkspace vs DockerDevWorkspace -#### Properties +Use `DockerWorkspace` when you can rely on the official pre-built images for the agent server. Switch to `DockerDevWorkspace` when you need to build or customize the image on-demand (slower startup, requires the SDK source tree and Docker build support). -- `confirmation_policy_active`: bool -- `conversation_stats`: ConversationStats -- `id`: UUID -- `is_confirmation_mode_active`: bool - Check if confirmation mode is active. - Returns True if BOTH conditions are met: - 1. The conversation state has a security analyzer set (not None) - 2. 
The confirmation policy is active
-- `state`: ConversationStateProtocol
+```python icon="python"
+# ✅ Fast: Use pre-built image (recommended)
+DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+)

-#### Methods
+# 🛠️ Custom: Build on the fly (requires SDK tooling)
+DockerDevWorkspace(
+    base_image="nikolaik/python-nodejs:python3.12-nodejs22",
+    host_port=8010,
+    target="source",
+)
+```

-#### __init__()
+
+### Ready-to-run Example Docker Sandbox
+
+This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py)
+

-Initialize the base conversation with span tracking.
+This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution:

-#### abstractmethod ask_agent()
+```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
+import os
+import platform
+import time

-Ask the agent a simple, stateless question and get a direct LLM response.
+from pydantic import SecretStr

-This bypasses the normal conversation flow and does not modify, persist,
-or become part of the conversation state. The request is not remembered by
-the main agent, no events are recorded, and execution status is untouched.
-It is also thread-safe and may be called while conversation.run() is
-executing in another thread.
+from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace -* Parameters: - `question` – A simple string question to ask the agent -* Returns: - A string response from the agent -#### abstractmethod close() +logger = get_logger(__name__) -#### static compose_callbacks() +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -Compose multiple callbacks into a single callback function. +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -* Parameters: - `callbacks` – An iterable of callback functions -* Returns: - A single callback function that calls all provided callbacks -#### abstractmethod condense() +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" -Force condensation of the conversation history. -This method uses the existing condensation request pattern to trigger -condensation. It adds a CondensationRequest event to the conversation -and forces the agent to take a single step to process it. +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" -The condensation will be applied immediately and will modify the conversation -state by adding a condensation event to the history. -* Raises: - `ValueError` – If no condenser is configured or the condenser doesn’t - handle condensation requests. +# 2) Create a Docker-based remote workspace that will set up and manage +# the Docker container automatically. Use `DockerWorkspace` with a pre-built +# image or `DockerDevWorkspace` to automatically build the image on-demand. +# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) -#### abstractmethod execute_tool() + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} -Execute a tool directly without going through the agent loop. + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() -This method allows executing tools before or outside of the normal -conversation.run() flow. It handles agent initialization automatically, -so tools can be executed before the first run() call. 
+ # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) -Note: This method bypasses the agent loop, including confirmation -policies and security analyzer checks. Callers are responsible for -applying any safeguards before executing potentially destructive tools. + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") -This is useful for: -- Pre-run setup operations (e.g., indexing repositories) -- Manual tool execution for environment setup -- Testing tool behavior outside the agent loop + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") -* Parameters: - * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) - * `action` – The action to pass to the tool executor -* Returns: - The observation returned by the tool execution -* Raises: - * `KeyError` – If the tool is not found in the agent’s tools - * `NotImplementedError` – If the tool has no executor + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") -#### abstractmethod generate_title() + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") -Generate a title for the conversation based on the first user message. + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` -* Parameters: - * `llm` – Optional LLM to use for title generation. If not provided, - uses the agent’s LLM. - * `max_length` – Maximum length of the generated title. -* Returns: - A generated title for the conversation. -* Raises: - `ValueError` – If no user messages are found in the conversation. + -#### static get_persistence_dir() -Get the persistence directory for the conversation. +--- -* Parameters: - * `persistence_base_dir` – Base directory for persistence. Can be a string - path or Path object. - * `conversation_id` – Unique conversation ID. -* Returns: - String path to the conversation-specific persistence directory. - Always returns a normalized string path even if a Path was provided. +## 2) VS Code in Docker Sandbox -#### abstractmethod pause() +> A ready-to-run example is available [here](#ready-to-run-example-vs-code)! -#### abstractmethod reject_pending_actions() +VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. -#### abstractmethod run() +### Key Concepts -Execute the agent to process messages and perform actions. +#### VS Code-Enabled DockerWorkspace -This method runs the agent until it finishes processing the current -message or reaches the maximum iteration limit. +The workspace is configured with extra ports for VS Code access: -#### abstractmethod send_message() - -Send a message to the agent. 
+```python icon="python" focus={1, 5}
+with DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=18010,
+    platform="linux/arm64", # or "linux/amd64" depending on your architecture
+    extra_ports=True, # Expose extra ports for VSCode and VNC
+) as workspace:
+    """Extra ports allows you to access VSCode at localhost:18011"""
+```

-* Parameters:
-  * `message` – Either a string (which will be converted to a user message)
-    or a Message object
-  * `sender` – Optional identifier of the sender. Can be used to track
-    message origin in multi-agent scenarios. For example, when
-    one agent delegates to another, the sender can be set to
-    identify which agent is sending the message.
+The `extra_ports=True` setting exposes:
+- Port `host_port+1`: VS Code Web interface
+- Port `host_port+2`: VNC viewer for visual access

-#### abstractmethod set_confirmation_policy()
+If you need to customize the agent-server image, swap in `DockerDevWorkspace` with the same parameters and provide `base_image`/`target` to build on demand.

-Set the confirmation policy for the conversation.
+#### VS Code URL Generation

-#### abstractmethod set_security_analyzer()
+The example retrieves the VS Code URL with authentication token:

-Set the security analyzer for the conversation.
+```python icon="python" +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` -#### abstractmethod update_secrets() +This generates a properly authenticated URL with the workspace directory pre-opened. -### class Conversation +#### VS Code URL Format -### class Conversation +```text +http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` +where: +- `vscode_port`: Usually host_port + 1 (e.g., 8011) +- `token`: Authentication token for security +- `workspace_dir`: Workspace directory to open -Bases: `object` +### Ready-to-run Example VS Code -Factory class for creating conversation instances with OpenHands agents. + +This example is available on GitHub: [examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py) + -This factory automatically creates either a LocalConversation or RemoteConversation -based on the workspace type provided. LocalConversation runs the agent locally, -while RemoteConversation connects to a remote agent server. -* Returns: - LocalConversation if workspace is local, RemoteConversation if workspace - is remote. 
+```python icon="python" expandable examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py +import os +import platform +import time -#### Example +import httpx +from pydantic import SecretStr -```pycon ->>> from openhands.sdk import LLM, Agent, Conversation ->>> from openhands.sdk.plugin import PluginSource ->>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) ->>> agent = Agent(llm=llm, tools=[]) ->>> conversation = Conversation( -... agent=agent, -... workspace="./workspace", -... plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")], -... ) ->>> conversation.send_message("Hello!") ->>> conversation.run() -``` +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace -### class ConversationExecutionStatus -Bases: `str`, `Enum` +logger = get_logger(__name__) -Enum representing the current execution state of the conversation. +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -#### Methods +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -#### DELETING = 'deleting' -#### ERROR = 'error' +# Create a Docker-based remote workspace with extra ports for VSCode access +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" -#### FINISHED = 'finished' -#### IDLE = 'idle' +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" -#### PAUSED = 'paused' -#### RUNNING = 'running' +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=18010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" -#### STUCK = 'stuck' + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) -#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} -#### is_terminal() + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() -Check if this status represents a terminal state. + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) -Terminal states indicate the run has completed and the agent is no longer -actively processing. These are: FINISHED, ERROR, STUCK. + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message("Create a simple Python script that prints Hello World") + conversation.run() -Note: IDLE is NOT a terminal state - it’s the initial state of a conversation -before any run has started. Including IDLE would cause false positives when -the WebSocket delivers the initial state update during connection. 
+ # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" -* Returns: - True if this is a terminal status, False otherwise. + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + -### class ConversationState -Bases: `OpenHandsModel` +--- +## 3) Browser in Docker Sandbox +> A ready-to-run example is available [here](#ready-to-run-example-browser)! -#### Properties +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. -- `activated_knowledge_skills`: list[str] -- `agent`: AgentBase -- `agent_state`: dict[str, Any] -- `blocked_actions`: dict[str, str] -- `blocked_messages`: dict[str, str] -- `confirmation_policy`: ConfirmationPolicyBase -- `env_observation_persistence_dir`: str | None - Directory for persisting environment observation files. 
-- `events`: [EventLog](#class-eventlog) -- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) -- `id`: UUID -- `max_iterations`: int -- `persistence_dir`: str | None -- `secret_registry`: [SecretRegistry](#class-secretregistry) -- `security_analyzer`: SecurityAnalyzerBase | None -- `stats`: ConversationStats -- `stuck_detection`: bool -- `workspace`: BaseWorkspace +### Key Concepts -#### Methods +#### Browser-Enabled DockerWorkspace -#### acquire() +The workspace is configured with extra ports for browser access: -Acquire the lock. +```python icon="python" focus={1-5} +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" +``` -* Parameters: - * `blocking` – If True, block until lock is acquired. If False, return - immediately. - * `timeout` – Maximum time to wait for lock (ignored if blocking=False). - -1 means wait indefinitely. -* Returns: - True if lock was acquired, False otherwise. - -#### block_action() - -Persistently record a hook-blocked action. - -#### block_message() - -Persistently record a hook-blocked user message. - -#### classmethod create() - -Create a new conversation state or resume from persistence. - -This factory method handles both new conversation creation and resumption -from persisted state. +The `extra_ports=True` setting exposes additional ports for: +- Port `host_port+1`: VS Code Web interface +- Port `host_port+2`: VNC viewer for browser visualization -New conversation: -The provided Agent is used directly. Pydantic validation happens via the -cls() constructor. +If you need to pre-build a custom browser image, replace `DockerWorkspace` with `DockerDevWorkspace` and provide `base_image`/`target` to build before launch. 
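The port layout described above can be made explicit with a tiny helper. This is an illustrative sketch, not an SDK API: `extra_port_map` is a hypothetical name that simply encodes the documented convention that `extra_ports=True` exposes the VS Code Web port at `host_port + 1` and the VNC port at `host_port + 2`.

```python
def extra_port_map(host_port: int) -> dict[str, int]:
    """Derive the ports exposed alongside the agent server (illustrative helper)."""
    return {
        "agent_server": host_port,  # REST/WebSocket API
        "vscode": host_port + 1,    # VS Code Web interface
        "vnc": host_port + 2,       # VNC viewer
    }


ports = extra_port_map(8010)
assert ports == {"agent_server": 8010, "vscode": 8011, "vnc": 8012}
```

With `host_port=8010` this matches the VS Code URL at port 8011 and the VNC viewer at port 8012 used elsewhere in this guide.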
-Restored conversation: -The provided Agent is validated against the persisted agent using -agent.load(). Tools must match (they may have been used in conversation -history), but all other configuration can be freely changed: LLM, -agent_context, condenser, system prompts, etc. -* Parameters: - * `id` – Unique conversation identifier - * `agent` – The Agent to use (tools must match persisted on restore) - * `workspace` – Working directory for agent operations - * `persistence_dir` – Directory for persisting state and events - * `max_iterations` – Maximum iterations per run - * `stuck_detection` – Whether to enable stuck detection - * `cipher` – Optional cipher for encrypting/decrypting secrets in - persisted state. If provided, secrets are encrypted when - saving and decrypted when loading. If not provided, secrets - are redacted (lost) on serialization. -* Returns: - ConversationState ready for use -* Raises: - * `ValueError` – If conversation ID or tools mismatch on restore - * `ValidationError` – If agent or other fields fail Pydantic validation +#### Enabling Browser Tools -#### static get_unmatched_actions() +Browser tools are enabled by setting `cli_mode=False`: -Find actions in the event history that don’t have matching observations. +```python icon="python" focus={2, 4} +# Create agent with browser tools enabled +agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools +) +``` -This method identifies ActionEvents that don’t have corresponding -ObservationEvents or UserRejectObservations, which typically indicates -actions that are pending confirmation or execution. +When `cli_mode=False`, the agent gains access to browser automation tools for web interaction. 
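The rule above can be captured in a one-line helper. This is an illustrative sketch: the `cli_mode_for` function is hypothetical, and only the `cli_mode` parameter of `get_default_agent` comes from the SDK.

```python
# Illustrative only: encode "browser tasks need cli_mode=False" as a helper
# whose result you could pass to get_default_agent(llm=..., cli_mode=...).
def cli_mode_for(task_needs_browser: bool) -> bool:
    """Return the cli_mode value for a task (hypothetical helper)."""
    return not task_needs_browser


assert cli_mode_for(task_needs_browser=True) is False   # browser tools enabled
assert cli_mode_for(task_needs_browser=False) is True   # CLI-only agent
```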
-* Parameters: - `events` – List of events to search through -* Returns: - List of ActionEvent objects that don’t have corresponding observations, - in chronological order +When VNC is available and `extra_ports=True`, the browser will be opened in the VNC desktop to visualize agent's work. You can watch the browser in real-time via VNC. Demo video: + -#### locked() +#### VNC Access -Return True if the lock is currently held by any thread. +The VNC interface provides real-time visual access to the browser: -#### model_config = (configuration object) +```text +http://localhost:8012/vnc.html?autoconnect=1&resize=remote +``` -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `autoconnect=1`: Automatically connect to VNC server +- `resize=remote`: Automatically adjust resolution -#### model_post_init() +--- -This function is meant to behave like a BaseModel method to initialise private attributes. +### Ready-to-run Example Browser -It takes context as an argument since that’s what pydantic-core passes when calling it. + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. +This example shows how to configure `DockerWorkspace` with browser capabilities and VNC access: -#### owned() +```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +import os +import platform +import time -Return True if the lock is currently held by the calling thread. 
+from pydantic import SecretStr -#### pop_blocked_action() +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace -Remove and return a hook-blocked action reason, if present. -#### pop_blocked_message() +logger = get_logger(__name__) -Remove and return a hook-blocked message reason, if present. +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -#### release() +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -Release the lock. -* Raises: - `RuntimeError` – If the current thread doesn’t own the lock. +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" -#### set_on_state_change() -Set a callback to be called when state changes. +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" -* Parameters: - `callback` – A function that takes an Event (ConversationStateUpdateEvent) - or None to remove the callback -### class ConversationVisualizerBase +# Create a Docker-based remote workspace with extra ports for browser access. 
+# Use `DockerWorkspace` with a pre-built image or `DockerDevWorkspace` to +# automatically build the image on-demand. +# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=8011, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" -Bases: `ABC` + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) -Base class for conversation visualizers. + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} -This abstract base class defines the interface that all conversation visualizers -must implement. Visualizers can be created before the Conversation is initialized -and will be configured with the conversation state automatically. + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() -The typical usage pattern: -1. Create a visualizer instance: + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) - viz = MyVisualizer() -1. Pass it to Conversation: conv = Conversation(agent, visualizer=viz) -2. 
Conversation automatically calls viz.initialize(state) to attach the state + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" + ) + conversation.run() -You can also pass the uninstantiated class if you don’t need extra args -: for initialization, and Conversation will create it: - : conv = Conversation(agent, visualizer=MyVisualizer) + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") -Conversation will then calls MyVisualizer() followed by initialize(state) + if os.getenv("CI"): + logger.info( + "CI environment detected; skipping interactive prompt and closing workspace." # noqa: E501 + ) + else: + # Wait for user confirm to exit when running locally + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + -#### Properties +## Next Steps -- `conversation_stats`: ConversationStats | None - Get conversation stats from the state. 
+- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture -#### Methods +### Local Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server.md -#### __init__() +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Initialize the visualizer base. +> A ready-to-run example is available [here](#ready-to-run-example)! -#### create_sub_visualizer() +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using `RemoteConversation`. This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. -Create a visualizer for a sub-agent during delegation. +## Key Concepts -Override this method to support sub-agent visualization in multi-agent -delegation scenarios. The sub-visualizer will be used to display events -from the spawned sub-agent. +### Managed API Server -By default, returns None which means sub-agents will not have visualization. -Subclasses that support delegation (like DelegationVisualizer) should -override this method to create appropriate sub-visualizers. 
+The ready-to-run example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: -* Parameters: - `agent_id` – The identifier of the sub-agent being spawned -* Returns: - A visualizer instance for the sub-agent, or None if sub-agent - visualization is not supported +```python icon="python" focus={1, 2, 4, 5} +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __enter__(self): + """Start the API server subprocess.""" + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) +``` -#### final initialize() +The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. -Initialize the visualizer with conversation state. +### Remote Workspace -This method is called by Conversation after the state is created, -allowing the visualizer to access conversation stats and other -state information. +When connecting to a remote server, you need to provide a `Workspace` that connects to that server: -Subclasses should not override this method, to ensure the state is set. +```python icon="python" +workspace = Workspace(host=server.base_url) +result = workspace.execute_command("pwd") +``` -* Parameters: - `state` – The conversation state object +When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). +The `Workspace` object communicates with the remote server's API to execute commands and manage files. -#### abstractmethod on_event() +### RemoteConversation -Handle a conversation event. 
+When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): -This method is called for each event in the conversation and should -implement the visualization logic. +```python icon="python" focus={1, 3, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` -* Parameters: - `event` – The event to visualize +`RemoteConversation` handles communication with the remote agent server over WebSocket for real-time event streaming. -### class DefaultConversationVisualizer +### Event Callbacks -Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) +Callbacks receive events in real-time as they happen on the remote server: -Handles visualization of conversation events with Rich formatting. +```python icon="python" +def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() +``` -Provides Rich-formatted output with semantic dividers and complete content display. +This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. -#### Methods +### Conversation State -#### __init__() +The conversation state provides access to all events and status: -Initialize the visualizer. +```python icon="python" +# Count total events using state.events +total_events = len(conversation.state.events) +logger.info(f"📈 Total events in conversation: {total_events}") -* Parameters: - * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles - for highlighting keywords in the visualizer. 
- For example: (configuration object) - * `skip_user_messages` – If True, skip displaying user messages. Useful for - scenarios where user input is not relevant to show. +# Get recent events (last 5) using state.events +all_events = conversation.state.events +recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +``` -#### on_event() +This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. -Main event handler that displays events with Rich formatting. +## Ready-to-run Example -### class EventLog + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + -Bases: [`EventsListBase`](#class-eventslistbase) +This example shows how to programmatically start a local agent server and interact with it through a `RemoteConversation`: -Persistent event log with locking for concurrent writes. +```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py +import os +import subprocess +import sys +import threading +import time -This class provides thread-safe and process-safe event storage using -the FileStore’s locking mechanism. Events are persisted to disk and -can be accessed by index or event ID. +from pydantic import SecretStr -#### Methods +from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger +from openhands.sdk.event import ConversationStateUpdateEvent +from openhands.tools.preset.default import get_default_agent -#### NOTE -For LocalFileStore, file locking via flock() does NOT work reliably -on NFS mounts or network filesystems. Users deploying with shared -storage should use alternative coordination mechanisms. -#### __init__() +logger = get_logger(__name__) -#### append() -Append an event with locking for thread/process safety. 
+def _stream_output(stream, prefix, target_stream): + """Stream output from subprocess to target stream with prefix.""" + try: + for line in iter(stream.readline, ""): + if line: + target_stream.write(f"[{prefix}] {line}") + target_stream.flush() + except Exception as e: + print(f"Error streaming {prefix}: {e}", file=sys.stderr) + finally: + stream.close() -* Raises: - * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. - * `ValueError` – If an event with the same ID already exists. -#### get_id() +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" -Return the event_id for a given index. + def __init__(self, port: int = 8000, host: str = "127.0.0.1"): + self.port: int = port + self.host: str = host + self.process: subprocess.Popen[str] | None = None + self.base_url: str = f"http://{host}:{port}" + self.stdout_thread: threading.Thread | None = None + self.stderr_thread: threading.Thread | None = None -#### get_index() + def __enter__(self): + """Start the API server subprocess.""" + print(f"Starting OpenHands API server on {self.base_url}...") -Return the integer index for a given event_id. 
+ # Start the server process + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) -### class EventsListBase + # Start threads to stream stdout and stderr + assert self.process is not None + assert self.process.stdout is not None + assert self.process.stderr is not None + self.stdout_thread = threading.Thread( + target=_stream_output, + args=(self.process.stdout, "SERVER", sys.stdout), + daemon=True, + ) + self.stderr_thread = threading.Thread( + target=_stream_output, + args=(self.process.stderr, "SERVER", sys.stderr), + daemon=True, + ) -Bases: `Sequence`[`Event`], `ABC` + self.stdout_thread.start() + self.stderr_thread.start() -Abstract base class for event lists that can be appended to. + # Wait for server to be ready + max_retries = 30 + for i in range(max_retries): + try: + import httpx -This provides a common interface for both local EventLog and remote -RemoteEventsList implementations, avoiding circular imports in protocols. + response = httpx.get(f"{self.base_url}/health", timeout=1.0) + if response.status_code == 200: + print(f"API server is ready at {self.base_url}") + return self + except Exception: + pass -#### Methods + assert self.process is not None + if self.process.poll() is not None: + # Process has terminated + raise RuntimeError( + "Server process terminated unexpectedly. " + "Check the server logs above for details." + ) -#### abstractmethod append() + time.sleep(1) -Add a new event to the list. 
+ raise RuntimeError(f"Server failed to start after {max_retries} seconds") -### class LocalConversation + def __exit__(self, exc_type, exc_val, exc_tb): + """Stop the API server subprocess.""" + if self.process: + print("Stopping API server...") + self.process.terminate() + try: + self.process.wait(timeout=5) + except subprocess.TimeoutExpired: + print("Force killing API server...") + self.process.kill() + self.process.wait() -Bases: [`BaseConversation`](#class-baseconversation) + # Wait for streaming threads to finish (they're daemon threads, + # so they'll stop automatically) + # But give them a moment to flush any remaining output + time.sleep(0.5) + print("API server stopped.") -#### Properties +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -- `agent`: AgentBase -- `delete_on_close`: bool = True -- `id`: UUID - Get the unique ID of the conversation. -- `llm_registry`: LLMRegistry -- `max_iteration_per_run`: int -- `resolved_plugins`: list[ResolvedPluginSource] | None - Get the resolved plugin sources after plugins are loaded. - Returns None if plugins haven’t been loaded yet, or if no plugins - were specified. Use this for persistence to ensure conversation - resume uses the exact same plugin versions. -- `state`: [ConversationState](#class-conversationstate) - Get the conversation state. - It returns a protocol that has a subset of ConversationState methods - and properties. We will have the ability to access the same properties - of ConversationState on a remote conversation object. - But we won’t be able to access methods that mutate the state. -- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None - Get the stuck detector instance if enabled. 
-- `workspace`: LocalWorkspace +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +title_gen_llm = LLM( + usage_id="title-gen-llm", + model=os.getenv("LLM_MODEL", "openhands/gpt-5-mini-2025-08-07"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) -#### Methods +# Use managed API server +with ManagedAPIServer(port=8001) as server: + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, # Disable browser tools for simplicity + ) -#### __init__() + # Define callbacks to test the WebSocket functionality + received_events = [] + event_tracker = {"last_event_time": time.time()} -Initialize the conversation. + def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() -* Parameters: - * `agent` – The agent to use for the conversation. - * `workspace` – Working directory for agent operations and tool execution. - Can be a string path, Path object, or LocalWorkspace instance. - * `plugins` – Optional list of plugins to load. Each plugin is specified - with a source (github:owner/repo, git URL, or local path), - optional ref (branch/tag/commit), and optional repo_path for - monorepos. Plugins are loaded in order with these merge - semantics: skills override by name (last wins), MCP config - override by key (last wins), hooks concatenate (all run). - * `persistence_dir` – Directory for persisting conversation state and events. - Can be a string path or Path object. - * `conversation_id` – Optional ID for the conversation. If provided, will - be used to identify the conversation. The user might want to - suffix their persistent filestore with this ID. 
- * `callbacks` – Optional list of callback functions to handle events - * `token_callbacks` – Optional list of callbacks invoked for streaming deltas - * `hook_config` – Optional hook configuration to auto-wire session hooks. - If plugins are loaded, their hooks are combined with this config. - * `max_iteration_per_run` – Maximum number of iterations per run - * `visualizer` – + # Create RemoteConversation with callbacks + # NOTE: Workspace is required for RemoteConversation + workspace = Workspace(host=server.base_url) + result = workspace.execute_command("pwd") + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") - Visualization configuration. Can be: - - ConversationVisualizerBase subclass: Class to instantiate - > (default: ConversationVisualizer) - - ConversationVisualizerBase instance: Use custom visualizer - - None: No visualization - * `stuck_detection` – Whether to enable stuck detection - * `stuck_detection_thresholds` – Optional configuration for stuck detection - thresholds. Can be a StuckDetectionThresholds instance or - a dict with keys: ‘action_observation’, ‘action_error’, - ‘monologue’, ‘alternating_pattern’. Values are integers - representing the number of repetitions before triggering. - * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted - state. If provided, secrets are encrypted when saving and - decrypted when loading. If not provided, secrets are redacted - (lost) on serialization. + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) -#### ask_agent() + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") -Ask the agent a simple, stateless question and get a direct LLM response. 
+ # Send first message and run + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) -This bypasses the normal conversation flow and does not modify, persist, -or become part of the conversation state. The request is not remembered by -the main agent, no events are recorded, and execution status is untouched. -It is also thread-safe and may be called while conversation.run() is -executing in another thread. + # Generate title using a specific LLM + title = conversation.generate_title(max_length=60, llm=title_gen_llm) + logger.info(f"Generated conversation title: {title}") -* Parameters: - `question` – A simple string question to ask the agent -* Returns: - A string response from the agent + logger.info("🚀 Running conversation...") + conversation.run() -#### close() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") -Close the conversation and clean up all tool executors. + # Wait for events to stop coming (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - event_tracker["last_event_time"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") -#### condense() + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") -Synchronously force condense the conversation history. + # Demonstrate state.events functionality + logger.info("\n" + "=" * 50) + logger.info("📊 Demonstrating State Events API") + logger.info("=" * 50) -If the agent is currently running, condense() will wait for the -ongoing step to finish before proceeding. + # Count total events using state.events + total_events = len(conversation.state.events) + logger.info(f"📈 Total events in conversation: {total_events}") -Raises ValueError if no compatible condenser exists. 
+ # Get recent events (last 5) using state.events + logger.info("\n🔍 Getting last 5 events using state.events...") + all_events = conversation.state.events + recent_events = all_events[-5:] if len(all_events) >= 5 else all_events -#### property conversation_stats + for i, event in enumerate(recent_events, 1): + event_type = type(event).__name__ + timestamp = getattr(event, "timestamp", "Unknown") + logger.info(f" {i}. {event_type} at {timestamp}") -#### execute_tool() + # Let's see what the actual event types are + logger.info("\n🔍 Event types found:") + event_types = set() + for event in recent_events: + event_type = type(event).__name__ + event_types.add(event_type) + for event_type in sorted(event_types): + logger.info(f" - {event_type}") -Execute a tool directly without going through the agent loop. + # Print all ConversationStateUpdateEvent + logger.info("\n🗂️ ConversationStateUpdateEvent events:") + for event in conversation.state.events: + if isinstance(event, ConversationStateUpdateEvent): + logger.info(f" - {event}") -This method allows executing tools before or outside of the normal -conversation.run() flow. It handles agent initialization automatically, -so tools can be executed before the first run() call. + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") -Note: This method bypasses the agent loop, including confirmation -policies and security analyzer checks. Callers are responsible for -applying any safeguards before executing potentially destructive tools. 
+ finally: + # Clean up + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` -This is useful for: -- Pre-run setup operations (e.g., indexing repositories) -- Manual tool execution for environment setup -- Testing tool behavior outside the agent loop + -* Parameters: - * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) - * `action` – The action to pass to the tool executor -* Returns: - The observation returned by the tool execution -* Raises: - * `KeyError` – If the tool is not found in the agent’s tools - * `NotImplementedError` – If the tool has no executor +## Next Steps -#### generate_title() +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture -Generate a title for the conversation based on the first user message. +### Overview +Source: https://docs.openhands.dev/sdk/guides/agent-server/overview.md -* Parameters: - * `llm` – Optional LLM to use for title generation. If not provided, - uses self.agent.llm. - * `max_length` – Maximum length of the generated title. -* Returns: - A generated title for the conversation. -* Raises: - `ValueError` – If no user messages are found in the conversation. +Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. -#### pause() -Pause agent execution. 
+For example, switching from a local workspace to a Docker‑based remote agent server: -This method can be called from any thread to request that the agent -pause execution. The pause will take effect at the next iteration -of the run loop (between agent steps). +```python icon="python" lines +# Local → Docker +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import DockerWorkspace # [!code ++] +with DockerWorkspace( # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` -Note: If called during an LLM completion, the pause will not take -effect until the current LLM call completes. +Use `DockerWorkspace` with the pre-built agent server image for the fastest startup. When you need to build from a custom base image, switch to [`DockerDevWorkspace`](/sdk/guides/agent-server/docker-sandbox). -#### reject_pending_actions() +Or switching to an API‑based remote workspace (via [OpenHands Runtime API](https://runtime.all-hands.dev/)): -Reject all pending actions from the agent. +```python icon="python" lines +# Local → Remote API +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import APIRemoteWorkspace # [!code ++] +with APIRemoteWorkspace( # [!code ++] + runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++] + runtime_api_key="YOUR_API_KEY", # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` -This is a non-invasive method to reject actions between run() calls. -Also clears the agent_waiting_for_confirmation flag. -#### run() +## What is a Remote Agent Server? -Runs the conversation until the agent finishes. 
+A Remote Agent Server is an HTTP/WebSocket server that: +- **Package the Software Agent SDK into containers** and deploy on your own infrastructure (Kubernetes, VMs, on-prem, or cloud) +- **Runs agents** on dedicated infrastructure +- **Manages workspaces** (Docker containers or remote sandboxes) +- **Streams events** to clients via WebSocket +- **Handles command and file operations** (execute command, upload, download), check [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for more details +- **Provides isolation** between different agent executions -In confirmation mode: -- First call: creates actions but doesn’t execute them, stops and waits -- Second call: executes pending actions (implicit confirmation) +Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client. -In normal mode: -- Creates and executes actions immediately +{/* +Same interfaces as local: +[BaseConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), +[ConversationStateProtocol](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), +[EventsListBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/events_list_base.py). Server-backed impl: +[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py). + */} -Can be paused between steps - -#### send_message() -Send a message to the agent. +## Architecture Overview -* Parameters: - * `message` – Either a string (which will be converted to a user message) - or a Message object - * `sender` – Optional identifier of the sender. Can be used to track - message origin in multi-agent scenarios. 
For example, when - one agent delegates to another, the sender can be set to - identify which agent is sending the message. +Remote Agent Servers follow a simple three-part architecture: -#### set_confirmation_policy() +```mermaid +graph TD + Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server] + Server --> Workspace[Workspace] -Set the confirmation policy and store it in conversation state. + subgraph Workspace Types + Workspace --> Local[Local Folder] + Workspace --> Docker[Docker Container] + Workspace --> API[Remote Sandbox via API] + end -#### set_security_analyzer() + Local --> Files[File System] + Docker --> Container[Isolated Runtime] + API --> Cloud[Cloud Infrastructure] -Set the security analyzer for the conversation. + style Client fill:#e1f5fe + style Server fill:#fff3e0 + style Workspace fill:#e8f5e8 +``` -#### update_secrets() +1. **Client (Python SDK)** — Your application creates and controls conversations using the SDK. +2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution. +3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs. -Add secrets to the conversation. +The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to. -* Parameters: - `secrets` – Dictionary mapping secret keys to values or no-arg callables. - SecretValue = str | Callable[[], str]. Callables are invoked lazily - when a command references the secret key. +## How Remote Conversations Work -### class RemoteConversation +Each step in the diagram maps directly to how the SDK and server interact: -Bases: [`BaseConversation`](#class-baseconversation) +### 1. 
Workspace Connection → *(Client → Server)* +When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: -#### Properties +```python icon="python" +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) +``` -- `agent`: AgentBase -- `delete_on_close`: bool = False -- `id`: UUID -- `max_iteration_per_run`: int -- `state`: RemoteState - Access to remote conversation state. -- `workspace`: RemoteWorkspace +This turns the local `Conversation` into a **[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. -#### Methods -#### __init__() +### 2. Server Initialization → *(Server → Workspace)* -Remote conversation proxy that talks to an agent server. +Once the workspace starts: +- It launches the agent server process. +- Waits for it to be ready. +- Shares the server URL with the SDK client. -* Parameters: - * `agent` – Agent configuration (will be sent to the server) - * `workspace` – The working directory for agent operations and tool execution. - * `plugins` – Optional list of plugins to load on the server. Each plugin - is a PluginSource specifying source, ref, and repo_path. - * `conversation_id` – Optional existing conversation id to attach to - * `callbacks` – Optional callbacks to receive events (not yet streamed) - * `max_iteration_per_run` – Max iterations configured on server - * `stuck_detection` – Whether to enable stuck detection on server - * `stuck_detection_thresholds` – Optional configuration for stuck detection - thresholds. Can be a StuckDetectionThresholds instance or - a dict with keys: ‘action_observation’, ‘action_error’, - ‘monologue’, ‘alternating_pattern’. 
Values are integers - representing the number of repetitions before triggering. - * `hook_config` – Optional hook configuration for session hooks - * `visualizer` – +You don’t need to manage this manually—the workspace context handles startup and teardown automatically. - Visualization configuration. Can be: - - ConversationVisualizerBase subclass: Class to instantiate - > (default: ConversationVisualizer) - - ConversationVisualizerBase instance: Use custom visualizer - - None: No visualization - * `secrets` – Optional secrets to initialize the conversation with +### 3. Event Streaming → *(Bidirectional WebSocket)* -#### ask_agent() +The client and agent server maintain a live WebSocket connection for streaming events: -Ask the agent a simple, stateless question and get a direct LLM response. +```python icon="python" +def on_event(event): + print(f"Received: {type(event).__name__}") -This bypasses the normal conversation flow and does not modify, persist, -or become part of the conversation state. The request is not remembered by -the main agent, no events are recorded, and execution status is untouched. -It is also thread-safe and may be called while conversation.run() is -executing in another thread. +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[on_event], +) +``` -* Parameters: - `question` – A simple string question to ask the agent -* Returns: - A string response from the agent +This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. -#### close() +### 4. Workspace Supports File and Command Operations → *(Server ↔ Workspace)* -Close the conversation and clean up resources. 
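The `stuck_detection_thresholds` parameter described above is a plain dictionary. A sketch of its shape follows; the key names come from the parameter documentation, while the values shown mirror the repetition counts given in the stuck-pattern descriptions and are assumptions, not verified defaults.

```python
# Illustrative thresholds mapping for stuck detection; each value is the
# number of repetitions before the detector triggers.
thresholds = {
    "action_observation": 4,   # identical action/observation cycles
    "action_error": 3,         # identical action/error cycles
    "monologue": 3,            # consecutive agent messages without user input
    "alternating_pattern": 6,  # ping-pong cycles between two pairs
}

# All values must be positive integers.
assert all(isinstance(v, int) and v > 0 for v in thresholds.values())
```

A dictionary like this would be passed as `stuck_detection_thresholds=thresholds` when constructing the conversation.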
+Workspace supports file and command operations via the agent server API ([base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior: -Note: We don’t close self._client here because it’s shared with the workspace. -The workspace owns the client and will close it during its own cleanup. -Closing it here would prevent the workspace from making cleanup API calls. +```python icon="python" +workspace.file_upload(local_path, remote_path) +workspace.file_download(remote_path, local_path) +result = workspace.execute_command("ls -la") +print(result.stdout) +``` -#### condense() +These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic. -Force condensation of the conversation history. +### Summary -This method sends a condensation request to the remote agent server. -The server will use the existing condensation request pattern to trigger -condensation if a condenser is configured and handles condensation requests. +The architecture makes remote execution seamless: +- Your **client code** stays the same. +- The **agent server** manages execution and streaming. +- The **workspace** provides secure, isolated runtime environments. -The condensation will be applied on the server side and will modify the -conversation state by adding a condensation event to the history. +Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed. -* Raises: - `HTTPError` – If the server returns an error (e.g., no condenser configured). 
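Commands proxied through the agent server report their outcome via fields such as `command`, `exit_code`, and `stdout`, as used in the workspace snippets above. The sketch below shows a fail-fast wrapper around such results; the `CommandResult` stand-in is hypothetical and only mirrors the fields shown in this document.

```python
from dataclasses import dataclass

@dataclass
class CommandResult:
    # Stand-in mirroring the fields used above; the real result type
    # comes from the SDK.
    command: str
    exit_code: int
    stdout: str

def check(result: CommandResult) -> str:
    """Raise on failure instead of logging, so remote errors surface early."""
    if result.exit_code != 0:
        raise RuntimeError(
            f"{result.command!r} failed with exit code {result.exit_code}"
        )
    return result.stdout

print(check(CommandResult("pwd", 0, "/workspace\n")).strip())
```

Wrapping results this way keeps client code environment-agnostic while still surfacing failures from the remote workspace immediately.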
+## Next Steps -#### property conversation_stats +Explore different deployment options: -#### execute_tool() +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Run agent server in the same process +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run agent server in isolated Docker containers +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted agent server via API -Execute a tool directly without going through the agent loop. +For architectural details: +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment -Note: This method is not yet supported for RemoteConversation. -Tool execution for remote conversations happens on the server side -during the normal agent loop. +### Stuck Detector +Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md -* Parameters: - * `tool_name` – The name of the tool to execute - * `action` – The action to pass to the tool executor -* Raises: - `NotImplementedError` – Always, as this feature is not yet supported - for remote conversations. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### generate_title() +> A ready-to-run example is available [here](#ready-to-run-example)! -Generate a title for the conversation based on the first user message. +The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. By analyzing the conversation history after the last user message, it detects five types of stuck patterns: -* Parameters: - * `llm` – Optional LLM to use for title generation. If provided, its usage_id - will be sent to the server. If not provided, uses the agent’s LLM. - * `max_length` – Maximum length of the generated title. -* Returns: - A generated title for the conversation. +1. 
**Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) +2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) +3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) +4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) +5. **Context Window Errors**: Repeated context window errors that indicate memory management issues -#### pause() +When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. -#### reject_pending_actions() + + For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). + -#### run() -Trigger a run on the server. +## How It Works -* Parameters: - * `blocking` – If True (default), wait for the run to complete by polling - the server. If False, return immediately after triggering the run. - * `poll_interval` – Time in seconds between status polls (only used when - blocking=True). Default is 1.0 second. - * `timeout` – Maximum time in seconds to wait for the run to complete - (only used when blocking=True). Default is 3600 seconds. -* Raises: - `ConversationRunError` – If the run fails or times out. +In the [ready-to-run example](#ready-to-run-example), the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` +command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: -#### send_message() +1. The conversation proceeds normally until the agent starts repeating actions +2. 
After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck +3. The conversation can then handle this gracefully, either by stopping execution or taking corrective action -Send a message to the agent. +The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the +stuck status at any point using `conversation.stuck_detector.is_stuck()`. -* Parameters: - * `message` – Either a string (which will be converted to a user message) - or a Message object - * `sender` – Optional identifier of the sender. Can be used to track - message origin in multi-agent scenarios. For example, when - one agent delegates to another, the sender can be set to - identify which agent is sending the message. +## Pattern Detection -#### set_confirmation_policy() +The stuck detector compares events based on their semantic content rather than object identity. For example: +- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) +- **Observations** are compared by their observation content and tool name +- **Errors** are compared by their error messages +- **Messages** are compared by their content and source -Set the confirmation policy for the conversation. +This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. -#### set_security_analyzer() +## Ready-to-run Example -Set the security analyzer for the remote conversation. + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + -#### property stuck_detector -Stuck detector for compatibility. -Not implemented for remote conversations. 
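The semantic-comparison rules from the Pattern Detection section can be sketched as a small standalone helper. This is an illustrative sketch only, not the SDK's actual `StuckDetector` implementation: the field names (`tool_name`, `content`, `thought`) and the plain-dict event representation are assumptions for the example.

```python
def same_action(a: dict, b: dict) -> bool:
    """Compare two action events semantically, ignoring IDs and timestamps."""
    keys = ("tool_name", "content", "thought")  # assumed field names
    return all(a.get(k) == b.get(k) for k in keys)


def is_repeating(history: list[tuple[dict, dict]], threshold: int = 4) -> bool:
    """Flag a trailing run of `threshold` identical action/observation pairs."""
    if len(history) < threshold:
        return False
    tail = history[-threshold:]
    first_act, first_obs = tail[0]
    return all(
        same_action(act, first_act) and obs == first_obs for act, obs in tail
    )
```

Comparing by content rather than identity is what lets the detector treat two `ls` calls issued seconds apart as "the same" action even though their event IDs differ.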
+```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py +import os -#### update_secrets() +from pydantic import SecretStr -### class SecretRegistry +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent -Bases: `OpenHandsModel` -Manages secrets and injects them into bash commands when needed. +logger = get_logger(__name__) -The secret registry stores a mapping of secret keys to SecretSources -that retrieve the actual secret values. When a bash command is about to be -executed, it scans the command for any secret keys and injects the corresponding -environment variables. - -Secret sources will redact / encrypt their sensitive values as appropriate when -serializing, depending on the content of the context. If a context is present -and contains a ‘cipher’ object, this is used for encryption. If it contains a -boolean ‘expose_secrets’ flag set to True, secrets are dunped in plain text. -Otherwise secrets are redacted. +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -Additionally, it tracks the latest exported values to enable consistent masking -even when callable secrets fail on subsequent calls. 
+agent = get_default_agent(llm=llm) +llm_messages = [] -#### Properties -- `secret_sources`: dict[str, SecretSource] +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -#### Methods -#### find_secrets_in_text() +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) -Find all secret keys mentioned in the given text. +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) -* Parameters: - `text` – The text to search for secret keys -* Returns: - Set of secret keys found in the text +# Run the conversation - stuck detection happens automatically +conversation.run() -#### get_secrets_as_env_vars() +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") -Get secrets that should be exported as environment variables for a command. +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -* Parameters: - `command` – The bash command to check for secret references -* Returns: - Dictionary of environment variables to export (key -> value) +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -#### mask_secrets_in_output() + -Mask secret values in the given text. -This method uses both the current exported values and attempts to get -fresh values from callables to ensure comprehensive masking. 
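The three `SecretRegistry` operations documented above — finding referenced keys, exporting them as environment variables, and masking values in output — can be illustrated with a simplified standalone sketch. This is not the SDK implementation: the real registry works with `SecretSource` objects, callable secrets, and encryption-aware serialization, and the `<secret-hidden>` placeholder here is invented for the example.

```python
import re


class TinySecretRegistry:
    """Simplified sketch of secret injection and masking for bash commands."""

    def __init__(self, sources: dict[str, str]):
        # key -> secret value (the real registry stores SecretSource objects)
        self.sources = sources

    def find_secrets_in_text(self, text: str) -> set[str]:
        # A key is "mentioned" if it appears as a whole word in the text
        return {k for k in self.sources if re.search(rf"\b{re.escape(k)}\b", text)}

    def get_secrets_as_env_vars(self, command: str) -> dict[str, str]:
        # Only inject secrets the command actually references
        return {k: self.sources[k] for k in self.find_secrets_in_text(command)}

    def mask_secrets_in_output(self, text: str) -> str:
        # Replace every known secret value before output is shown or logged
        for value in self.sources.values():
            text = text.replace(value, "<secret-hidden>")
        return text
```

The key design point mirrored here is that injection is on-demand (only secrets a command mentions are exported) while masking is unconditional (every known value is scrubbed from output).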
+## Next Steps -* Parameters: - `text` – The text to mask secrets in -* Returns: - Text with secret values replaced by `` +- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control +- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK -#### model_config = (configuration object) +### Theory of Mind (TOM) Agent +Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent.md -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### model_post_init() +## Overview -This function is meant to behave like a BaseModel method to initialise private attributes. +Tom (Theory of Mind) Agent provides advanced user understanding capabilities that help your agent interpret vague instructions and adapt to user preferences over time. Built on research in user mental modeling, Tom agents can: -It takes context as an argument since that’s what pydantic-core passes when calling it. +- Understand unclear or ambiguous user requests +- Provide personalized guidance based on user modeling +- Build long-term user preference profiles +- Adapt responses based on conversation history -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. +This is particularly useful when: +- User instructions are vague or incomplete +- You need to infer user intent from minimal context +- Building personalized experiences across multiple conversations +- Understanding user preferences and working patterns -#### update_secrets() +## Research Foundation -Add or update secrets in the manager. 
+Tom agent is based on the TOM-SWE research paper on user mental modeling for software engineering agents: -* Parameters: - `secrets` – Dictionary mapping secret keys to either string values - or callable functions that return string values +```bibtex Citation +@misc{zhou2025tomsweusermentalmodeling, + title={TOM-SWE: User Mental Modeling For Software Engineering Agents}, + author={Xuhui Zhou and Valerie Chen and Zora Zhiruo Wang and Graham Neubig and Maarten Sap and Xingyao Wang}, + year={2025}, + eprint={2510.21903}, + archivePrefix={arXiv}, + primaryClass={cs.SE}, + url={https://arxiv.org/abs/2510.21903}, +} +``` -### class StuckDetector + +Paper: [TOM-SWE on arXiv](https://arxiv.org/abs/2510.21903) + -Bases: `object` +## Quick Start -Detects when an agent is stuck in repetitive or unproductive patterns. + +This example is available on GitHub: [examples/01_standalone_sdk/30_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/30_tom_agent.py) + -This detector analyzes the conversation history to identify various stuck patterns: -1. Repeating action-observation cycles -2. Repeating action-error cycles -3. Agent monologue (repeated messages without user input) -4. Repeating alternating action-observation patterns -5. Context window errors indicating memory issues +```python icon="python" expandable examples/01_standalone_sdk/30_tom_agent.py +"""Example demonstrating Tom agent with Theory of Mind capabilities. +This example shows how to set up an agent with Tom tools for getting +personalized guidance based on user modeling. 
Tom tools include: +- TomConsultTool: Get guidance for vague or unclear tasks +- SleeptimeComputeTool: Index conversations for user modeling +""" -#### Properties +import os -- `action_error_threshold`: int -- `action_observation_threshold`: int -- `alternating_pattern_threshold`: int -- `monologue_threshold`: int -- `state`: [ConversationState](#class-conversationstate) -- `thresholds`: StuckDetectionThresholds +from pydantic import SecretStr -#### Methods +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.tool import Tool +from openhands.tools.preset.default import get_default_tools +from openhands.tools.tom_consult import ( + SleeptimeComputeAction, + SleeptimeComputeObservation, + SleeptimeComputeTool, + TomConsultTool, +) -#### __init__() -#### is_stuck() +# Configure LLM +api_key: str | None = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." -Check if the agent is currently stuck. +llm: LLM = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), + usage_id="agent", + drop_params=True, +) -Note: To avoid materializing potentially large file-backed event histories, -only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. -If a user message exists within this window, only events after it are checked. -Otherwise, all events in the window are analyzed. 
+# Build tools list with Tom tools +# Note: Tom tools are automatically registered on import (PR #862) +tools = get_default_tools(enable_browser=False) -#### __init__() +# Configure Tom tools with parameters +tom_params: dict[str, bool | str] = { + "enable_rag": True, # Enable RAG in Tom agent +} -### openhands.sdk.event -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md +# Add LLM configuration for Tom tools (uses same LLM as main agent) +tom_params["llm_model"] = llm.model +if llm.api_key: + if isinstance(llm.api_key, SecretStr): + tom_params["api_key"] = llm.api_key.get_secret_value() + else: + tom_params["api_key"] = llm.api_key +if llm.base_url: + tom_params["api_base"] = llm.base_url -### class ActionEvent +# Add both Tom tools to the agent +tools.append(Tool(name=TomConsultTool.name, params=tom_params)) +tools.append(Tool(name=SleeptimeComputeTool.name, params=tom_params)) -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +# Create agent with Tom capabilities +# This agent can consult Tom for personalized guidance +# Note: Tom's user modeling data will be stored in ~/.openhands/ +agent: Agent = Agent(llm=llm, tools=tools) +# Start conversation +cwd: str = os.getcwd() +PERSISTENCE_DIR = os.path.expanduser("~/.openhands") +CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, "conversations") +conversation = Conversation( + agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR +) -#### Properties +# Optionally run sleeptime compute to index existing conversations +# This builds user preferences and patterns from conversation history +# Using execute_tool allows running tools before conversation.run() +print("\nRunning sleeptime compute to index conversations...") +try: + sleeptime_result = conversation.execute_tool( + "sleeptime_compute", SleeptimeComputeAction() + ) + # Cast to the expected observation type for type-safe access + if isinstance(sleeptime_result, SleeptimeComputeObservation): + print(f"Result: 
{sleeptime_result.message}") + print(f"Sessions processed: {sleeptime_result.sessions_processed}") + else: + print(f"Result: {sleeptime_result.text}") +except KeyError as e: + print(f"Tool not available: {e}") -- `action`: Action | None -- `critic_result`: CriticResult | None -- `llm_response_id`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `reasoning_content`: str | None -- `responses_reasoning_item`: ReasoningItemModel | None -- `security_risk`: SecurityRisk -- `source`: Literal['agent', 'user', 'environment'] -- `summary`: str | None -- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] -- `thought`: Sequence[TextContent] -- `tool_call`: MessageToolCall -- `tool_call_id`: str -- `tool_name`: str -- `visualize`: Text - Return Rich Text representation of this action event. +# Send a potentially vague message where Tom consultation might help +conversation.send_message( + "I need to debug some code but I'm not sure where to start. " + + "Can you help me figure out the best approach?" +) +conversation.run() -#### Methods +print("\n" + "=" * 80) +print("Tom agent consultation example completed!") +print("=" * 80) -#### to_llm_message() +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") -Individual message - may be incomplete for multi-action batches -### class AgentErrorEvent +# Optional: Index this conversation for Tom's user modeling +# This builds user preferences and patterns from conversation history +# Uncomment the lines below to index the conversation: +# +# conversation.send_message("Please index this conversation using sleeptime_compute") +# conversation.run() +# print("\nConversation indexed for user modeling!") -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -Error triggered by the agent. 
+ -Note: This event should not contain model “thought” or “reasoning_content”. It -represents an error produced by the agent/scaffold, not model output. +## Tom Tools +### TomConsultTool -#### Properties +The consultation tool provides personalized guidance when the agent encounters vague or unclear user requests: -- `error`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `visualize`: Text - Return Rich Text representation of this agent error event. +```python icon="python" +# The agent can automatically call this tool when needed +# Example: User says "I need to debug something" +# Tom analyzes the vague request and provides specific guidance +``` -#### Methods +Key features: +- Analyzes conversation history for context +- Provides personalized suggestions based on user modeling +- Helps disambiguate vague instructions +- Adapts to user communication patterns -#### to_llm_message() +### SleeptimeComputeTool -### class Condensation +The indexing tool processes conversation history to build user preference profiles: -Bases: [`Event`](#class-event) +```python icon="python" +# Index conversations for future personalization +sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute") +if sleeptime_compute_tool: + result = sleeptime_compute_tool.executor( + SleeptimeComputeAction(), conversation + ) +``` -This action indicates a condensation of the conversation history is happening. 
+Key features: +- Processes conversation history into user models +- Stores preferences in `~/.openhands/` directory +- Builds understanding of user patterns over time +- Enables long-term personalization across sessions +## Configuration -#### Properties +### RAG Support -- `forgotten_event_ids`: list[[EventID](#class-eventid)] -- `has_summary_metadata`: bool - Checks if both summary and summary_offset are present. -- `llm_response_id`: [EventID](#class-eventid) -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `summary`: str | None -- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) - Generates a CondensationSummaryEvent. - Since summary events are not part of the main event store and are generated - dynamically, this property ensures the created event has a unique and consistent - ID based on the condensation event’s ID. - * Raises: - `ValueError` – If no summary is present. -- `summary_offset`: int | None -- `visualize`: Text - Return Rich Text representation of this event. - This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. +Enable retrieval-augmented generation for enhanced context awareness: -#### Methods +```python icon="python" +tom_params = { + "enable_rag": True, # Enable RAG for better context retrieval +} +``` -#### apply() +### Custom LLM for Tom -Applies the condensation to a list of events. +You can optionally use a different LLM for Tom's internal reasoning: -This method removes events that are marked to be forgotten and returns a new -list of events. If the summary metadata is present (both summary and offset), -the corresponding CondensationSummaryEvent will be inserted at the specified -offset _after_ the forgotten events have been removed. 
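The `apply()` semantics described above — drop the forgotten events first, then insert the summary at its offset — can be sketched with plain dicts. This is schematic only: real events are Pydantic models and the summary event is generated from the condensation's own ID.

```python
def apply_condensation(events, forgotten_ids, summary_event=None, summary_offset=None):
    """Drop forgotten events, then insert the summary at the given offset."""
    forgotten = set(forgotten_ids)
    kept = [e for e in events if e["id"] not in forgotten]
    # The offset is interpreted against the list *after* removal
    if summary_event is not None and summary_offset is not None:
        kept.insert(summary_offset, summary_event)
    return kept
```

Note the ordering: because the offset is applied after removal, the same condensation yields a stable insertion point regardless of how many events were forgotten before it.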
+```python icon="python" +# Use the same LLM as main agent +tom_params["llm_model"] = llm.model +tom_params["api_key"] = llm.api_key.get_secret_value() -### class CondensationRequest +# Or configure a separate LLM for Tom +tom_llm = LLM(model="gpt-4", api_key=SecretStr("different-key")) +tom_params["llm_model"] = tom_llm.model +tom_params["api_key"] = tom_llm.api_key.get_secret_value() +``` -Bases: [`Event`](#class-event) +## Data Storage -This action is used to request a condensation of the conversation history. +Tom stores user modeling data persistently in `~/.openhands/`: + + + + + + + + + + + + + + + -#### Properties +where +- `user_models/` stores user preference profiles, with each user having their own subdirectory containing `user_model.json` (the current user model). +- `conversations/` contains indexed conversation data -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `visualize`: Text - Return Rich Text representation of this event. - This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. +This persistent storage enables Tom to: +- Remember user preferences across sessions +- Track which conversations have been indexed +- Build long-term understanding of user patterns -#### Methods +## Use Cases -#### action +### 1. Handling Vague Requests -The action type, namely ActionType.CONDENSATION_REQUEST. +When a user provides minimal information: -* Type: - str +```python icon="python" +conversation.send_message("Help me with that bug") +# Tom analyzes history to determine which bug and suggest approach +``` -### class CondensationSummaryEvent +### 2. Personalized Recommendations -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +Tom adapts suggestions based on past interactions: -This event represents a summary generated by a condenser. 
+```python icon="python" +# After multiple conversations, Tom learns: +# - User prefers minimal explanations +# - User typically works with Python +# - User values efficiency over verbosity +``` +### 3. Intent Inference -#### Properties +Understanding what the user really wants: -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: SourceType -- `summary`: str - The summary text. +```python icon="python" +conversation.send_message("Make it better") +# Tom infers from context what "it" is and how to improve it +``` -#### Methods +## Best Practices -#### to_llm_message() +1. **Enable RAG**: For better context awareness, always enable RAG: + ```python icon="python" + tom_params = {"enable_rag": True} + ``` -### class ConversationStateUpdateEvent +2. **Index Regularly**: Run sleeptime compute after important conversations to build better user models -Bases: [`Event`](#class-event) +3. **Provide Context**: Even with Tom, providing more context leads to better results -Event that contains conversation state updates. +4. **Monitor Data**: Check `~/.openhands/` periodically to understand what's being learned -This event is sent via websocket whenever the conversation state changes, -allowing remote clients to stay in sync without making REST API calls. +5. **Privacy Considerations**: Be aware that conversation data is stored locally for user modeling -All fields are serialized versions of the corresponding ConversationState fields -to ensure compatibility with websocket transmission. 
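As a concrete way to follow the "Monitor Data" practice above, you can periodically summarize what Tom has persisted. This sketch assumes the layout described in the Data Storage section (`user_models/<user_id>/user_model.json` plus a `conversations/` directory); the function name and return shape are invented for the example.

```python
import json
from pathlib import Path


def inspect_openhands_data(root: Path) -> dict:
    """Summarize Tom's persisted data under an ~/.openhands-style directory."""
    # Assumed layout: <root>/user_models/<user_id>/user_model.json
    users = {
        f.parent.name: sorted(json.loads(f.read_text()))
        for f in (root / "user_models").glob("*/user_model.json")
    }
    # Assumed layout: <root>/conversations/ holds one entry per indexed session
    conv_dir = root / "conversations"
    n_conversations = sum(1 for _ in conv_dir.iterdir()) if conv_dir.exists() else 0
    return {"users": users, "indexed_conversations": n_conversations}


# Example: inspect_openhands_data(Path.home() / ".openhands")
```

Running this after a few sessions makes it easy to verify that sleeptime compute is actually indexing conversations and to review which preference keys are being recorded about you.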
+## Next Steps +- **[Agent Delegation](/sdk/guides/agent-delegation)** - Combine Tom with sub-agents for complex workflows +- **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively +- **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights -#### Properties +### Browser Session Recording +Source: https://docs.openhands.dev/sdk/guides/browser-session-recording.md -- `key`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `value`: Any +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### Methods +> A ready-to-run example is available [here](#ready-to-run-example)! -#### classmethod from_conversation_state() +The browser session recording feature allows you to capture your agent's browser interactions and replay them later using [rrweb](https://github.com/rrweb-io/rrweb). This is useful for debugging, auditing, and understanding how your agent interacts with web pages. -Create a state update event from a ConversationState object. +## How It Works -This creates an event containing a snapshot of important state fields. +The recording feature uses rrweb to capture DOM mutations, mouse movements, scrolling, and other browser events. The recordings are saved as JSON files that can be replayed using rrweb-player or the online viewer. -* Parameters: - * `state` – The ConversationState to serialize - * `conversation_id` – The conversation ID for the event -* Returns: - A ConversationStateUpdateEvent with serialized state data +The [ready-to-run example](#ready-to-run-example) demonstrates: -#### classmethod validate_key() +1. **Starting a recording**: Use `browser_start_recording` to begin capturing browser events +2. **Browsing and interacting**: Navigate to websites and perform actions while recording +3. 
**Stopping the recording**: Use `browser_stop_recording` to stop and save the recording -#### classmethod validate_value() +The recording files are automatically saved to the persistence directory when the recording is stopped. -### class Event +## Replaying Recordings -Bases: `DiscriminatedUnionMixin`, `ABC` +After recording a session, you can replay it using: -Base class for all events. +- **rrweb-player**: A standalone player component - [GitHub](https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player) +- **Online viewer**: Upload your recording at [rrweb.io/demo](https://www.rrweb.io/) +## Ready-to-run Example -#### Properties + +This example is available on GitHub: [examples/01_standalone_sdk/38_browser_session_recording.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/38_browser_session_recording.py) + -- `id`: str -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `timestamp`: str -- `visualize`: Text - Return Rich Text representation of this event. - This is a fallback implementation for unknown event types. - Subclasses should override this method to provide specific visualization. -### class LLMCompletionLogEvent +```python icon="python" expandable examples/01_standalone_sdk/38_browser_session_recording.py +"""Browser Session Recording Example -Bases: [`Event`](#class-event) +This example demonstrates how to use the browser session recording feature +to capture and save a recording of the agent's browser interactions using rrweb. -Event containing LLM completion log data. - -When an LLM is configured with log_completions=True in a remote conversation, -this event streams the completion log data back to the client through WebSocket -instead of writing it to a file inside the Docker container. 
- - -#### Properties +The recording can be replayed later using rrweb-player to visualize the agent's +browsing session. -- `filename`: str -- `log_data`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `model_name`: str -- `source`: Literal['agent', 'user', 'environment'] -- `usage_id`: str -### class LLMConvertibleEvent +The recording will be automatically saved to the persistence directory when +browser_stop_recording is called. You can replay it with: + - rrweb-player: https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player + - Online viewer: https://www.rrweb.io/ +""" -Bases: [`Event`](#class-event), `ABC` +import json +import os -Base class for events that can be converted to LLM messages. +from pydantic import SecretStr +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.browser_use.definition import BROWSER_RECORDING_OUTPUT_DIR -#### Properties -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +logger = get_logger(__name__) -#### Methods +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -#### static events_to_messages() +# Tools - including browser tools with recording capability +cwd = os.getcwd() +tools = [ + Tool(name=BrowserToolSet.name), +] -Convert event stream to LLM message stream, handling multi-action batches +# Agent +agent = Agent(llm=llm, tools=tools) -#### abstractmethod to_llm_message() +llm_messages = [] # collect raw LLM messages -### class MessageEvent -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -Message from either agent or user. -This is originally the “MessageAction”, but it suppose not to be tool call. +# Create conversation with persistence_dir set to save browser recordings +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir="./.conversations", +) +# The prompt instructs the agent to: +# 1. Start recording the browser session +# 2. Browse to a website and perform some actions +# 3. Stop recording (auto-saves to file) +PROMPT = """ +Please complete the following task to demonstrate browser session recording: -#### Properties +1. First, use `browser_start_recording` to begin recording the browser session. -- `activated_skills`: list[str] -- `critic_result`: CriticResult | None -- `extended_content`: list[TextContent] -- `llm_message`: Message -- `llm_response_id`: str | None -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `reasoning_content`: str -- `sender`: str | None -- `source`: Literal['agent', 'user', 'environment'] -- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock] - Return the Anthropic thinking blocks from the LLM message. -- `visualize`: Text - Return Rich Text representation of this message event. +2. Then navigate to https://docs.openhands.dev/ and: + - Get the page content + - Scroll down the page + - Get the browser state to see interactive elements -#### Methods +3. Next, navigate to https://docs.openhands.dev/openhands/usage/cli/installation and: + - Get the page content + - Scroll down to see more content -#### to_llm_message() +4. Finally, use `browser_stop_recording` to stop the recording. + Events are automatically saved. +""" -### class ObservationBaseEvent +print("=" * 80) +print("Browser Session Recording Example") +print("=" * 80) +print("\nTask: Record an agent's browser session and save it for replay") +print("\nStarting conversation with agent...\n") -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +conversation.send_message(PROMPT) +conversation.run() -Base class for anything as a response to a tool call. +print("\n" + "=" * 80) +print("Conversation finished!") +print("=" * 80) -Examples include tool execution, error, user reject. 
+# Check if the recording files were created +# Recordings are saved in BROWSER_RECORDING_OUTPUT_DIR/recording-{timestamp}/ +if os.path.exists(BROWSER_RECORDING_OUTPUT_DIR): + # Find recording subdirectories (they start with "recording-") + recording_dirs = sorted( + [ + d + for d in os.listdir(BROWSER_RECORDING_OUTPUT_DIR) + if d.startswith("recording-") + and os.path.isdir(os.path.join(BROWSER_RECORDING_OUTPUT_DIR, d)) + ] + ) + if recording_dirs: + # Process the most recent recording directory + latest_recording = recording_dirs[-1] + recording_path = os.path.join(BROWSER_RECORDING_OUTPUT_DIR, latest_recording) + json_files = sorted( + [f for f in os.listdir(recording_path) if f.endswith(".json")] + ) -#### Properties + print(f"\n✓ Recording saved to: {recording_path}") + print(f"✓ Number of files: {len(json_files)}") -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `tool_call_id`: str -- `tool_name`: str -### class ObservationEvent + # Count total events across all files + total_events = 0 + all_event_types: dict[int | str, int] = {} + total_size = 0 -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + for json_file in json_files: + filepath = os.path.join(recording_path, json_file) + file_size = os.path.getsize(filepath) + total_size += file_size + with open(filepath) as f: + events = json.load(f) -#### Properties + # Events are stored as a list in each file + if isinstance(events, list): + total_events += len(events) + for event in events: + event_type = event.get("type", "unknown") + all_event_types[event_type] = all_event_types.get(event_type, 0) + 1 -- `action_id`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `observation`: Observation -- `visualize`: Text - Return Rich Text representation of this observation event. + print(f" - {json_file}: {len(events)} events, {file_size} bytes") -#### Methods + print(f"✓ Total events: {total_events}") + print(f"✓ Total size: {total_size} bytes") + if all_event_types: + print(f"✓ Event types: {all_event_types}") -#### to_llm_message() + print("\nTo replay this recording, you can use:") + print( + " - rrweb-player: " + "https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player" + ) + else: + print(f"\n✗ No recording directories found in: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") +else: + print(f"\n✗ Observations directory not found: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") -### class PauseEvent +print("\n" + "=" * 100) +print("Conversation finished.") +print(f"Total LLM messages: {len(llm_messages)}") +print("=" * 100) -Bases: [`Event`](#class-event) +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"Conversation ID: {conversation.id}") +print(f"EXAMPLE_COST: {cost}") +``` -Event indicating that the agent execution was paused by user request. + +### Context Condenser +Source: https://docs.openhands.dev/sdk/guides/context-condenser.md -#### Properties +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `visualize`: Text - Return Rich Text representation of this pause event. -### class SystemPromptEvent +> A ready-to-run example is available [here](#ready-to-run-example)! -Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) +## What is a Context Condenser? -System prompt added by the agent. 
+A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. As conversations with AI agents grow longer, the cumulative history leads to: -The system prompt can optionally include dynamic context that varies between -conversations. When `dynamic_context` is provided, it is included as a -second content block in the same system message. Cache markers are NOT -applied here - they are applied by `LLM._apply_prompt_caching()` when -caching is enabled, ensuring provider-specific cache control is only added -when appropriate. +- **💰 Increased API Costs**: More tokens in the context means higher costs per API call +- **⏱️ Slower Response Times**: Larger contexts take longer to process +- **📉 Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information +The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively. -#### Properties +## Default Implementation: `LLMSummarizingCondenser` -- `dynamic_context`: TextContent | None -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `source`: Literal['agent', 'user', 'environment'] -- `system_prompt`: TextContent -- `tools`: list[ToolDefinition] -- `visualize`: Text - Return Rich Text representation of this system prompt event. +OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit. -#### Methods +### How It Works -#### system_prompt +When conversation history exceeds a defined threshold, the LLM-based condenser: -The static system prompt text (cacheable across conversations) +1. 
**Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context +2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained +3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise summaries using LLM-generated summaries +4. **Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction -* Type: - openhands.sdk.llm.message.TextContent +{/* Auto-switching light/dark mode image. */} +Light mode interface +Dark mode interface -#### tools +This approach achieves remarkable efficiency gains: +- Up to **2x reduction** in per-turn API costs +- **Consistent response times** even in long sessions +- **Equivalent or better performance** on software engineering tasks -List of available tools +Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). -* Type: - list[openhands.sdk.tool.tool.ToolDefinition] +### Extensibility -#### dynamic_context +The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history. You can create custom condensers by extending base classes ([source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)): -Optional per-conversation context (hosts, repo info, etc.) -Sent as a second TextContent block inside the system message. 
+- **`RollingCondenser`** - For condensers that apply condensation to rolling history +- **`CondenserBase`** - For more specialized condensation strategies -* Type: - openhands.sdk.llm.message.TextContent | None +This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure. -#### to_llm_message() -Convert to a single system LLM message. +### Setting Up Condensing -When `dynamic_context` is present the message contains two content -blocks: the static prompt followed by the dynamic context. Cache markers -are NOT applied here - they are applied by `LLM._apply_prompt_caching()` -when caching is enabled, which marks the static block (index 0) and leaves -the dynamic block (index 1) unmarked for cross-conversation cache sharing. +Create a `LLMSummarizingCondenser` to manage the context. +The condenser will automatically truncate conversation history when it exceeds max_size, and replaces the dropped events with an LLM-generated summary. -### class TokenEvent +This condenser triggers when there are more than `max_context_length` events in +the conversation history, and always keeps the first `keep_first` events (system prompts, +initial user messages) to preserve important context. -Bases: [`Event`](#class-event) +```python focus={3-4} icon="python" +from openhands.sdk.context import LLMSummarizingCondenser -Event from VLLM representing token IDs used in LLM interaction. +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) +``` -#### Properties +### Ready-to-run example -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `prompt_token_ids`: list[int] -- `response_token_ids`: list[int] -- `source`: Literal['agent', 'user', 'environment'] -### class UserRejectObservation + +This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py) + -Bases: [`ObservationBaseEvent`](#class-observationbaseevent) -Observation when an action is rejected by user or hook. +Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information: -This event is emitted when: -- User rejects an action during confirmation mode (rejection_source=”user”) -- A PreToolUse hook blocks an action (rejection_source=”hook”) +```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py +""" +To manage context in long-running conversations, the agent can use a context condenser +that keeps the conversation history within a specified size limit. This example +demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes +older parts of the conversation when the history exceeds a defined threshold. +""" +import os -#### Properties +from pydantic import SecretStr -- `action_id`: str -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `rejection_reason`: str -- `rejection_source`: Literal['user', 'hook'] -- `visualize`: Text - Return Rich Text representation of this user rejection event. 
+from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context.condenser import LLMSummarizingCondenser +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -#### Methods -#### to_llm_message() +logger = get_logger(__name__) -### openhands.sdk.llm -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -### class CredentialStore +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] -Bases: `object` +# Create a condenser to manage the context. The condenser will automatically truncate +# conversation history when it exceeds max_size, and replaces the dropped events with an +# LLM-generated summary. This condenser triggers when there are more than ten events in +# the conversation history, and always keeps the first two events (system prompts, +# initial user messages) to preserve important context. +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) -Store and retrieve OAuth credentials for LLM providers. +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) +llm_messages = [] # collect raw LLM messages -#### Properties -- `credentials_dir`: Path - Get the credentials directory, creating it if necessary. 
+def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -#### Methods -#### __init__() +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) -Initialize the credential store. +# Send multiple messages to demonstrate condensation +print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") -* Parameters: - `credentials_dir` – Optional custom directory for storing credentials. - Defaults to ~/.local/share/openhands/auth/ +conversation.send_message( + "Hello! Can you create a Python file named math_utils.py with functions for " + "basic arithmetic operations (add, subtract, multiply, divide)?" +) +conversation.run() -#### delete() +conversation.send_message( + "Great! Now add a function to calculate the factorial of a number." +) +conversation.run() -Delete stored credentials for a vendor. +conversation.send_message("Add a function to check if a number is prime.") +conversation.run() -* Parameters: - `vendor` – The vendor/provider name -* Returns: - True if credentials were deleted, False if they didn’t exist +conversation.send_message( + "Add a function to calculate the greatest common divisor (GCD) of two numbers." +) +conversation.run() -#### get() +conversation.send_message( + "Now create a test file to verify all these functions work correctly." +) +conversation.run() -Get stored credentials for a vendor. +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -* Parameters: - `vendor` – The vendor/provider name (e.g., ‘openai’) -* Returns: - OAuthCredentials if found and valid, None otherwise +# Conversation persistence +print("Serializing conversation...") -#### save() +del conversation -Save credentials for a vendor. 
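# Note (illustrative): because persistence_dir="./.conversations" was passed to
# Conversation above, deleting the in-memory object does not discard history.
# Re-creating the Conversation below with the same persistence_dir reloads the
# stored events, including any condenser-generated summaries.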
+# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) -* Parameters: - `credentials` – The OAuth credentials to save +print("Sending message to deserialized conversation...") +conversation.send_message("Finally, clean up by deleting both files.") +conversation.run() -#### update_tokens() +print("=" * 100) +print("Conversation finished with LLM Summarizing Condenser.") +print(f"Total LLM messages collected: {len(llm_messages)}") +print("\nThe condenser automatically summarized older conversation history") +print("when the conversation exceeded the configured max_size threshold.") +print("This helps manage context length while preserving important information.") -Update tokens for an existing credential. +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -* Parameters: - * `vendor` – The vendor/provider name - * `access_token` – New access token - * `refresh_token` – New refresh token (if provided) - * `expires_in` – Token expiry in seconds -* Returns: - Updated credentials, or None if no existing credentials found + -### class ImageContent +## Next Steps -Bases: `BaseContent` +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings +### Ask Agent Questions +Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent.md -#### Properties +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -- `image_urls`: list[str] -- `type`: Literal['image'] +> A ready-to-run example is available [here](#ready-to-run-example)! -#### Methods +Use `ask_agent()` to get quick responses from the agent about the current conversation state without +interrupting the main execution flow. 
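
The mechanics can be sketched without the SDK: the main task runs on a worker thread while questions are answered from a lock-protected snapshot of the current state. Everything below (`MiniConversation`, `ask`) is an illustrative stand-in, not the real SDK API:

```python
import threading
import time


class MiniConversation:
    """Toy stand-in for the ask-on-the-side pattern (not the SDK's Conversation)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []  # the main conversation history
        self.finished = False

    def run(self):
        # Main execution: appends events as "work" gets done
        for i in range(5):
            time.sleep(0.01)
            with self._lock:
                self.events.append(f"action-{i}")
        with self._lock:
            self.finished = True

    def ask(self, question: str) -> str:
        # Side-channel read: answers from the current state, appends nothing
        with self._lock:
            return f"{question} -> {len(self.events)} events, finished={self.finished}"


convo = MiniConversation()
worker = threading.Thread(target=convo.run)
worker.start()
during = convo.ask("How's the progress?")  # fine while the worker is running
worker.join()
after = convo.ask("Have you finished?")
print(during)
print(after)
```

The property mirrored here is that queries never mutate the event history: after `run()` completes there are exactly five events, no matter how many questions were asked along the way.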
-#### model_config = (configuration object) +## Key Features -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +The `ask_agent()` method provides several important capabilities: -#### to_llm_dict() +#### Context-Aware Responses -Convert to LLM API format. - -### class LLM +The agent has access to the full conversation history when answering questions: -Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` +```python focus={2-3} icon="python" wrap +# Agent can reference what it has done so far +response = conversation.ask_agent( + "Summarize the activity so far in 1 sentence." +) +print(f"Response: {response}") +``` -Language model interface for OpenHands agents. +#### Non-Intrusive Operation -The LLM class provides a unified interface for interacting with various -language models through the litellm library. It handles model configuration, -API authentication, -retry logic, and tool calling capabilities. +Questions don't interrupt the main conversation flow - they're processed separately: -#### Example +```python focus={4-6} icon="python" wrap +# Start main conversation +thread = threading.Thread(target=conversation.run) +thread.start() -```pycon ->>> from openhands.sdk import LLM ->>> from pydantic import SecretStr ->>> llm = LLM( -... model="claude-sonnet-4-20250514", -... api_key=SecretStr("your-api-key"), -... usage_id="my-agent" -... 
) ->>> # Use with agent or conversation +# Ask questions without affecting main execution +response = conversation.ask_agent("How's the progress?") ``` +#### Works During and After Execution -#### Properties +You can ask questions while the agent is running or after it has completed: -- `api_key`: str | SecretStr | None -- `api_version`: str | None -- `aws_access_key_id`: str | SecretStr | None -- `aws_region_name`: str | None -- `aws_secret_access_key`: str | SecretStr | None -- `base_url`: str | None -- `caching_prompt`: bool -- `custom_tokenizer`: str | None -- `disable_stop_word`: bool | None -- `disable_vision`: bool | None -- `drop_params`: bool -- `enable_encrypted_reasoning`: bool -- `extended_thinking_budget`: int | None -- `extra_headers`: dict[str, str] | None -- `force_string_serializer`: bool | None -- `input_cost_per_token`: float | None -- `is_subscription`: bool - Check if this LLM uses subscription-based authentication. - Returns True when the LLM was created via LLM.subscription_login(), - which uses the ChatGPT subscription Codex backend rather than the - standard OpenAI API. - * Returns: - True if using subscription-based transport, False otherwise. - * Return type: - bool -- `litellm_extra_body`: dict[str, Any] -- `log_completions`: bool -- `log_completions_folder`: str -- `max_input_tokens`: int | None -- `max_message_chars`: int -- `max_output_tokens`: int | None -- `metrics`: [Metrics](#class-metrics) - Get usage metrics for this LLM instance. - * Returns: - Metrics object containing token usage, costs, and other statistics. -- `model`: str -- `model_canonical_name`: str | None -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `model_info`: dict | None - Returns the model info dictionary. 
-- `modify_params`: bool -- `native_tool_calling`: bool -- `num_retries`: int -- `ollama_base_url`: str | None -- `openrouter_app_name`: str -- `openrouter_site_url`: str -- `output_cost_per_token`: float | None -- `prompt_cache_retention`: str | None -- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None -- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None -- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None] -- `retry_max_wait`: int -- `retry_min_wait`: int -- `retry_multiplier`: float -- `safety_settings`: list[dict[str, str]] | None -- `seed`: int | None -- `stream`: bool -- `telemetry`: Telemetry - Get telemetry handler for this LLM instance. - * Returns: - Telemetry object for managing logging and metrics callbacks. -- `temperature`: float | None -- `timeout`: int | None -- `top_k`: float | None -- `top_p`: float | None -- `usage_id`: str +```python focus={3,7} icon="python" wrap +# During execution +time.sleep(2) # Let agent start working +response1 = conversation.ask_agent("Have you finished running?") -#### Methods +# After completion +thread.join() +response2 = conversation.ask_agent("What did you accomplish?") +``` -#### completion() +### Use Cases -Generate a completion from the language model. +- **Progress Monitoring**: Check on long-running tasks +- **Status Updates**: Get real-time information about agent activities +- **User Interfaces**: Provide sidebar information in chat applications -This is the method for getting responses from the model via Completion API. -It handles message formatting, tool calling, and response processing. 
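
The progress-monitoring use case can be approximated with the standard library alone: poll a snapshot function while the long-running work executes in a thread pool. The `snapshot` helper below is a hypothetical stand-in for `conversation.ask_agent(...)`, and `long_task` stands in for `conversation.run()`:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_status = {"step": 0, "done": False}


def long_task():
    # Stand-in for conversation.run(): advances through several steps
    for i in range(1, 4):
        time.sleep(0.02)
        with _lock:
            _status["step"] = i
    with _lock:
        _status["done"] = True


def snapshot() -> str:
    # Stand-in for conversation.ask_agent("How's the progress?")
    with _lock:
        return f"step {_status['step']}, done={_status['done']}"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(long_task)
    while not future.done():
        print("monitor:", snapshot())
        time.sleep(0.01)

print("final:", snapshot())
```

Because `_status` is only touched under the lock, each poll sees a consistent step count, and the final snapshot always reports the completed state.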
+## Ready-to-run Example -* Parameters: - * `messages` – List of conversation messages - * `tools` – Optional list of tools available to the model - * `_return_metrics` – Whether to return usage metrics - * `add_security_risk_prediction` – Add security_risk field to tool schemas - * `on_token` – Optional callback for streaming tokens - kwargs* – Additional arguments passed to the LLM API -* Returns: - LLMResponse containing the model’s response and metadata. + + This example is available on GitHub: + [examples/01_standalone_sdk/28_ask_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/28_ask_agent_example.py) + -#### NOTE -Summary field is always added to tool schemas for transparency and -explainability of agent actions. +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. -* Raises: - `ValueError` – If streaming is requested (not supported). +This example shows how to use `ask_agent()` to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. -#### format_messages_for_llm() +```python icon="python" expandable examples/01_standalone_sdk/28_ask_agent_example.py +""" +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. -Formats Message objects for LLM consumption. +This example shows how to use ask_agent() to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. +""" -#### format_messages_for_responses() +import os +import threading +import time +from datetime import datetime -Prepare (instructions, input[]) for the OpenAI Responses API. 
+from pydantic import SecretStr -- Skips prompt caching flags and string serializer concerns -- Uses Message.to_responses_value to get either instructions (system) - or input items (others) -- Concatenates system instructions into a single instructions string -- For subscription mode, system prompts are prepended to user content +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import Event +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -#### get_token_count() -#### is_caching_prompt_active() +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -Check if prompt caching is supported and enabled for current model. +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] -* Returns: - True if prompt caching is supported and enabled for the given - : model. -* Return type: - boolean -#### classmethod load_from_env() +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" -#### classmethod load_from_json() + count = 0 -#### model_post_init() + def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT {self.count}] {type(event).__name__}") + self.count += 1 -This function is meant to behave like a BaseModel method to initialise private attributes. 
-It takes context as an argument since that’s what pydantic-core passes when calling it. +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation( + agent=agent, workspace=cwd, visualizer=MinimalVisualizer, max_iteration_per_run=5 +) -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. -#### reset_metrics() +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") -Reset metrics and telemetry to fresh instances. -This is used by the LLMRegistry to ensure each registered LLM has -independent metrics, preventing metrics from being shared between -LLMs that were created via model_copy(). +print("=== Ask Agent Example ===") +print("This example demonstrates asking questions during conversation execution") -When an LLM is copied (e.g., to create a condenser LLM from an agent LLM), -Pydantic’s model_copy() does a shallow copy of private attributes by default, -causing the original and copied LLM to share the same Metrics object. -This method allows the registry to fix this by resetting metrics to None, -which will be lazily recreated when accessed. +# Step 1: Build conversation context +print(f"\n[{timestamp()}] Building conversation context...") +conversation.send_message("Explore the current directory and describe the architecture") -#### responses() +# Step 2: Start conversation in background thread +print(f"[{timestamp()}] Starting conversation in background thread...") +thread = threading.Thread(target=conversation.run) +thread.start() -Alternative invocation path using OpenAI Responses API via LiteLLM. +# Give the agent time to start processing +time.sleep(2) -Maps Message[] -> (instructions, input[]) and returns LLMResponse. 
+# Step 3: Use ask_agent while conversation is running +print(f"\n[{timestamp()}] Using ask_agent while conversation is processing...") -* Parameters: - * `messages` – List of conversation messages - * `tools` – Optional list of tools available to the model - * `include` – Optional list of fields to include in response - * `store` – Whether to store the conversation - * `_return_metrics` – Whether to return usage metrics - * `add_security_risk_prediction` – Add security_risk field to tool schemas - * `on_token` – Optional callback for streaming deltas - kwargs* – Additional arguments passed to the API +# Ask context-aware questions +questions_and_responses = [] -#### NOTE -Summary field is always added to tool schemas for transparency and -explainability of agent actions. +question_1 = "Summarize the activity so far in 1 sentence." +print(f"\n[{timestamp()}] Asking: {question_1}") +response1 = conversation.ask_agent(question_1) +questions_and_responses.append((question_1, response1)) +print(f"Response: {response1}") -#### restore_metrics() +time.sleep(1) -#### classmethod subscription_login() +question_2 = "How's the progress?" +print(f"\n[{timestamp()}] Asking: {question_2}") +response2 = conversation.ask_agent(question_2) +questions_and_responses.append((question_2, response2)) +print(f"Response: {response2}") -Authenticate with a subscription service and return an LLM instance. +time.sleep(1) -This method provides subscription-based access to LLM models that are -available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather -than API credits. It handles credential caching, token refresh, and -the OAuth login flow. +question_3 = "Have you finished running?" 
+print(f"\n[{timestamp()}] {question_3}") +response3 = conversation.ask_agent(question_3) +questions_and_responses.append((question_3, response3)) +print(f"Response: {response3}") -Currently supported vendors: -- “openai”: ChatGPT Plus/Pro subscription for Codex models +# Step 4: Wait for conversation to complete +print(f"\n[{timestamp()}] Waiting for conversation to complete...") +thread.join() -Supported OpenAI models: -- gpt-5.1-codex-max -- gpt-5.1-codex-mini -- gpt-5.2 -- gpt-5.2-codex +# Step 5: Verify conversation state wasn't affected +final_event_count = len(conversation.state.events) +# Step 6: Ask a final question after conversation completion +print(f"\n[{timestamp()}] Asking final question after completion...") +final_response = conversation.ask_agent( + "Can you summarize what you accomplished in this conversation?" +) +print(f"Final response: {final_response}") -* Parameters: - * `vendor` – The vendor/provider. Currently only “openai” is supported. - * `model` – The model to use. Must be supported by the vendor’s - subscription service. - * `force_login` – If True, always perform a fresh login even if valid - credentials exist. - * `open_browser` – Whether to automatically open the browser for the - OAuth login flow. - llm_kwargs* – Additional arguments to pass to the LLM constructor. -* Returns: - An LLM instance configured for subscription-based access. -* Raises: - * `ValueError` – If the vendor or model is not supported. - * `RuntimeError` – If authentication fails. +# Step 7: Summary +print("\n" + "=" * 60) +print("SUMMARY OF ASK_AGENT DEMONSTRATION") +print("=" * 60) -#### uses_responses_api() +print("\nQuestions and Responses:") +for i, (question, response) in enumerate(questions_and_responses, 1): + print(f"\n{i}. Q: {question}") + print(f" A: {response[:100]}{'...' if len(response) > 100 else ''}") -Whether this model uses the OpenAI Responses API path. +final_truncated = final_response[:100] + ("..." 
if len(final_response) > 100 else "") +print(f"\nFinal Question Response: {final_truncated}") -#### vision_is_active() +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` -### class LLMProfileStore + -Bases: `object` -Standalone utility for persisting LLM configurations. +## Next Steps -#### Methods +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interrupt and redirect agent execution +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress -#### __init__() +### Conversation with Async +Source: https://docs.openhands.dev/sdk/guides/convo-async.md -Initialize the profile store. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -* Parameters: - `base_dir` – Path to the directory where the profiles are stored. - If None is provided, the default directory is used, i.e., - ~/.openhands/profiles. +> A ready-to-run example is available [here](#ready-to-run-example)! -#### delete() +### Concurrent Agents -Delete an existing profile. +Run multiple agent tasks in parallel using `asyncio.gather()`: -If the profile is not present in the profile directory, it does nothing. +```python icon="python" wrap +async def main(): + loop = asyncio.get_running_loop() + callback = AsyncCallbackWrapper(callback_coro, loop) -* Parameters: - `name` – Name of the profile to delete. -* Raises: - `TimeoutError` – If the lock cannot be acquired. + # Create multiple conversation tasks running in parallel + tasks = [ + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback) + ] + results = await asyncio.gather(*tasks) +``` -#### list() +## Ready-to-run Example -Returns a list of all profiles stored. 
+ +This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) + -* Returns: - List of profile filenames (e.g., [“default.json”, “gpt4.json”]). +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop -#### load() +```python icon="python" expandable examples/01_standalone_sdk/11_async.py +""" +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop +""" -Load an LLM instance from the given profile name. +import asyncio +import os -* Parameters: - `name` – Name of the profile to load. -* Returns: - An LLM instance constructed from the profile configuration. -* Raises: - * `FileNotFoundError` – If the profile name does not exist. - * `ValueError` – If the profile file is corrupted or invalid. - * `TimeoutError` – If the lock cannot be acquired. +from pydantic import SecretStr -#### save() +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.conversation.types import ConversationCallbackType +from openhands.sdk.tool import Tool +from openhands.sdk.utils.async_utils import AsyncCallbackWrapper +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -Save a profile to the profile directory. -Note that if a profile name already exists, it will be overwritten. +logger = get_logger(__name__) -* Parameters: - * `name` – Name of the profile to save. - * `llm` – LLM instance to save - * `include_secrets` – Whether to include the profile secrets. Defaults to False. 
-* Raises: - `TimeoutError` – If the lock cannot be acquired. +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -### class LLMRegistry +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] -Bases: `object` +# Agent +agent = Agent(llm=llm, tools=tools) -A minimal LLM registry for managing LLM instances by usage ID. +llm_messages = [] # collect raw LLM messages -This registry provides a simple way to manage multiple LLM instances, -avoiding the need to recreate LLMs with the same configuration. -The registry also ensures that each registered LLM has independent metrics, -preventing metrics from being shared between LLMs that were created via -model_copy(). This is important for scenarios like creating a condenser LLM -from an agent LLM, where each should track its own usage independently. +# Callback coroutine +async def callback_coro(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -#### Properties +# Synchronous run conversation +def run_conversation(callback: ConversationCallbackType): + conversation = Conversation(agent=agent, callbacks=[callback]) -- `registry_id`: str -- `retry_listener`: Callable[[int, int], None] | None -- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None -- `usage_to_llm`: MappingProxyType - Access the internal usage-ID-to-LLM mapping (read-only view). + conversation.send_message( + "Hello! Can you create a new Python file named hello.py that prints " + "'Hello, World!'? Use task tracker to plan your steps." 
+ ) + conversation.run() -#### Methods + conversation.send_message("Great! Now delete that file.") + conversation.run() -#### __init__() -Initialize the LLM registry. +async def main(): + loop = asyncio.get_running_loop() -* Parameters: - `retry_listener` – Optional callback for retry events. + # Create the callback + callback = AsyncCallbackWrapper(callback_coro, loop) -#### add() + # Run the conversation in a background thread and wait for it to finish... + await loop.run_in_executor(None, run_conversation, callback) -Add an LLM instance to the registry. + print("=" * 100) + print("Conversation finished. Got the following LLM messages:") + for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -This method ensures that the LLM has independent metrics before -registering it. If the LLM’s metrics are shared with another -registered LLM (e.g., due to model_copy()), fresh metrics will -be created automatically. + # Report cost + cost = llm.metrics.accumulated_cost + print(f"EXAMPLE_COST: {cost}") -* Parameters: - `llm` – The LLM instance to register. -* Raises: - `ValueError` – If llm.usage_id already exists in the registry. -#### get() - -Get an LLM instance from the registry. - -* Parameters: - `usage_id` – Unique identifier for the LLM usage slot. -* Returns: - The LLM instance. -* Raises: - `KeyError` – If usage_id is not found in the registry. +if __name__ == "__main__": + asyncio.run(main()) +``` -#### list_usage_ids() + -List all registered usage IDs. +## Next Steps -#### notify() +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents -Notify subscribers of registry events. +### Custom Visualizer +Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md -* Parameters: - `event` – The registry event to notify about. 
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### subscribe() +> A ready-to-run example is available [here](#ready-to-run-example)! -Subscribe to registry events. +The SDK provides flexible visualization options. You can use the default rich-formatted visualizer, customize it with highlighting patterns, or build completely custom visualizers by subclassing `ConversationVisualizerBase`. -* Parameters: - `callback` – Function to call when LLMs are created or updated. +## Visualizer Configuration Options -### class LLMResponse +The `visualizer` parameter in `Conversation` controls how events are displayed: -Bases: `BaseModel` +```python icon="python" focus={4-5, 7-8, 10-11, 13, 18, 20, 25} +from openhands.sdk import Conversation +from openhands.sdk.conversation import DefaultConversationVisualizer, ConversationVisualizerBase -Result of an LLM completion request. +# Option 1: Use default visualizer (enabled by default) +conversation = Conversation(agent=agent, workspace=workspace) -This type provides a clean interface for LLM completion results, exposing -only OpenHands-native types to consumers while preserving access to the -raw LiteLLM response for internal use. +# Option 2: Disable visualization +conversation = Conversation(agent=agent, workspace=workspace, visualizer=None) +# Option 3: Pass a visualizer class (will be instantiated automatically) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=DefaultConversationVisualizer) -#### Properties +# Option 4: Pass a configured visualizer instance +custom_viz = DefaultConversationVisualizer( + name="MyAgent", + highlight_regex={r"^Reasoning:": "bold cyan"} +) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=custom_viz) -- `id`: str - Get the response ID from the underlying LLM response. 
- This property provides a clean interface to access the response ID, - supporting both completion mode (ModelResponse) and response API modes - (ResponsesAPIResponse). - * Returns: - The response ID from the LLM response -- `message`: [Message](#class-message) -- `metrics`: [MetricsSnapshot](#class-metricssnapshot) -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `raw_response`: ModelResponse | ResponsesAPIResponse +# Option 5: Use custom visualizer class +class MyVisualizer(ConversationVisualizerBase): + def on_event(self, event): + print(f"Event: {event}") -#### Methods +conversation = Conversation(agent=agent, workspace=workspace, visualizer=MyVisualizer()) +``` -#### message +## Customizing the Default Visualizer -The completion message converted to OpenHands Message type +`DefaultConversationVisualizer` uses Rich panels and supports customization through configuration: -* Type: - [openhands.sdk.llm.message.Message](#class-message) +```python icon="python" focus={3-14, 19} +from openhands.sdk.conversation import DefaultConversationVisualizer -#### metrics +# Configure highlighting patterns using regex +custom_visualizer = DefaultConversationVisualizer( + name="MyAgent", # Prefix panel titles with agent name + highlight_regex={ + r"^Reasoning:": "bold cyan", # Lines starting with "Reasoning:" + r"^Thought:": "bold green", # Lines starting with "Thought:" + r"^Action:": "bold yellow", # Lines starting with "Action:" + r"\[ERROR\]": "bold red", # Error markers anywhere + r"\*\*(.*?)\*\*": "bold", # Markdown bold **text** + }, + skip_user_messages=False, # Show user messages +) -Snapshot of metrics from the completion request +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=custom_visualizer +) +``` -* Type: - [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) +**When to use**: 
Perfect for customizing colors and highlighting without changing the panel-based layout. -#### raw_response +## Creating Custom Visualizers -The original LiteLLM response (ModelResponse or -ResponsesAPIResponse) for internal use +For complete control over visualization, subclass `ConversationVisualizerBase`: -* Type: - litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse +```python icon="python" focus={4, 11, 28} +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import ActionEvent, ObservationEvent, AgentErrorEvent, Event -### class Message +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that prints raw event information.""" + + def __init__(self, name: str | None = None): + super().__init__(name=name) + self.step_count = 0 + + def on_event(self, event: Event) -> None: + """Handle each event.""" + if isinstance(event, ActionEvent): + self.step_count += 1 + tool_name = event.tool_name or "unknown" + print(f"Step {self.step_count}: {tool_name}") + + elif isinstance(event, ObservationEvent): + print(f" → Result received") + + elif isinstance(event, AgentErrorEvent): + print(f"❌ Error: {event.error}") -Bases: `BaseModel` +# Use your custom visualizer +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=MinimalVisualizer(name="Agent") +) +``` +### Key Methods -#### Properties +**`__init__(self, name: str | None = None)`** +- Initialize your visualizer with optional configuration +- `name` parameter is available from the base class for agent identification +- Call `super().__init__(name=name)` to initialize the base class -- `contains_image`: bool -- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-- `name`: str | None -- `reasoning_content`: str | None -- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None -- `role`: Literal['user', 'system', 'assistant', 'tool'] -- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] -- `tool_call_id`: str | None -- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None +**`initialize(self, state: ConversationStateProtocol)`** +- Called automatically by `Conversation` after state is created +- Provides access to conversation state and statistics via `self._state` +- Override if you need custom initialization, but call `super().initialize(state)` -#### Methods +**`on_event(self, event: Event)`** *(required)* +- Called for each conversation event +- Implement your visualization logic here +- Access conversation stats via `self.conversation_stats` property -#### classmethod from_llm_chat_message() +**When to use**: When you need a completely different output format, custom state tracking, or integration with external systems. -Convert a LiteLLMMessage (Chat Completions) to our Message class. +## Ready-to-run Example -Provider-agnostic mapping for reasoning: -- Prefer message.reasoning_content if present (LiteLLM normalized field) -- Extract thinking_blocks from content array (Anthropic-specific) + +This example is available on GitHub: [examples/01_standalone_sdk/26_custom_visualizer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/26_custom_visualizer.py) + -#### classmethod from_llm_responses_output() +```python icon="python" expandable examples/01_standalone_sdk/26_custom_visualizer.py +"""Custom Visualizer Example -Convert OpenAI Responses API output items into a single assistant Message. +This example demonstrates how to create and use a custom visualizer by subclassing +ConversationVisualizer. 
This approach provides: +- Clean, testable code with class-based state management +- Direct configuration (just pass the visualizer instance to visualizer parameter) +- Reusable visualizer that can be shared across conversations -Policy (non-stream): -- Collect assistant text by concatenating output_text parts from message items -- Normalize function_call items to MessageToolCall list +This demonstrates how you can pass a ConversationVisualizer instance directly +to the visualizer parameter for clean, reusable visualization logic. +""" -#### to_chat_dict() +import logging +import os -Serialize message for OpenAI Chat Completions. +from pydantic import SecretStr -* Parameters: - * `cache_enabled` – Whether prompt caching is active. - * `vision_enabled` – Whether vision/image processing is enabled. - * `function_calling_enabled` – Whether native function calling is enabled. - * `force_string_serializer` – Force string serializer instead of list format. - * `send_reasoning_content` – Whether to include reasoning_content in output. +from openhands.sdk import LLM, Conversation +from openhands.sdk.conversation.visualizer import ConversationVisualizerBase +from openhands.sdk.event import ( + Event, +) +from openhands.tools.preset.default import get_default_agent -Chooses the appropriate content serializer and then injects threading keys: -- Assistant tool call turn: role == “assistant” and self.tool_calls -- Tool result turn: role == “tool” and self.tool_call_id (with name) -#### to_responses_dict() +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" -Serialize message for OpenAI Responses (input parameter). 
+ def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...") -Produces a list of “input” items for the Responses API: -- system: returns [], system content is expected in ‘instructions’ -- user: one ‘message’ item with content parts -> input_text / input_image -(when vision enabled) -- assistant: emits prior assistant content as input_text, -and function_call items for tool_calls -- tool: emits function_call_output items (one per TextContent) -with matching call_id -#### to_responses_value() +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="agent", +) +agent = get_default_agent(llm=llm, cli_mode=True) -Return serialized form. +# ============================================================================ +# Configure Visualization +# ============================================================================ +# Set logging level to reduce verbosity +logging.getLogger().setLevel(logging.WARNING) -Either an instructions string (for system) or input items (for other roles). +# Start a conversation with custom visualizer +cwd = os.getcwd() +conversation = Conversation( + agent=agent, + workspace=cwd, + visualizer=MinimalVisualizer(), +) -### class MessageToolCall - -Bases: `BaseModel` - -Transport-agnostic tool call representation. +# Send a message and let the agent run +print("Sending task to agent...") +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("Task completed!") -One canonical id is used for linking across actions/observations and -for Responses function_call_output call_id. 
+# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + -#### Properties +## Next Steps -- `arguments`: str -- `id`: str -- `name`: str -- `origin`: Literal['completion', 'responses'] -- `costs`: list[Cost] -- `response_latencies`: list[ResponseLatency] -- `token_usages`: list[TokenUsage] +Now that you understand custom visualizers, explore these related topics: -#### Methods +- **[Events](/sdk/arch/events)** - Learn more about different event types +- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic -#### classmethod from_chat_tool_call() +### Pause and Resume +Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md -Create a MessageToolCall from a Chat Completions tool call. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### classmethod from_responses_function_call() +> A ready-to-run example is available [here](#ready-to-run-example)! -Create a MessageToolCall from a typed OpenAI Responses function_call item. +### Pausing Execution -Note: OpenAI Responses function_call.arguments is already a JSON string. +Pause the agent from another thread or after a delay using `conversation.pause()`, and +Resume the paused conversation after performing operations by calling `conversation.run()` again. -#### model_config = (configuration object) +```python icon="python" focus={9, 15} wrap +import time +thread = threading.Thread(target=conversation.run) +thread.start() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+print("Letting agent work for 5 seconds...") +time.sleep(5) -#### to_chat_dict() +print("Pausing the agent...") +conversation.pause() -Serialize to OpenAI Chat Completions tool_calls format. +print("Waiting for 5 seconds...") +time.sleep(5) -#### to_responses_dict() +print("Resuming the execution...") +conversation.run() +``` -Serialize to OpenAI Responses ‘function_call’ input item format. +## Ready-to-run Example -#### add_cost() + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + -#### add_response_latency() +Pause agent execution mid-task by calling `conversation.pause()`: -#### add_token_usage() +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time -Add a single usage record. +from pydantic import SecretStr -#### deep_copy() +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -Create a deep copy of the Metrics object. -#### diff() +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -Calculate the difference between current metrics and a baseline. +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] -This is useful for tracking metrics for specific operations like delegates. 
+# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) -* Parameters: - `baseline` – A metrics object representing the baseline state -* Returns: - A new Metrics object containing only the differences since the baseline +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() -#### get() +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." +) -Return the metrics in a dictionary. +print(f"Initial status: {conversation.state.execution_status}") +print() -#### get_snapshot() +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() -Get a snapshot of the current metrics without the detailed lists. +# Let the agent work for a few seconds +print("Letting agent work for 2 seconds...") +time.sleep(2) -#### initialize_accumulated_token_usage() +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() -#### log() +# Wait for the thread to finish (it will stop when paused) +thread.join() -Log the metrics. +print(f"Agent status after pause: {conversation.state.execution_status}") +print() -#### merge() +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." +) +print() -Merge ‘other’ metrics into this one. 
+# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.execution_status}") -#### model_config = (configuration object) +# Resume execution +conversation.run() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +print(f"Final status: {conversation.state.execution_status}") -#### classmethod validate_accumulated_cost() +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -### class MetricsSnapshot + -Bases: `BaseModel` -A snapshot of metrics at a point in time. -Does not include lists of individual costs, latencies, or token usages. +## Next Steps +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents -#### Properties +### Persistence +Source: https://docs.openhands.dev/sdk/guides/convo-persistence.md -- `accumulated_cost`: float -- `accumulated_token_usage`: TokenUsage | None -- `max_budget_per_task`: float | None -- `model_name`: str +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### Methods +> A ready-to-run example is available [here](#ready-to-run-example)! -#### model_config = (configuration object) +## How to use Persistence -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Save conversation state to disk and restore it later for long-running or multi-session workflows. -### class OAuthCredentials +### Saving State -Bases: `BaseModel` +Create a conversation with a unique ID to enable persistence: -OAuth credentials for subscription-based LLM access. 
+```python focus={3-4,10-11} icon="python" wrap +import uuid +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" -#### Properties +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message("Start long task") +conversation.run() # State automatically saved +``` -- `access_token`: str -- `expires_at`: int -- `refresh_token`: str -- `type`: Literal['oauth'] -- `vendor`: str +### Restoring State -#### Methods +Restore a conversation using the same ID and persistence directory: -#### is_expired() +```python focus={9-10} icon="python" +# Later, in a different session +del conversation -Check if the access token is expired. +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) -#### model_config = (configuration object) +conversation.send_message("Continue task") +conversation.run() # Continues from saved state +``` -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
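Before re-attaching to a conversation ID, it can help to verify that restorable state actually exists on disk. Below is a minimal sketch under the layout described in this guide (a `base_state.json` file inside a per-conversation subdirectory); note that `has_saved_state` is an illustrative helper, not part of the SDK API, and the exact subdirectory naming is an assumption:

```python
# Illustrative helper (not part of the OpenHands SDK): check whether a
# conversation left restorable state on disk before reusing its ID.
# Assumes the <persistence_dir>/<conversation_id>/base_state.json layout
# described in this guide; the exact subdirectory naming may differ.
import uuid
from pathlib import Path


def has_saved_state(persistence_dir: str, conversation_id: uuid.UUID) -> bool:
    """Return True if a previous session persisted base state for this ID."""
    base_state = Path(persistence_dir) / str(conversation_id) / "base_state.json"
    return base_state.is_file()


# Usage: only reuse a stored ID when its state actually exists on disk.
conversation_id = uuid.uuid4()
if has_saved_state("./.conversations", conversation_id):
    print("Resuming existing conversation")
else:
    print("Starting a new conversation")
```

A check like this makes the restore path explicit: pass the stored ID back to `Conversation` only when its state directory is present, and fall back to a fresh `uuid.uuid4()` otherwise.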
+## What Gets Persisted -### class OpenAISubscriptionAuth +The conversation state includes information that allows seamless restoration: -Bases: `object` +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys +- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) -Handle OAuth authentication for OpenAI ChatGPT subscription access. + + For the complete implementation details, see the [ConversationState class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code. + +## Persistence Directory Structure -#### Properties - -- `vendor`: str - Get the vendor name. - -#### Methods +When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each +conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/` +(unless you specify a custom path). -#### __init__() +**Directory structure:** + + + + + + + + + + + + + + + + + + + + -Initialize the OpenAI subscription auth handler. 
+Each conversation directory contains: +- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata +- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`) -* Parameters: - * `credential_store` – Optional custom credential store. - * `oauth_port` – Port for the local OAuth callback server. +The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access. -#### create_llm() +## Ready-to-run Example -Create an LLM instance configured for Codex subscription access. + +This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py) + -* Parameters: - * `model` – The model to use (must be in OPENAI_CODEX_MODELS). - * `credentials` – OAuth credentials to use. If None, uses stored credentials. - * `instructions` – Optional instructions for the Codex model. - llm_kwargs* – Additional arguments to pass to LLM constructor. -* Returns: - An LLM instance configured for Codex access. -* Raises: - `ValueError` – If the model is not supported or no credentials available. +```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py +import os +import uuid -#### get_credentials() +from pydantic import SecretStr -Get stored credentials if they exist. +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -#### has_valid_credentials() -Check if valid (non-expired) credentials exist. 
+logger = get_logger(__name__) -#### async login() +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -Perform OAuth login flow. +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -This starts a local HTTP server to handle the OAuth callback, -opens the browser for user authentication, and waits for the -callback with the authorization code. +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + } +} +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) -* Parameters: - `open_browser` – Whether to automatically open the browser. -* Returns: - The obtained OAuth credentials. -* Raises: - `RuntimeError` – If the OAuth flow fails or times out. +llm_messages = [] # collect raw LLM messages -#### logout() -Remove stored credentials. +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -* Returns: - True if credentials were removed, False if none existed. -#### async refresh_if_needed() +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" -Refresh credentials if they are expired. +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() -* Returns: - Updated credentials, or None if no credentials exist. -* Raises: - `RuntimeError` – If token refresh fails. +conversation.send_message("Great! 
Now delete that file.") +conversation.run() -### class ReasoningItemModel +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -Bases: `BaseModel` +# Conversation persistence +print("Serializing conversation...") -OpenAI Responses reasoning item (non-stream, subset we consume). +del conversation -Do not log or render encrypted_content. +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +print("Sending message to deserialized conversation...") +conversation.send_message("Hey what did you create? Return an agent finish action") +conversation.run() -#### Properties +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -- `content`: list[str] | None -- `encrypted_content`: str | None -- `id`: str | None -- `status`: str | None -- `summary`: list[str] -#### Methods + -#### model_config = (configuration object) +## Reading serialized events -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Convert persisted events into LLM-ready messages for reuse or analysis. -### class RedactedThinkingBlock + +This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) + -Bases: `BaseModel` +```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py +"""Load persisted events and convert them into LLM-ready messages.""" -Redacted thinking block for previous responses without extended thinking. 
+import json +import os +import uuid +from pathlib import Path -This is used as a placeholder for assistant messages that were generated -before extended thinking was enabled. +from pydantic import SecretStr -#### Properties +conversation_id = uuid.uuid4() +persistence_root = Path(".conversations") +log_dir = ( + persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex +) -- `data`: str -- `type`: Literal['redacted_thinking'] +os.environ.setdefault("LOG_JSON", "true") +os.environ.setdefault("LOG_TO_FILE", "true") +os.environ.setdefault("LOG_DIR", str(log_dir)) +os.environ.setdefault("LOG_LEVEL", "INFO") -#### Methods +from openhands.sdk import ( # noqa: E402 + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + Tool, +) +from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 +from openhands.tools.terminal import TerminalTool # noqa: E402 -#### model_config = (configuration object) -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +setup_logging(log_to_file=True, log_dir=str(log_dir)) +logger = get_logger(__name__) -### class RegistryEvent +api_key = os.getenv("LLM_API_KEY") +if not api_key: + raise RuntimeError("LLM_API_KEY environment variable is not set.") -Bases: `BaseModel` +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) -#### Properties +###### +# Create a conversation that persists its events +###### -- `llm`: [LLM](#class-llm) -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
-### class RouterLLM +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + persistence_dir=str(persistence_root), + conversation_id=conversation_id, +) -Bases: [`LLM`](#class-llm) - -Base class for multiple LLM acting as a unified LLM. -This class provides a foundation for implementing model routing by -inheriting from LLM, allowing routers to work with multiple underlying -LLM models while presenting a unified LLM interface to consumers. -Key features: -- Works with multiple LLMs configured via llms_for_routing -- Delegates all other operations/properties to the selected LLM -- Provides routing interface through select_llm() method +conversation.send_message( + "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " + "Reply with a short confirmation once done." +) +conversation.run() +conversation.send_message( + "Without using any tools, summarize in one sentence what you did." +) +conversation.run() -#### Properties +assert conversation.state.persistence_dir is not None +persistence_dir = Path(conversation.state.persistence_dir) +event_dir = persistence_dir / "events" -- `active_llm`: [LLM](#class-llm) | None -- `llms_for_routing`: dict[str, [LLM](#class-llm)] -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `router_name`: str +event_paths = sorted(event_dir.glob("event-*.json")) -#### Methods +if not event_paths: + raise RuntimeError("No event files found. Was persistence enabled?") -#### completion() +###### +# Read from serialized events +###### -This method intercepts completion calls and routes them to the appropriate -underlying LLM based on the routing logic implemented in select_llm(). 
-* Parameters: - * `messages` – List of conversation messages - * `tools` – Optional list of tools available to the model - * `return_metrics` – Whether to return usage metrics - * `add_security_risk_prediction` – Add security_risk field to tool schemas - * `on_token` – Optional callback for streaming tokens - kwargs* – Additional arguments passed to the LLM API +events = [Event.model_validate_json(path.read_text()) for path in event_paths] -#### NOTE -Summary field is always added to tool schemas for transparency and -explainability of agent actions. +convertible_events = [ + event for event in events if isinstance(event, LLMConvertibleEvent) +] +llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) -#### model_post_init() +if llm.uses_responses_api(): + logger.info("Formatting messages for the OpenAI Responses API.") + instructions, input_items = llm.format_messages_for_responses(llm_messages) + logger.info("Responses instructions:\n%s", instructions) + logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) +else: + logger.info("Formatting messages for the OpenAI Chat Completions API.") + chat_messages = llm.format_messages_for_llm(llm_messages) + logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) -This function is meant to behave like a BaseModel method to initialise private attributes. +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -It takes context as an argument since that’s what pydantic-core passes when calling it. + -* Parameters: - * `self` – The BaseModel instance. - * `context` – The context. -#### abstractmethod select_llm() +## How State Persistence Works -Select which LLM to use based on messages and events. +The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. 
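The on-disk layout this produces can be sketched with plain files (a simplified model of the behavior described in this section, not the SDK's actual implementation; the directory and field names below are illustrative):

```python
import json
import tempfile
from pathlib import Path

# Simplified model of the persistence layout: one mutable base_state.json
# plus an append-only events/ directory with one file per event.
persistence_dir = Path(tempfile.mkdtemp(prefix="oh_persist_demo_"))
events_dir = persistence_dir / "events"
events_dir.mkdir()

def save_base_state(state: dict) -> None:
    # Base state is rewritten in full on every field change.
    (persistence_dir / "base_state.json").write_text(json.dumps(state))

def append_event(index: int, event: dict) -> None:
    # Events are append-only: each event gets its own file, never rewritten.
    (events_dir / f"event-{index:05d}.json").write_text(json.dumps(event))

save_base_state({"execution_status": "RUNNING", "max_iterations": 100})
append_event(0, {"kind": "MessageEvent", "text": "hello"})
append_event(1, {"kind": "ActionEvent", "tool": "terminal"})

# "Crash recovery" is simply re-reading both pieces from disk.
restored_state = json.loads((persistence_dir / "base_state.json").read_text())
restored_events = [
    json.loads(p.read_text()) for p in sorted(events_dir.glob("event-*.json"))
]
print(restored_state["execution_status"])  # RUNNING
print(len(restored_events))                # 2
```

Because every change hits disk immediately, recovering after a crash is just re-reading these files, as the restoration step above shows.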
-This method implements the core routing logic for the RouterLLM. -Subclasses should analyze the provided messages to determine which -LLM from llms_for_routing is most appropriate for handling the request. +### Auto-Save Mechanism -* Parameters: - `messages` – List of messages in the conversation that can be used - to inform the routing decision. -* Returns: - The key/name of the LLM to use from llms_for_routing dictionary. +When you modify any public field on `ConversationState`, the SDK automatically: -#### classmethod set_placeholder_model() +1. Detects the field change via a custom `__setattr__` implementation +2. Serializes the entire base state to `base_state.json` +3. Triggers any registered state change callbacks -Guarantee model exists before LLM base validation runs. +This happens transparently—you don't need to call any save methods manually. -#### classmethod validate_llms_not_empty() +```python +# These changes are automatically persisted: +conversation.state.execution_status = ConversationExecutionStatus.RUNNING +conversation.state.max_iterations = 100 +``` -### class TextContent +### Events vs Base State -Bases: `BaseContent` +The persistence system separates data into two categories: +| Category | Storage | Contents | +|----------|---------|----------| +| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | +| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | -#### Properties +Events are appended incrementally (one file per event), while base state is overwritten on each change. 
This design optimizes for: +- **Fast event appends**: No need to rewrite the entire history +- **Atomic state updates**: Base state is always consistent +- **Efficient restoration**: Events can be loaded lazily -- `model_config`: ClassVar[ConfigDict] = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `text`: str -- `type`: Literal['text'] -#### Methods -#### to_llm_dict() +## Next Steps -Convert to LLM API format. +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations -### class ThinkingBlock +### Send Message While Running +Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md -Bases: `BaseModel` +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Anthropic thinking block for extended thinking feature. -This represents the raw thinking blocks returned by Anthropic models -when extended thinking is enabled. These blocks must be preserved -and passed back to the API for tool use scenarios. + +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) + +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: -#### Properties +```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating that user messages can be sent and processed while +an agent is busy. -- `signature`: str | None -- `thinking`: str -- `type`: Literal['thinking'] +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. 
This is made possible by the agent's event-driven architecture. -#### Methods +Demonstration Flow: +1. Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. Verify that all three lines are processed and included in the final document -#### model_config = (configuration object) +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. 
-### openhands.sdk.security -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing -### class AlwaysConfirm +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +import os +import threading +import time +from datetime import datetime -#### Methods +from pydantic import SecretStr -#### model_config = (configuration object) +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -#### should_confirm() +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -Determine if an action with the given risk level requires confirmation. +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. 
+# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. -### class ConfirmRisky +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +print("=== Send Message While Processing Example ===") -#### Properties +# Step 1: Send initial message +start_time = timestamp() +conversation.send_message( + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. " + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +) -- `confirm_unknown`: bool -- `threshold`: [SecurityRisk](#class-securityrisk) +# Step 2: Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() -#### Methods +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working -#### model_config = (configuration object) +second_time = timestamp() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) -#### should_confirm() +# Wait for completion +thread.join() -Determine if an action with the given risk level requires confirmation. 
+# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. + # Check if both messages were processed + if "Message 1" in content and "Message 2" in content: + print("\nSUCCESS: Agent processed both messages!") + print( + "This proves the agent received the second message while processing the first task." # noqa + ) + else: + print("\nWARNING: Agent may not have processed the second message") -#### classmethod validate_threshold() + # Clean up + os.remove(document_path) +else: + print("WARNING: Document.txt was not created") -### class ConfirmationPolicyBase +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -Bases: `DiscriminatedUnionMixin`, `ABC` + -#### Methods +### Sending Messages During Execution -#### model_config = (configuration object) +As shown in the example above, use threading to send messages while the agent is running: -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +```python icon="python" +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() -#### abstractmethod should_confirm() +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working -Determine if an action with the given risk level requires confirmation. 
+second_time = timestamp() -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. +# Wait for completion +thread.join() +``` -### class GraySwanAnalyzer +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion -Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. -Security analyzer using GraySwan’s Cygnal API for AI safety monitoring. +## Next Steps -This analyzer sends conversation history and pending actions to the GraySwan -Cygnal API for security analysis. The API returns a violation score which is -mapped to SecurityRisk levels. +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations -Environment Variables: -: GRAYSWAN_API_KEY: Required API key for GraySwan authentication - GRAYSWAN_POLICY_ID: Optional policy ID for custom GraySwan policy +### Critic (Experimental) +Source: https://docs.openhands.dev/sdk/guides/critic.md -#### Example + +**This feature is highly experimental** and subject to change. 
The API, configuration, and behavior may evolve significantly based on feedback and testing. + -```pycon ->>> from openhands.sdk.security.grayswan import GraySwanAnalyzer ->>> analyzer = GraySwanAnalyzer() ->>> risk = analyzer.security_risk(action_event) -``` +> A ready-to-run example is available [here](#ready-to-run-example)! -#### Properties +## What is a Critic? -- `api_key`: SecretStr | None -- `api_url`: str -- `history_limit`: int -- `low_threshold`: float -- `max_message_chars`: int -- `medium_threshold`: float -- `policy_id`: str | None -- `timeout`: float +A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: -#### Methods +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion +- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold -#### close() +You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. -Clean up resources. + +This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming. + -#### model_config = (configuration object) +## Quick Start -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. 
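Conceptually, the scoring contract and retry behavior described in this guide reduce to a few lines of plain Python. The sketch below is a simplified stand-in, not the SDK's implementation; `FakeCriticResult` and `run_with_refinement` are illustrative names only:

```python
from dataclasses import dataclass

@dataclass
class FakeCriticResult:
    # Mirrors the documented contract: score in [0.0, 1.0],
    # with success defined as score >= 0.5.
    score: float

    @property
    def success(self) -> bool:
        return self.score >= 0.5

def run_with_refinement(scores, success_threshold=0.7, max_iterations=3):
    """Toy model of the retry loop: keep asking the agent to improve until
    the critic score meets the threshold or iterations run out. `scores`
    stands in for the per-iteration critic evaluations."""
    history = []
    for iteration, score in enumerate(scores[:max_iterations], start=1):
        result = FakeCriticResult(score=score)
        history.append((iteration, result.score))
        if result.score >= success_threshold:
            return history, True  # task accepted
        # Below threshold: a follow-up prompt would be sent automatically here.
    return history, False  # gave up after max_iterations

history, ok = run_with_refinement([0.45, 0.72], success_threshold=0.7)
print(history)  # [(1, 0.45), (2, 0.72)]
print(ok)       # True
```

Real scores come from the critic model itself; this only mirrors the threshold logic (`success` at score >= 0.5, automatic retry while the score is below `success_threshold`, bounded by `max_iterations`).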
-#### model_post_init() +## Understanding Critic Results -Initialize the analyzer after model creation. +Critic evaluations produce scores and feedback: -#### security_risk() +- **`score`**: Float between 0.0 and 1.0 representing predicted success probability +- **`message`**: Optional feedback with detailed probabilities +- **`success`**: Boolean property (True if score >= 0.5) -Analyze action for security risks using GraySwan API. +Results are automatically displayed in the conversation visualizer: -This method converts the conversation history and the pending action -to OpenAI message format and sends them to the GraySwan Cygnal API -for security analysis. +![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) -* Parameters: - `action` – The ActionEvent to analyze -* Returns: - SecurityRisk level based on GraySwan analysis +### Accessing Results Programmatically -#### set_events() +```python icon="python" focus={4-7} +from openhands.sdk import Event, ActionEvent, MessageEvent -Set the events for context when analyzing actions. +def callback(event: Event): + if isinstance(event, (ActionEvent, MessageEvent)): + if event.critic_result is not None: + print(f"Critic score: {event.critic_result.score:.3f}") + print(f"Success: {event.critic_result.success}") -* Parameters: - `events` – Sequence of events to use as context for security analysis +conversation = Conversation(agent=agent, callbacks=[callback]) +``` -#### validate_thresholds() +## Iterative Refinement with a Critic -Validate that thresholds are properly ordered. +The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. -### class LLMSecurityAnalyzer +### How It Works -Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) +1. Agent completes a task and calls `FinishAction` +2. 
Critic evaluates the result and produces a score +3. If score < `success_threshold`, a follow-up prompt is sent automatically +4. Agent continues working to address issues +5. Process repeats until score meets threshold or `max_iterations` is reached -LLM-based security analyzer. +### Configuration -This analyzer respects the security_risk attribute that can be set by the LLM -when generating actions, similar to OpenHands’ LLMRiskAnalyzer. +Use `IterativeRefinementConfig` to enable automatic retries: -It provides a lightweight security analysis approach that leverages the LLM’s -understanding of action context and potential risks. +```python icon="python" focus={1,4-7,12} +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig -#### Methods +# Configure iterative refinement +iterative_config = IterativeRefinementConfig( + success_threshold=0.7, # Retry if score < 70% + max_iterations=3, # Maximum retry attempts +) -#### model_config = (configuration object) +# Attach to critic +critic = APIBasedCritic( + server_url="https://llm-proxy.eval.all-hands.dev/vllm", + api_key=api_key, + model_name="critic", + iterative_refinement=iterative_config, +) +``` -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +### Parameters -#### security_risk() +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | +| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | -Evaluate security risk based on LLM-provided assessment. +### Custom Follow-up Prompts -This method checks if the action has a security_risk attribute set by the LLM -and returns it. The LLM may not always provide this attribute but it defaults to -UNKNOWN if not explicitly set. +By default, the critic generates a generic follow-up prompt. 
You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: -### class NeverConfirm +```python icon="python" focus={4-12} +from openhands.sdk.critic.base import CriticBase, CriticResult -Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) +class CustomCritic(APIBasedCritic): + def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: + score_percent = critic_result.score * 100 + return f""" +Your solution scored {score_percent:.1f}% (iteration {iteration}). -#### Methods +Please review your work carefully: +1. Check that all requirements are met +2. Verify tests pass +3. Fix any issues and try again +""" +``` -#### model_config = (configuration object) +### Example Workflow -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Here's what happens during iterative refinement: -#### should_confirm() +``` +Iteration 1: + → Agent creates files, runs tests + → Agent calls FinishAction + → Critic evaluates: score = 0.45 (below 0.7 threshold) + → Follow-up prompt sent automatically -Determine if an action with the given risk level requires confirmation. +Iteration 2: + → Agent reviews and fixes issues + → Agent calls FinishAction + → Critic evaluates: score = 0.72 (above threshold) + → ✅ Success! Conversation ends +``` -This method defines the core logic for determining whether user confirmation -is required before executing an action based on its security risk level. +## Troubleshooting -* Parameters: - `risk` – The security risk level of the action to be evaluated. - Defaults to SecurityRisk.UNKNOWN if not specified. -* Returns: - True if the action requires user confirmation before execution, - False if the action can proceed without confirmation. 
+### Critic Evaluations Not Appearing -### class SecurityAnalyzerBase +- Verify the critic is properly configured and passed to the Agent +- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) -Bases: `DiscriminatedUnionMixin`, `ABC` +### API Authentication Errors -Abstract base class for security analyzers. +- Verify `LLM_API_KEY` is set correctly +- Check that the API key has not expired -Security analyzers evaluate the risk of actions before they are executed -and can influence the conversation flow based on security policies. - -This is adapted from OpenHands SecurityAnalyzer but designed to work -with the agent-sdk’s conversation-based architecture. - -#### Methods - -#### analyze_event() - -Analyze an event for security risks. +### Iterative Refinement Not Triggering -This is a convenience method that checks if the event is an action -and calls security_risk() if it is. Non-action events return None. +- Ensure `iterative_refinement` config is attached to the critic +- Check that `success_threshold` is set appropriately (higher values trigger more retries) +- Verify the agent is using `FinishAction` to complete tasks -* Parameters: - `event` – The event to analyze -* Returns: - ActionSecurityRisk if event is an action, None otherwise +## Ready-to-run Example -#### analyze_pending_actions() + +The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) + -Analyze all pending actions in a conversation. +This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. 
-This method gets all unmatched actions from the conversation state -and analyzes each one for security risks. +```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py +"""Iterative Refinement with Critic Model Example. -* Parameters: - `conversation` – The conversation to analyze -* Returns: - List of tuples containing (action, risk_level) for each pending action +This is EXPERIMENTAL. -#### model_config = (configuration object) +This example demonstrates how to use a critic model to shepherd an agent through +complex, multi-step tasks. The critic evaluates the agent's progress and provides +feedback that can trigger follow-up prompts when the agent hasn't completed the +task successfully. -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +Key concepts demonstrated: +1. Setting up a critic with IterativeRefinementConfig for automatic retry +2. Conversation.run() automatically handles retries based on critic scores +3. Custom follow-up prompt generation via critic.get_followup_prompt() +4. Iterating until the task is completed successfully or max iterations reached -#### abstractmethod security_risk() +For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured +using the same base_url with /vllm suffix and "critic" as the model name. +""" -Evaluate the security risk of an ActionEvent. +import os +import re +import tempfile +from pathlib import Path -This is the core method that analyzes an ActionEvent and returns its risk level. -Implementations should examine the action’s content, context, and potential -impact to determine the appropriate risk level. 
+from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +from openhands.sdk.critic.base import CriticBase +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -* Parameters: - `action` – The ActionEvent to analyze for security risks -* Returns: - ActionSecurityRisk enum indicating the risk level -#### should_require_confirmation() +# Configuration +# Higher threshold (70%) makes it more likely the agent needs multiple iterations, +# which better demonstrates how iterative refinement works. +# Adjust as needed to see different behaviors. +SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) -Determine if an action should require user confirmation. -This implements the default confirmation logic based on risk level -and confirmation mode settings. +def get_required_env(name: str) -> str: + value = os.getenv(name) + if value: + return value + raise ValueError( + f"Missing required environment variable: {name}. " + f"Set {name} before running this example." + ) -* Parameters: - * `risk` – The security risk level of the action - * `confirmation_mode` – Whether confirmation mode is enabled -* Returns: - True if confirmation is required, False otherwise -### class SecurityRisk +def get_default_critic(llm: LLM) -> CriticBase | None: + """Auto-configure critic for All-Hands LLM proxy. -Bases: `str`, `Enum` + When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an + APIBasedCritic configured with: + - server_url: {base_url}/vllm + - api_key: same as LLM + - model_name: "critic" -Security risk levels for actions. + Args: + llm: The LLM instance to derive critic configuration from. -Based on OpenHands security risk levels but adapted for agent-sdk. -Integer values allow for easy comparison and ordering. 
+ Returns: + An APIBasedCritic if the LLM is configured for All-Hands proxy, + None otherwise. + Example: + llm = LLM( + model="anthropic/claude-sonnet-4-5", + api_key=api_key, + base_url="https://llm-proxy.eval.all-hands.dev", + ) + critic = get_default_critic(llm) + if critic is None: + # Fall back to explicit configuration + critic = APIBasedCritic( + server_url="https://my-critic-server.com", + api_key="my-api-key", + model_name="my-critic-model", + ) + """ + base_url = llm.base_url + api_key = llm.api_key + if base_url is None or api_key is None: + return None -#### Properties + # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) + pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" + if not re.match(pattern, base_url): + return None -- `description`: str - Get a human-readable description of the risk level. -- `visualize`: Text - Return Rich Text representation of this risk level. + return APIBasedCritic( + server_url=f"{base_url.rstrip('/')}/vllm", + api_key=api_key, + model_name="critic", + ) -#### Methods -#### HIGH = 'HIGH' +# Task prompt designed to be moderately complex with subtle requirements. +# The task is simple enough to complete in 1-2 iterations, but has specific +# requirements that are easy to miss - triggering critic feedback. +INITIAL_TASK_PROMPT = """\ +Create a Python word statistics tool called `wordstats` that analyzes text files. 
-#### LOW = 'LOW' +## Structure -#### MEDIUM = 'MEDIUM' +Create directory `wordstats/` with: +- `stats.py` - Main module with `analyze_file(filepath)` function +- `cli.py` - Command-line interface +- `tests/test_stats.py` - Unit tests -#### UNKNOWN = 'UNKNOWN' +## Requirements for stats.py -#### get_color() +The `analyze_file(filepath)` function must return a dict with these EXACT keys: +- `lines`: total line count (including empty lines) +- `words`: word count +- `chars`: character count (including whitespace) +- `unique_words`: count of unique words (case-insensitive) -Get the color for displaying this risk level in Rich text. +### Important edge cases (often missed!): +1. Empty files must return all zeros, not raise an exception +2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) +3. Numbers like "123" or "3.14" are NOT counted as words +4. Contractions like "don't" count as ONE word +5. File not found must raise FileNotFoundError with a clear message -#### is_riskier() +## Requirements for cli.py -Check if this risk level is riskier than another. +When run as `python cli.py `: +- Print each stat on its own line: "Lines: X", "Words: X", etc. +- Exit with code 1 if file not found, printing error to stderr +- Exit with code 0 on success -Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is -less risky than HIGH. UNKNOWN is not comparable to any other level. +## Required Tests (test_stats.py) -To make this act like a standard well-ordered domain, we reflexively consider -risk levels to be riskier than themselves. That is: +Write tests that verify: +1. Basic counting on normal text +2. Empty file returns all zeros +3. Hyphenated words counted correctly +4. Numbers are excluded from word count +5. 
FileNotFoundError raised for missing files - for risk_level in list(SecurityRisk): - : assert risk_level.is_riskier(risk_level) +## Verification Steps - # More concretely: - assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH) - assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM) - assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW) +1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): +``` +Hello world! +This is a well-known test file. -This can be disabled by setting the reflexive parameter to False. +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` -* Parameters: - other ([SecurityRisk*](#class-securityrisk)) – The other risk level to compare against. - reflexive (bool*) – Whether the relationship is reflexive. -* Raises: - `ValueError` – If either risk level is UNKNOWN. +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 -### openhands.sdk.tool -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. -### class Action +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" -Bases: `Schema`, `ABC` -Base schema for input action. 
+llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), +) +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) -#### Properties +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) -- `model_config`: = (configuration object) - Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -- `visualize`: Text - Return Rich Text representation of this action. - This method can be overridden by subclasses to customize visualization. - The base implementation displays all action fields systematically. -### class ExecutableTool - -Bases: `Protocol` +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) -Protocol for tools that are guaranteed to have a non-None executor. 
+# Create workspace
+workspace = Path(tempfile.mkdtemp(prefix="critic_demo_"))
+print(f"📁 Created workspace: {workspace}")

-This eliminates the need for runtime None checks and type narrowing
-when working with tools that are known to be executable.

+# Create conversation - iterative refinement is handled automatically
+# by Conversation.run() based on the critic's config
+conversation = Conversation(
+    agent=agent,
+    workspace=str(workspace),
+)

+print("\n" + "=" * 70)
+print("🚀 Starting Iterative Refinement with Critic Model")
+print("=" * 70)
+print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}")
+print(f"Max iterations: {MAX_ITERATIONS}")

-#### Properties

+# Send the task and run - Conversation.run() handles retries automatically
+conversation.send_message(INITIAL_TASK_PROMPT)
+conversation.run()

-- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any]
-- `name`: str

+# Print additional info about created files
+print("\nCreated files:")
+for path in sorted(workspace.rglob("*")):
+    if path.is_file():
+        relative = path.relative_to(workspace)
+        print(f"  - {relative}")

-#### Methods

+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"\nEXAMPLE_COST: {cost:.4f}")
+```

-#### __init__()

-### class FinishTool

-Bases: `ToolDefinition[FinishAction, FinishObservation]`

-Tool for signaling the completion of a task or conversation.
-#### Properties

-- `model_config`: = (configuration object)
-  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

-#### Methods

-#### classmethod create()

-Create FinishTool instance.

-* Parameters:
-  * `conv_state` – Optional conversation state (not used by FinishTool).
-  * `**params` – Additional parameters (none supported).
-* Returns:
-  A sequence containing a single FinishTool instance.
-* Raises:
-  `ValueError` – If any parameters are provided.

-#### name = 'finish'

-### class Observation

-Bases: `Schema`, `ABC`

-Base schema for output observation.

-#### Properties

-- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]\n'
-- `content`: list[TextContent | ImageContent]
-- `is_error`: bool
-- `model_config`: = (configuration object)
-  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `text`: str
-  Extract all text content from the observation.
-  * Returns:
-    Concatenated text from all TextContent items in content.
-- `to_llm_content`: Sequence[TextContent | ImageContent]
-  Default content formatting for converting observation to LLM readable content.
-  Subclasses can override to provide richer content (e.g., images, diffs).
-- `visualize`: Text
-  Return Rich Text representation of this observation.
-  Subclasses can override for custom visualization; by default we show the
-  same text that would be sent to the LLM.

-#### Methods

-#### classmethod from_text()

-Utility to create an Observation from a simple text string.

-* Parameters:
-  * `text` – The text content to include in the observation.
-  * `is_error` – Whether this observation represents an error.
-  * `**kwargs` – Additional fields for the observation subclass.
-* Returns:
-  An Observation instance with the text wrapped in a TextContent.

-### class ThinkTool

-Bases: `ToolDefinition[ThinkAction, ThinkObservation]`

-Tool for logging thoughts without making changes.

-#### Properties

-- `model_config`: = (configuration object)
-  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

-#### Methods

-#### classmethod create()

-Create ThinkTool instance.

-* Parameters:
-  * `conv_state` – Optional conversation state (not used by ThinkTool).
-  * `**params` – Additional parameters (none supported).
-* Returns:
-  A sequence containing a single ThinkTool instance.
-* Raises:
-  `ValueError` – If any parameters are provided.

-#### name = 'think'

-### class Tool

-Bases: `BaseModel`

-Defines a tool to be initialized for the agent.

-This is only used in agent-sdk for type schema for server use.

-#### Properties

-- `name`: str
-- `params`: dict[str, Any]

-#### Methods

-#### model_config = (configuration object)

-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

-#### classmethod validate_name()

-Validate that name is not empty.

-#### classmethod validate_params()

-Convert None params to empty dict.

-### class ToolAnnotations

-Bases: `BaseModel`

-Annotations to provide hints about the tool’s behavior.
-Based on Model Context Protocol (MCP) spec:
-[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838)

-#### Properties

-- `destructiveHint`: bool
-- `idempotentHint`: bool
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
-  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `openWorldHint`: bool
-- `readOnlyHint`: bool
-- `title`: str | None

-### class ToolDefinition

-Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic`

-Base class for all tool implementations.

-This class serves as a base for the discriminated union of all tool types.
-All tools must inherit from this class and implement the .create() method for
-proper initialization with executors and parameters.

-Features:
-- Normalize input/output schemas (class or dict) into both model+schema.
-- Validate inputs before execute.
-- Coerce outputs only if an output model is defined; else return vanilla JSON.
-- Export MCP tool description.

-#### Examples

-Simple tool with no parameters:

-    class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
-        @classmethod
-        def create(cls, conv_state=None, **params):
-            return [cls(name="finish", …, executor=FinishExecutor())]

-Complex tool with initialization parameters:

-    class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
-        @classmethod
-        def create(cls, conv_state, **params):
-            executor = TerminalExecutor(
-                working_dir=conv_state.workspace.working_dir,
-                **params,
-            )
-            return [cls(name="terminal", …, executor=executor)]

-#### Properties

-- `action_type`: type[[Action](#class-action)]
-- `annotations`: [ToolAnnotations](#class-toolannotations) | None
-- `description`: str
-- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
-- `meta`: dict[str, Any] | None
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
-  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `name`: ClassVar[str] = ''
-- `observation_type`: type[[Observation](#class-observation)] | None
-- `title`: str

-#### Methods

-#### action_from_arguments()

-Create an action from parsed arguments.

-This method can be overridden by subclasses to provide custom logic
-for creating actions from arguments (e.g., for MCP tools).

-* Parameters:
-  `arguments` – The parsed arguments from the tool call.
-* Returns:
-  The action instance created from the arguments.
-#### as_executable()

-Return this tool as an ExecutableTool, ensuring it has an executor.

-This method eliminates the need for runtime None checks by guaranteeing
-that the returned tool has a non-None executor.

-* Returns:
-  This tool instance, typed as ExecutableTool.
-* Raises:
-  `NotImplementedError` – If the tool has no executor.

-#### abstractmethod classmethod create()

-Create a sequence of Tool instances.

-This method must be implemented by all subclasses to provide custom
-initialization logic, typically initializing the executor with parameters
-from conv_state and other optional parameters.

-* Parameters:
-  * `*args` – Variable positional arguments (typically conv_state as first arg).
-  * `**kwargs` – Optional parameters for tool initialization.
-* Returns:
-  A sequence of Tool instances. Even single tools are returned as a sequence
-  to provide a consistent interface and eliminate union return types.

+```bash Running the Example icon="terminal"
+LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
+  uv run python examples/01_standalone_sdk/34_critic_example.py
+```

-#### classmethod resolve_kind()

-Resolve a kind string to its corresponding tool class.
+### Example Output -* Parameters: - `kind` – The name of the tool class to resolve -* Returns: - The tool class corresponding to the kind -* Raises: - `ValueError` – If the kind is unknown +``` +📁 Created workspace: /tmp/critic_demo_abc123 -#### set_executor() +====================================================================== +🚀 Starting Iterative Refinement with Critic Model +====================================================================== +Success threshold: 70% +Max iterations: 3 -Create a new Tool instance with the given executor. +... agent works on the task ... -#### to_mcp_tool() +✓ Critic evaluation: score=0.758, success=True -Convert a Tool to an MCP tool definition. +Created files: + - sample.txt + - wordstats/cli.py + - wordstats/stats.py + - wordstats/tests/test_stats.py -Allow overriding input/output schemas (usually by subclasses). +EXAMPLE_COST: 0.0234 +``` -* Parameters: - * `input_schema` – Optionally override the input schema. - * `output_schema` – Optionally override the output schema. +## Next Steps -#### to_openai_tool() +- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior +- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics +- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns -Convert a Tool to an OpenAI tool. +### Custom Tools +Source: https://docs.openhands.dev/sdk/guides/custom-tools.md -* Parameters: - * `add_security_risk_prediction` – Whether to add a security_risk field - to the action schema for LLM to predict. This is useful for - tools that may have safety risks, so the LLM can reason about - the risk level before calling the tool. - * `action_type` – Optionally override the action_type to use for the schema. - This is useful for MCPTool to use a dynamically created action type - based on the tool’s input schema. 
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -#### NOTE -Summary field is always added to the schema for transparency and -explainability of agent actions. +> The ready-to-run example is available [here](#ready-to-run-example)! -#### to_responses_tool() +## Understanding the Tool System -Convert a Tool to a Responses API function tool (LiteLLM typed). +The SDK's tool system is built around three core components: -For Responses API, function tools expect top-level keys: -(JSON configuration object) +1. **Action** - Defines input parameters (what the tool accepts) +2. **Observation** - Defines output data (what the tool returns) +3. **Executor** - Implements the tool's logic (what the tool does) -* Parameters: - * `add_security_risk_prediction` – Whether to add a security_risk field - * `action_type` – Optional override for the action type +These components are tied together by a **ToolDefinition** that registers the tool with the agent. -#### NOTE -Summary field is always added to the schema for transparency and -explainability of agent actions. +## Built-in Tools -### class ToolExecutor +The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. -Bases: `ABC`, `Generic` +```python icon="python" wrap +from openhands.tools import BashTool, FileEditorTool +from openhands.tools.preset import get_default_tools -Executor function type for a Tool. +# Use specific tools +agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) -#### Methods +# Or use preset +tools = get_default_tools() +agent = Agent(llm=llm, tools=tools) +``` -#### close() + +See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. + -Close the executor and clean up resources. 
+## Creating a Custom Tool -Default implementation does nothing. Subclasses should override -this method to perform cleanup (e.g., closing connections, -terminating processes, etc.). +Here's a minimal example of creating a custom grep tool: -### openhands.sdk.utils -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md + + + ### Define the Action + Defines input parameters (what the tool accepts) -Utility functions for the OpenHands SDK. + ```python icon="python" wrap + class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", + description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, + description="Optional glob to filter files (e.g. '*.py')" + ) + ``` + + + ### Define the Observation + Defines output data (what the tool returns) -### deprecated() + ```python icon="python" wrap + class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 -Return a decorator that deprecates a callable with explicit metadata. + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + ``` + + The to_llm_content() property formats observations for the LLM. + + + + ### Define the Executor + Implements the tool’s logic (what the tool does) -Use this helper when you can annotate a function, method, or property with -@deprecated(…). It transparently forwards to `deprecation.deprecated()` -while filling in the SDK’s current version metadata unless custom values are -supplied. 
+ ```python icon="python" wrap + class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal -### maybe_truncate() + def __call__( + self, + action: GrepAction, + conversation=None, + ) -> GrepObservation: + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) -Truncate the middle of content if it exceeds the specified length. + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q}" + else: + cmd = f"grep -rHnE {pat} {root_q}" + cmd += " 2>/dev/null | head -100" + result = self.terminal(TerminalAction(command=cmd)) -Keeps the head and tail of the content to preserve context at both ends. -Optionally saves the full content to a file for later investigation. + matches: list[str] = [] + files: set[str] = set() -* Parameters: - * `content` – The text content to potentially truncate - * `truncate_after` – Maximum length before truncation. If None, no truncation occurs - * `truncate_notice` – Notice to insert in the middle when content is truncated - * `save_dir` – Working directory to save full content file in - * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”) -* Returns: - Original content if under limit, or truncated content with head and tail - preserved and reference to saved file if applicable + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text -### sanitize_openhands_mentions() + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" + # take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) -Sanitize @OpenHands mentions in text to prevent self-mention loops. 
+ return GrepObservation( + matches=matches, + files=sorted(files), + count=len(matches), + ) + ``` + + + ### Finally, define the tool + ```python icon="python" wrap + class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """Custom grep tool that searches file contents using regular expressions.""" -This function inserts a zero-width joiner (ZWJ) after the @ symbol in -@OpenHands mentions, making them non-clickable in GitHub comments while -preserving readability. The original case of the mention is preserved. + @classmethod + def create( + cls, + conv_state, + terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. -* Parameters: - `text` – The text to sanitize -* Returns: - Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@‍OpenHands”) + Args: + conv_state: Conversation state to get + working directory from. + terminal_executor: Optional terminal executor to reuse. + If not provided, a new one will be created. -### Examples + Returns: + A sequence containing a single GrepTool instance. + """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) -```pycon ->>> sanitize_openhands_mentions("Thanks @OpenHands for the help!") -'Thanks @u200dOpenHands for the help!' ->>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS") -'Check @u200dopenhands and @u200dOPENHANDS' ->>> sanitize_openhands_mentions("No mention here") -'No mention here' -``` + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + ``` + + -### sanitized_env() +## Good to know +### Tool Registration +Tools are registered using `register_tool()` and referenced by name: -Return a copy of env with sanitized values. 
+```python icon="python" wrap +# Register a simple tool class +register_tool("FileEditorTool", FileEditorTool) -PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored -libraries win. This function restores the original value so that subprocess -will not use them. +# Register a factory function that creates multiple tools +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) -### warn_deprecated() +# Use registered tools by name +tools = [ + Tool(name="FileEditorTool"), + Tool(name="BashAndGrepToolSet"), +] +``` -Emit a deprecation warning for dynamic access to a legacy feature. +### Factory Functions +Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: -Prefer this helper when a decorator is not practical—e.g. attribute accessors, -data migrations, or other runtime paths that must conditionally warn. Provide -explicit version metadata so the SDK reports consistent messages and upgrades -to `deprecation.UnsupportedWarning` after the removal threshold. +```python icon="python" wrap +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create execute_bash and custom grep tools sharing one executor.""" + bash_executor = BashExecutor( + working_dir=conv_state.workspace.working_dir + ) + # Create and configure tools... 
+ return [bash_tool, grep_tool] +``` -### openhands.sdk.workspace -Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md +### Shared Executors +Multiple tools can share executors for efficiency and state consistency: -### class BaseWorkspace +```python icon="python" wrap +bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) +bash_tool = execute_bash_tool.set_executor(executor=bash_executor) -Bases: `DiscriminatedUnionMixin`, `ABC` +grep_executor = GrepExecutor(bash_executor) +grep_tool = ToolDefinition( + name="grep", + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, +) +``` -Abstract base class for workspace implementations. +## When to Create Custom Tools -Workspaces provide a sandboxed environment where agents can execute commands, -read/write files, and perform other operations. All workspace implementations -support the context manager protocol for safe resource management. +Create custom tools when you need to: +- Combine multiple operations into a single, structured interface +- Add typed parameters with validation +- Format complex outputs for LLM consumption +- Integrate with external APIs or services -#### Example +## Ready-to-run Example -```pycon ->>> with workspace: -... result = workspace.execute_command("echo 'hello'") -... 
content = workspace.read_file("example.txt") -``` + +This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) + +```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py +"""Advanced example showing explicit executor usage and custom grep tool.""" -#### Properties +import os +import shlex +from collections.abc import Sequence -- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. Path objects are automatically converted to strings.')] +from pydantic import Field, SecretStr -#### Methods +from openhands.sdk import ( + LLM, + Action, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Observation, + TextContent, + ToolDefinition, + get_logger, +) +from openhands.sdk.tool import ( + Tool, + ToolExecutor, + register_tool, +) +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import ( + TerminalAction, + TerminalExecutor, + TerminalTool, +) -#### abstractmethod execute_command() -Execute a bash command on the system. +logger = get_logger(__name__) -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory for the command (optional) - * `timeout` – Timeout in seconds (defaults to 30.0) -* Returns: - Result containing stdout, stderr, exit_code, and other - : metadata -* Return type: - [CommandResult](#class-commandresult) -* Raises: - `Exception` – If command execution fails +# --- Action / Observation --- -#### abstractmethod file_download() -Download a file from the system. 
+class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, description="Optional glob to filter files (e.g. '*.py')" + ) -* Parameters: - * `source_path` – Path to the source file on the system - * `destination_path` – Path where the file should be downloaded -* Returns: - Result containing success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) -* Raises: - `Exception` – If file download fails -#### abstractmethod file_upload() +class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 -Upload a file to the system. + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be uploaded -* Returns: - Result containing success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) -* Raises: - `Exception` – If file upload fails -#### abstractmethod git_changes() +# --- Executor --- -Get the git changes for the repository at the path given. 
-* Parameters: - `path` – Path to the git repository -* Returns: - List of changes -* Return type: - list[GitChange] -* Raises: - `Exception` – If path is not a git repository or getting changes failed +class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal -#### abstractmethod git_diff() + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) -Get the git diff for the file at the path given. + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" + else: + cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" -* Parameters: - `path` – Path to the file -* Returns: - Git diff -* Return type: - GitDiff -* Raises: - `Exception` – If path is not a git repository or getting diff failed + result = self.terminal(TerminalAction(command=cmd)) -#### model_config = (configuration object) + matches: list[str] = [] + files: set[str] = set() -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text -#### pause() + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" — take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) -Pause the workspace to conserve resources. + return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) -For local workspaces, this is a no-op. -For container-based workspaces, this pauses the container. -* Raises: - `NotImplementedError` – If the workspace type does not support pausing. 
+# Tool description +_GREP_DESCRIPTION = """Fast content search tool. +* Searches file contents using regular expressions +* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) +* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") +* Returns matching file paths sorted by modification time. +* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. +* Use this tool when you need to find files containing specific patterns +* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead +""" # noqa: E501 -#### resume() -Resume a paused workspace. +# --- Tool Definition --- -For local workspaces, this is a no-op. -For container-based workspaces, this resumes the container. -* Raises: - `NotImplementedError` – If the workspace type does not support resuming. +class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """A custom grep tool that searches file contents using regular expressions.""" -### class CommandResult + @classmethod + def create( + cls, conv_state, terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. -Bases: `BaseModel` + Args: + conv_state: Conversation state to get working directory from. + terminal_executor: Optional terminal executor to reuse. If not provided, + a new one will be created. -Result of executing a command in the workspace. + Returns: + A sequence containing a single GrepTool instance. 
+ """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] -#### Properties -- `command`: str -- `exit_code`: int -- `stderr`: str -- `stdout`: str -- `timeout_occurred`: bool +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -#### Methods +# Tools - demonstrating both simplified and advanced patterns +cwd = os.getcwd() -#### model_config = (configuration object) -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create terminal and custom grep tools sharing one executor.""" -### class FileOperationResult + terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) + # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) + terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] -Bases: `BaseModel` + # Use the GrepTool.create() method with shared terminal_executor + grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] -Result of a file upload or download operation. 
+ return [terminal_tool, grep_tool] -#### Properties +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) -- `destination_path`: str -- `error`: str | None -- `file_size`: int | None -- `source_path`: str -- `success`: bool +tools = [ + Tool(name=FileEditorTool.name), + Tool(name="BashAndGrepToolSet"), +] -#### Methods +# Agent +agent = Agent(llm=llm, tools=tools) -#### model_config = (configuration object) +llm_messages = [] # collect raw LLM messages -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. -### class LocalWorkspace +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -Bases: [`BaseWorkspace`](#class-baseworkspace) -Local workspace implementation that operates on the host filesystem. +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -LocalWorkspace provides direct access to the local filesystem and command execution -environment. It’s suitable for development and testing scenarios where the agent -should operate directly on the host system. +conversation.send_message( + "Hello! Can you use the grep tool to find all files " + "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 + "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 +) +conversation.run() -#### Example +conversation.send_message("Great! Now delete that file.") +conversation.run() -```pycon ->>> workspace = LocalWorkspace(working_dir="/path/to/project") ->>> with workspace: -... result = workspace.execute_command("ls -la") -... content = workspace.read_file("README.md") +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -#### Methods + -#### __init__() +## Next Steps -Create a new model by parsing and validating input data from keyword arguments. +- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers +- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation -Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be -validated to form a valid model. +### Assign Reviews +Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md -self is explicitly positional-only to allow self as a field name. +> The reference workflow is available [here](#reference-workflow)! -#### execute_command() +Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. -Execute a bash command locally. +## How it works -Uses the shared shell execution utility to run commands with proper -timeout handling, output streaming, and error management. - -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory (optional) - * `timeout` – Timeout in seconds -* Returns: - Result with stdout, stderr, exit_code, command, and - : timeout_occurred -* Return type: - [CommandResult](#class-commandresult) +It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. 
-#### file_download() +**Core Components:** +- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts +- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent -Download (copy) a file locally. +**Prompt Options:** +1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) +2. **`PROMPT_LOCATION`** - URL or file path for external prompts -For local systems, file download is implemented as a file copy operation -using shutil.copy2 to preserve metadata. +The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be copied -* Returns: - Result with success status and file information -* Return type: - [FileOperationResult](#class-fileoperationresult) +## Assign Reviews Use Case -#### file_upload() +This specific implementation uses the basic action template to handle three PR management scenarios: -Upload (copy) a file locally. +**1. Need Reviewer Action** +- Identifies PRs waiting for review +- Notifies reviewers to take action -For local systems, file upload is implemented as a file copy operation -using shutil.copy2 to preserve metadata. +**2. Need Author Action** +- Finds stale PRs with no activity for 5+ days +- Prompts authors to update, request review, or close -* Parameters: - * `source_path` – Path to the source file - * `destination_path` – Path where the file should be copied -* Returns: - Result with success status and file information -* Return type: - [FileOperationResult](#class-fileoperationresult) +**3. 
Need Reviewers** +- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) +- Uses git blame analysis to identify relevant contributors +- Automatically assigns reviewers based on file ownership and contribution history +- Balances reviewer workload across team members -#### git_changes() +## Quick Start -Get the git changes for the repository at the path given. + + + ```bash icon="terminal" + cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml + ``` + + + Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". + + + The default is: Daily at 12 PM UTC. + + -* Parameters: - `path` – Path to the git repository -* Returns: - List of changes -* Return type: - list[GitChange] -* Raises: - `Exception` – If path is not a git repository or getting changes failed +## Features -#### git_diff() +- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership +- **Automated Notifications** - Sends contextual reminders to reviewers and authors +- **Workload Balancing** - Distributes review requests evenly across team members +- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch -Get the git diff for the file at the path given. 
+## Reference Workflow -* Parameters: - `path` – Path to the file -* Returns: - Git diff -* Return type: - GitDiff -* Raises: - `Exception` – If path is not a git repository or getting diff failed + +This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) + -#### model_config = (configuration object) +```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml +--- +# To set this up: +# 1. Change the name below to something relevant to your task +# 2. Modify the "env" section below with your prompt +# 3. Add your LLM_API_KEY to the repository secrets +# 4. Commit this file to your repository +# 5. Trigger the workflow manually or set up a schedule +name: Assign Reviews -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +on: + # Manual trigger + workflow_dispatch: + # Scheduled trigger (disabled by default, uncomment and customize as needed) + schedule: + # Run at 12 PM UTC every day + - cron: 0 12 * * * -#### pause() +permissions: + contents: write + pull-requests: write + issues: write -Pause the workspace (no-op for local workspaces). +jobs: + run-task: + runs-on: ubuntu-24.04 + env: + # Configuration (modify these values as needed) + AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py + # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both + # Option 1: Use a URL or file path for the prompt + PROMPT_LOCATION: '' + # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + # Option 2: Use direct text for the prompt + PROMPT_STRING: > + Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. 
+ Read the sections below in order, and perform each in order. Do NOT take action + on the same issue or PR twice. -Local workspaces have nothing to pause since they operate directly -on the host filesystem. + # Issues with needs-info - Check for OP Response -#### resume() + Find all open issues that have the "needs-info" label. For each issue: + 1. Identify the original poster (issue author) + 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added + 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline + and look for "labeled" events with the label "needs-info" + 4. If the original poster has commented after the label was added: + - Remove the "needs-info" label + - Add the "needs-triage" label + - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." -Resume the workspace (no-op for local workspaces). + # Issues with needs-triage -Local workspaces have nothing to resume since they operate directly -on the host filesystem. + Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last + activity: + 1. First, check if the issue has already been triaged by verifying it does NOT have: + - The "enhancement" label + - Any "priority" label (priority:low, priority:medium, priority:high, etc.) + 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label + 3. For issues that have NOT been triaged yet: + - Read the issue description and comments + - Determine if it requires maintainer attention by checking: + * Is it a bug report, feature request, or question? + * Does it have enough information to be actionable? + * Has a maintainer already commented? + * Is the last comment older than 4 days? 
+ - If it needs maintainer attention and no maintainer has commented: + * Find an appropriate maintainer based on the issue topic and recent activity + * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have + a chance?" -### class RemoteWorkspace + # Need Reviewer Action -Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) + Find all open PRs where: + 1. The PR is waiting for review (there are no open review comments or change requests) + 2. The PR is in a "clean" state (CI passing, no merge conflicts) + 3. The PR is not marked as draft (draft: false) + 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. -Remote workspace implementation that connects to an OpenHands agent server. + In this case, send a message to the reviewers: + [Automatic Post]: This PR seems to be currently waiting for review. + {reviewer_names}, could you please take a look when you have a chance? -RemoteWorkspace provides access to a sandboxed environment running on a remote -OpenHands agent server. This is the recommended approach for production deployments -as it provides better isolation and security. + # Need Author Action -#### Example + Find all open PRs where the most recent change or comment was made on the pull + request more than 5 days ago (use 14 days if the PR is marked as draft). -```pycon ->>> workspace = RemoteWorkspace( -... host="https://agent-server.example.com", -... working_dir="/workspace" -... ) ->>> with workspace: -... result = workspace.execute_command("ls -la") -... content = workspace.read_file("README.md") -``` + And send a message to the author: + [Automatic Post]: It has been a while since there was any activity on this PR. + {author}, are you still working on it? If so, please go ahead, if not then + please request review, close it, or request that someone else follow up. 
-#### Properties + # Need Reviewers -- `alive`: bool - Check if the remote workspace is alive by querying the health endpoint. - * Returns: - True if the health endpoint returns a successful response, False otherwise. -- `client`: Client + Find all open pull requests that: + 1. Have no reviewers assigned to them. + 2. Are not marked as draft. + 3. Were created more than 1 day ago. + 4. CI is passing and there are no merge conflicts. -#### Methods + For each of these pull requests, read the git blame information for the files, + and find the most recent and active contributors to the file/location of the changes. + Assign one of these people as a reviewer, but try not to assign too many reviews to + any single person. Add this message: -#### execute_command() + [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. + Thanks in advance for the help! -Execute a bash command on the remote system. + LLM_MODEL: + LLM_BASE_URL: + steps: + - name: Checkout repository + uses: actions/checkout@v5 -This method starts a bash command via the remote agent server API, -then polls for the output until the command completes. 
+ - name: Set up Python + uses: actions/setup-python@v6 + with: + python-version: '3.13' -* Parameters: - * `command` – The bash command to execute - * `cwd` – Working directory (optional) - * `timeout` – Timeout in seconds -* Returns: - Result with stdout, stderr, exit_code, and other metadata -* Return type: - [CommandResult](#class-commandresult) + - name: Install uv + uses: astral-sh/setup-uv@v7 + with: + enable-cache: true -#### file_download() + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" -Download a file from the remote system. + - name: Check required configuration + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + run: | + if [ -z "$LLM_API_KEY" ]; then + echo "Error: LLM_API_KEY secret is not set." + exit 1 + fi -Requests the file from the remote system via HTTP API and saves it locally. + # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set + if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then + echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." + echo "Please provide only one in the env section of the workflow file." + exit 1 + fi -* Parameters: - * `source_path` – Path to the source file on remote system - * `destination_path` – Path where the file should be saved locally -* Returns: - Result with success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) + if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then + echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." + echo "Please set one in the env section of the workflow file." 
+ exit 1 + fi -#### file_upload() + if [ -n "$PROMPT_LOCATION" ]; then + echo "Prompt location: $PROMPT_LOCATION" + else + echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" + fi + echo "LLM model: $LLM_MODEL" + if [ -n "$LLM_BASE_URL" ]; then + echo "LLM base URL: $LLM_BASE_URL" + fi -Upload a file to the remote system. + - name: Run task + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + PYTHONPATH: '' + run: | + echo "Running agent script: $AGENT_SCRIPT_URL" -Reads the local file and sends it to the remote system via HTTP API. + # Download script if it's a URL + if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then + echo "Downloading agent script from URL..." + curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py + AGENT_SCRIPT_PATH="/tmp/agent_script.py" + else + AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" + fi -* Parameters: - * `source_path` – Path to the local source file - * `destination_path` – Path where the file should be uploaded on remote system -* Returns: - Result with success status and metadata -* Return type: - [FileOperationResult](#class-fileoperationresult) + # Run with appropriate prompt argument + if [ -n "$PROMPT_LOCATION" ]; then + echo "Using prompt from: $PROMPT_LOCATION" + uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" + else + echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" + uv run python "$AGENT_SCRIPT_PATH" + fi -#### git_changes() + - name: Upload logs as artifact + uses: actions/upload-artifact@v4 + if: always() + with: + name: openhands-task-logs + path: | + *.log + output/ + retention-days: 7 +``` -Get the git changes for the repository at the path given. 
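Note how the `Run task` step hands the prompt to the agent script in one of two ways: as a CLI argument when `PROMPT_LOCATION` is set, or implicitly through the `PROMPT_STRING` environment variable otherwise. A rough sketch of how a script could resolve the prompt on its side (the real `agent_script.py` may differ — `resolve_prompt` here is a hypothetical helper):

```python
import urllib.request


def resolve_prompt(argv: list[str], env: dict[str, str]) -> str:
    """Prefer a prompt location passed as the first CLI argument;
    fall back to the PROMPT_STRING environment variable."""
    if len(argv) > 1 and argv[1]:
        location = argv[1]
        if location.startswith(("http://", "https://")):
            # Remote prompt: fetch it over HTTP(S)
            with urllib.request.urlopen(location) as resp:
                return resp.read().decode("utf-8")
        # Local prompt file
        with open(location, encoding="utf-8") as f:
            return f.read()
    prompt = env.get("PROMPT_STRING", "")
    if not prompt:
        raise SystemExit("No prompt: set PROMPT_LOCATION or PROMPT_STRING")
    return prompt
```

This mirrors the workflow's own validation step, which requires exactly one of the two prompt sources to be configured.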
+## Related Files -* Parameters: - `path` – Path to the git repository -* Returns: - List of changes -* Return type: - list[GitChange] -* Raises: - `Exception` – If path is not a git repository or getting changes failed +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) -#### git_diff() +### PR Review +Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md -Get the git diff for the file at the path given. +> The reference workflow is available [here](#reference-workflow)! -* Parameters: - `path` – Path to the file -* Returns: - Git diff -* Return type: - GitDiff -* Raises: - `Exception` – If path is not a git repository or getting diff failed +Automatically review pull requests, providing feedback on code quality, security, and best practices. Reviews can be triggered in two ways: +- Requesting `openhands-agent` as a reviewer +- Adding the `review-this` label to the PR -#### model_config = (configuration object) + +The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo. + -Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +## Quick Start -#### model_post_init() +```bash +# 1. 
Copy workflow to your repository
+cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml

-Override this method to perform additional initialization after __init__ and model_construct.
-This is useful if you want to do some validation that requires the entire model to be initialized.

+# 2. Configure secrets in GitHub Settings → Secrets
+# Add: LLM_API_KEY

-#### reset_client()

+# 3. (Optional) Create a "review-this" label in your repository
+# Go to Issues → Labels → New label
+# You can also trigger reviews by requesting "openhands-agent" as a reviewer
+```

-Reset the HTTP client to force re-initialization.

+## Features

-This is useful when connection parameters (host, api_key) have changed
-and the client needs to be recreated with new values.

+- **Fast Reviews** - Results are posted on the PR within 2 to 3 minutes
+- **Comprehensive Analysis** - Analyzes the changes in the context of the repository, covering code quality, security, and best practices
+- **GitHub Integration** - Posts comments directly to the PR
+- **Customizable** - Add your own code review guidelines without forking

-### class Workspace

+## Security

-### class Workspace

+- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label.
+- Maintainers should read the PR before triggering a review to make sure it is safe to run.

-Bases: `object`

+## Customizing the Code Review

-Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace.

+Instead of forking `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization. 
-Usage: -: - Workspace(working_dir=…) -> LocalWorkspace - - Workspace(working_dir=…, host=”http://…”) -> RemoteWorkspace +### How It Works -### Agent -Source: https://docs.openhands.dev/sdk/arch/agent.md +The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. -The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture. + +**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. + -**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent) +### Example: Custom Code Review Skill -## Core Responsibilities +Create `.agents/skills/custom-codereview-guide.md` in your repository: -The Agent system has four primary responsibilities: +```markdown +--- +name: custom-codereview-guide +description: Project-specific review guidelines for MyProject +triggers: +- /codereview +--- -1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history -2. **Tool Orchestration** - Select and execute tools, handle results and errors -3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser) -4. 
**Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security) +# MyProject-Specific Review Guidelines -## Architecture +In addition to general code review practices, check for: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%% -flowchart TB - subgraph Input[" "] - Events["Event History"] - Context["Agent Context
Skills + Prompts"] - end - - subgraph Core["Agent Core"] - Condense["Condenser
History compression"] - Reason["LLM Query
Generate actions"] - Security["Security Analyzer
Risk assessment"] - end - - subgraph Execution[" "] - Tools["Tool Executor
Action → Observation"] - Results["Observation Events"] - end - - Events --> Condense - Context -.->|Skills| Reason - Condense --> Reason - Reason --> Security - Security --> Tools - Tools --> Results - Results -.->|Feedback| Events - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Reason primary - class Condense,Security secondary - class Tools tertiary +## Project Conventions + +- All API endpoints must have OpenAPI documentation +- Database migrations must be reversible +- Feature flags required for new features + +## Architecture Rules + +- No direct database access from controllers +- All external API calls must go through the gateway service + +## Communication Style + +- Be direct and constructive +- Use GitHub suggestion syntax for code fixes ``` -### Key Components + +**Note**: These rules supplement the default `code-review` skill, not replace it. 
+ -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | -| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | -| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | -| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | -| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | + +**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. -## Reasoning-Action Loop +If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. + -The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: + +**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. 
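

The name and trigger rules above can be summarized in a small model (illustrative only; this is not the SDK's actual skill loader, and the data layout here is an assumption made for the sketch):

```python
def skills_for_trigger(public: dict[str, dict], custom: dict[str, dict],
                       message: str) -> str:
    """Illustrative model of the merge rules: a custom skill with the same
    name replaces the public one; otherwise both are kept. Matching skills
    are simply concatenated into the context, public entries first --
    there is no smart merging of conflicting guidelines."""
    merged = {**public, **custom}  # same name => custom skill overrides
    order = list(public) + [n for n in custom if n not in public]
    hits = [n for n in order
            if any(t in message for t in merged[n]["triggers"])]
    return "\n\n".join(merged[n]["content"] for n in hits)


public = {"code-review": {"triggers": ["/codereview"],
                          "content": "General review guidelines"}}

# Unique name: both skills fire on /codereview (supplement).
custom = {"custom-codereview-guide": {"triggers": ["/codereview"],
                                     "content": "MyProject rules"}}
assert skills_for_trigger(public, custom, "/codereview") == (
    "General review guidelines\n\nMyProject rules")

# Same name: the custom skill overrides the public one entirely.
override = {"code-review": {"triggers": ["/codereview"],
                            "content": "MyProject rules"}}
assert skills_for_trigger(public, override, "/codereview") == "MyProject rules"
```

The two assertions correspond to the supplement and override behaviors described above: renaming an overriding skill is enough to switch it back to supplement mode.
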
+ -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% -flowchart TB - Start["step() called"] - Pending{"Pending
actions?"} - ExecutePending["Execute pending actions"] - - HasCondenser{"Has
condenser?"} - Condense["Call condenser.condense()"] - CondenseResult{"Result
type?"} - EmitCondensation["Emit Condensation event"] - UseView["Use View events"] - UseRaw["Use raw events"] - - Query["Query LLM with messages"] - ContextExceeded{"Context
window
exceeded?"} - EmitRequest["Emit CondensationRequest"] - - Parse{"Response
type?"} - CreateActions["Create ActionEvents"] - CreateMessage["Create MessageEvent"] - - Confirmation{"Need
confirmation?"} - SetWaiting["Set WAITING_FOR_CONFIRMATION"] - - Execute["Execute actions"] - Observe["Create ObservationEvents"] - - Return["Return"] - - Start --> Pending - Pending -->|Yes| ExecutePending --> Return - Pending -->|No| HasCondenser - - HasCondenser -->|Yes| Condense - HasCondenser -->|No| UseRaw - Condense --> CondenseResult - CondenseResult -->|Condensation| EmitCondensation --> Return - CondenseResult -->|View| UseView --> Query - UseRaw --> Query - - Query --> ContextExceeded - ContextExceeded -->|Yes| EmitRequest --> Return - ContextExceeded -->|No| Parse - - Parse -->|Tool calls| CreateActions - Parse -->|Message| CreateMessage --> Return - - CreateActions --> Confirmation - Confirmation -->|Yes| SetWaiting --> Return - Confirmation -->|No| Execute - - Execute --> Observe - Observe --> Return - - style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Step Execution Flow:** - -1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return -2. **Condensation:** If condenser exists: - - Call `condenser.condense()` with current event view - - If returns `View`: use condensed events for LLM query (continue in same step) - - If returns `Condensation`: emit event and return (will be processed next step) -3. **LLM Query:** Query LLM with messages from event history - - If context window exceeded: emit `CondensationRequest` and return -4. **Response Parsing:** Parse LLM response into events - - Tool calls → create `ActionEvent`(s) - - Text message → create `MessageEvent` and return -5. **Confirmation Check:** If actions need user approval: - - Set conversation status to `WAITING_FOR_CONFIRMATION` and return -6. 
**Action Execution:** Execute tools and create `ObservationEvent`(s) - -**Key Characteristics:** -- **Stateless:** Agent holds no mutable state between steps -- **Event-Driven:** Reads from event history, writes new events -- **Interruptible:** Each step is atomic and can be paused/resumed +### Benefits of Custom Skills -## Agent Context +1. **No forking required**: Keep using the official SDK while customizing behavior +2. **Version controlled**: Your review guidelines live in your repository +3. **Easy updates**: SDK updates don't overwrite your customizations +4. **Team alignment**: Everyone uses the same review standards +5. **Composable**: Add project-specific rules alongside default guidelines -The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: + +See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. + -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Context["AgentContext"] - - subgraph Skills["Skills"] - Repo["repo
Always active"] - Knowledge["knowledge
Trigger-based"] - end - SystemAug["System prompt prefix/suffix
Per-conversation"] - System["Prompt template
Per-conversation"] - - subgraph Application["Applied to LLM"] - SysPrompt["System Prompt"] - UserMsg["User Messages"] - end - - Context --> Skills - Context --> SystemAug - Repo --> SysPrompt - Knowledge -.->|When triggered| UserMsg - System --> SysPrompt - SystemAug --> SysPrompt - - style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +## Reference Workflow -| Skill Type | Activation | Use Case | -|------------|------------|----------| -| **repo** | Always included | Project-specific context, conventions | -| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | + +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) + -Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. +```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml +--- +# OpenHands PR Review Workflow +# +# To set this up: +# 1. Copy this file to .github/workflows/pr-review.yml in your repository +# 2. Add LLM_API_KEY to repository secrets +# 3. Customize the inputs below as needed +# 4. Commit this file to your repository +# 5. 
Trigger the review by either: +# - Adding the "review-this" label to any PR, OR +# - Requesting openhands-agent as a reviewer +# +# For more information, see: +# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review +name: PR Review by OpenHands +on: + # Trigger when a label is added or a reviewer is requested + pull_request: + types: [labeled, review_requested] -## Tool Execution +permissions: + contents: read + pull-requests: write + issues: write -Tools follow a **strict action-observation pattern**: +jobs: + pr-review: + # Run when review-this label is added OR openhands-agent is requested as reviewer + if: | + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Checkout for composite action + uses: actions/checkout@v4 + with: + repository: OpenHands/software-agent-sdk + # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') + ref: main + sparse-checkout: .github/actions/pr-review -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - LLM["LLM generates tool_call"] - Convert["Convert to ActionEvent"] - - Decision{"Confirmation
mode?"} - Defer["Store as pending"] - - Execute["Execute tool"] - Success{"Success?"} - - Obs["ObservationEvent
with result"] - Error["ObservationEvent
with error"] - - LLM --> Convert - Convert --> Decision - - Decision -->|Yes| Defer - Decision -->|No| Execute - - Execute --> Success - Success -->|Yes| Obs - Success -->|No| Error - - style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + - name: Run PR Review + uses: ./.github/actions/pr-review + with: + # LLM configuration + llm-model: anthropic/claude-sonnet-4-5-20250929 + llm-base-url: '' + # Review style: roasted (other option: standard) + review-style: roasted + # SDK version to use (version tag or branch name) + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} ``` -**Execution Modes:** - -| Mode | Behavior | Use Case | -|------|----------|----------| -| **Direct** | Execute immediately | Development, trusted environments | -| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | +### Action Inputs -**Security Integration:** +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (optional) | No | `''` | +| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | -Before execution, the security analyzer evaluates each action: -- **Low Risk:** Execute immediately -- **Medium Risk:** Log warning, execute with monitoring -- **High Risk:** Block execution, request user confirmation +## Related Files -## Component Relationships +- [Agent 
Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) -### How Agent Interacts +### TODO Management +Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Agent["Agent"] - Conv["Conversation"] - LLM["LLM"] - Tools["Tools"] - Context["AgentContext"] - - Conv -->|.step calls| Agent - Agent -->|Reads events| Conv - Agent -->|Query| LLM - Agent -->|Execute| Tools - Context -.->|Skills and Context| Agent - Agent -.->|New events| Conv - - style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +> The reference workflow is available [here](#reference-workflow)! 
-**Relationship Characteristics:** -- **Conversation → Agent**: Orchestrates step execution, provides event history -- **Agent → LLM**: Queries for next actions, receives tool calls or messages -- **Agent → Tools**: Executes actions, receives observations -- **AgentContext → Agent**: Injects skills and prompts into LLM queries +Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership -## See Also +## Quick Start -- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle -- **[Tool System](/sdk/arch/tool-system)** - Tool definition and execution patterns -- **[Events](/sdk/arch/events)** - Event types and structures -- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns -- **[LLM](/sdk/arch/llm)** - Language model abstraction - -### Agent Server Package -Source: https://docs.openhands.dev/sdk/arch/agent-server.md - -The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. + + + ```bash icon="terminal" + cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml + ``` + + + Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `Settings → Actions → General → Workflow permissions` and enable: + - `Read and write permissions` + - `Allow GitHub Actions to create and approve pull requests` + + + Trigger the agent by adding TODO comments into your code. 
-**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) + Example: `# TODO(openhands): Add input validation for user email` -## Purpose + + The workflow is configurable and any identifier can be used in place of `TODO(openhands)` + + + -The Agent Server enables: -- **Remote execution**: Clients interact with agents via HTTP API -- **Multi-user isolation**: Each user gets isolated workspace -- **Container orchestration**: Manages Docker containers for workspaces -- **Centralized management**: Monitor and control all agents -- **Scalability**: Horizontal scaling with multiple servers -## Architecture Overview +## Features -```mermaid -graph TB - Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] - - API --> Auth[Authentication] - API --> Router[API Router] - - Router --> WS[Workspace Manager] - Router --> Conv[Conversation Handler] - - WS --> Docker[Docker Manager] - Docker --> C1[Container 1
User A] - Docker --> C2[Container 2
User B] - Docker --> C3[Container 3
User C] - - Conv --> Agent[Software Agent SDK] - Agent --> C1 - Agent --> C2 - Agent --> C3 - - style Client fill:#e1f5fe - style API fill:#fff3e0 - style WS fill:#e8f5e8 - style Docker fill:#f3e5f5 - style Agent fill:#fce4ec -``` +- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. +- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it +- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers -### Key Components +## Best Practices -**1. FastAPI Server** -- HTTP REST API endpoints -- Authentication and authorization -- Request validation -- WebSocket support for streaming +- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow +- **Clear Descriptions** - Write descriptive TODO comments +- **Review PRs** - Always review the generated PRs before merging -**2. Workspace Manager** -- Creates and manages Docker containers -- Isolates workspaces per user -- Handles container lifecycle -- Manages resource limits +## Reference Workflow -**3. Conversation Handler** -- Routes requests to appropriate workspace -- Manages conversation state -- Handles concurrent requests -- Supports streaming responses + +This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) + -**4. Docker Manager** -- Interfaces with Docker daemon -- Builds and pulls images -- Creates and destroys containers -- Monitors container health +```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml +--- +# Automated TODO Management Workflow +# Make sure to replace and with +# appropriate values for your LLM setup. +# +# This workflow automatically scans for TODO(openhands) comments and creates +# pull requests to implement them using the OpenHands agent. +# +# Setup: +# 1. 
Add LLM_API_KEY to repository secrets +# 2. Ensure GITHUB_TOKEN has appropriate permissions +# 3. Make sure Github Actions are allowed to create and review PRs +# 4. Commit this file to .github/workflows/ in your repository +# 5. Configure the schedule or trigger manually -## Design Decisions +name: Automated TODO Management -### Why HTTP API? +on: + # Manual trigger + workflow_dispatch: + inputs: + max_todos: + description: Maximum number of TODOs to process in this run + required: false + default: '3' + type: string + todo_identifier: + description: TODO identifier to search for (e.g., TODO(openhands)) + required: false + default: TODO(openhands) + type: string -Alternative approaches considered: -- **gRPC**: More efficient but harder for web clients -- **WebSockets only**: Good for streaming but not RESTful -- **HTTP + WebSockets**: Best of both worlds + # Trigger when 'automatic-todo' label is added to a PR + pull_request: + types: [labeled] -**Decision**: HTTP REST for operations, WebSockets for streaming -- ✅ Works from any client (web, mobile, CLI) -- ✅ Easy to debug (curl, Postman) -- ✅ Standard authentication (API keys, OAuth) -- ✅ Streaming where needed + # Scheduled trigger (disabled by default, uncomment and customize as needed) + # schedule: + # # Run every Monday at 9 AM UTC + # - cron: "0 9 * * 1" -### Why Container Per User? 
+permissions: + contents: write + pull-requests: write + issues: write -Alternative approaches: -- **Shared container**: Multiple users in one container -- **Container per session**: New container each conversation -- **Container per user**: One container per user (chosen) +jobs: + scan-todos: + runs-on: ubuntu-latest + # Only run if triggered manually or if 'automatic-todo' label was added + if: > + github.event_name == 'workflow_dispatch' || + (github.event_name == 'pull_request' && + github.event.label.name == 'automatic-todo') + outputs: + todos: ${{ steps.scan.outputs.todos }} + todo-count: ${{ steps.scan.outputs.todo-count }} + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 # Full history for better context -**Decision**: Container per user -- ✅ Strong isolation between users -- ✅ Persistent workspace across sessions -- ✅ Better resource management -- ⚠️ More containers, but worth it for isolation + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' -### Why FastAPI? + - name: Copy TODO scanner + run: | + cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py + chmod +x /tmp/scanner.py -Alternative frameworks: -- **Flask**: Simpler but less type-safe -- **Django**: Too heavyweight -- **FastAPI**: Modern, fast, type-safe (chosen) + - name: Scan for TODOs + id: scan + run: | + echo "Scanning for TODO comments..." -**Decision**: FastAPI -- ✅ Automatic API documentation (OpenAPI) -- ✅ Type validation with Pydantic -- ✅ Async support for performance -- ✅ WebSocket support built-in + # Run the scanner and capture output + TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" + python /tmp/scanner.py . 
--identifier "$TODO_IDENTIFIER" > todos.json -## API Design + # Count TODOs + TODO_COUNT=$(python -c \ + "import json; data=json.load(open('todos.json')); print(len(data))") + echo "Found $TODO_COUNT $TODO_IDENTIFIER items" -### Key Endpoints + # Limit the number of TODOs to process + MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" + if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then + echo "Limiting to first $MAX_TODOS TODOs" + python -c " + import json + data = json.load(open('todos.json')) + limited = data[:$MAX_TODOS] + json.dump(limited, open('todos.json', 'w'), indent=2) + " + TODO_COUNT=$MAX_TODOS + fi -**Workspace Management** -``` -POST /workspaces Create new workspace -GET /workspaces/{id} Get workspace info -DELETE /workspaces/{id} Delete workspace -POST /workspaces/{id}/execute Execute command -``` + # Set outputs + echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT + echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT -**Conversation Management** -``` -POST /conversations Create conversation -GET /conversations/{id} Get conversation -POST /conversations/{id}/messages Send message -GET /conversations/{id}/stream Stream responses (WebSocket) -``` + # Display found TODOs + echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY + if [ "$TODO_COUNT" -eq 0 ]; then + echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY + else + echo "Found $TODO_COUNT TODO(openhands) items:" \ + >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + python -c " + import json + data = json.load(open('todos.json')) + for i, todo in enumerate(data, 1): + print(f'{i}. 
**{todo[\"file\"]}:{todo[\"line\"]}** - ' + + f'{todo[\"description\"]}') + " >> $GITHUB_STEP_SUMMARY + fi -**Health & Monitoring** -``` -GET /health Server health check -GET /metrics Prometheus metrics -``` - -### Authentication + process-todos: + needs: scan-todos + if: needs.scan-todos.outputs.todo-count > 0 + runs-on: ubuntu-latest + strategy: + matrix: + todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} + max-parallel: 1 # Process one TODO at a time to avoid conflicts + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + token: ${{ secrets.GITHUB_TOKEN }} -**API Key Authentication** -```bash -curl -H "Authorization: Bearer YOUR_API_KEY" \ - https://agent-server.example.com/conversations -``` + - name: Switch to feature branch with TODO management files + run: | + git checkout openhands/todo-management-example + git pull origin openhands/todo-management-example -**Per-user workspace isolation** -- API key → user ID mapping -- Each user gets separate workspace -- Users can't access each other's workspaces + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' -### Streaming Responses + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true -**WebSocket for real-time updates** -```python -async with websocket_connect(url) as ws: - # Send message - await ws.send_json({"message": "Hello"}) - - # Receive events - async for event in ws: - if event["type"] == "message": - print(event["content"]) -``` + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" -**Why streaming?** -- Real-time feedback to users -- Show agent thinking process -- Better UX for long-running tasks + - 
name: Copy agent files + run: | + cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py + cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py + chmod +x agent.py -## Deployment Models + - name: Configure Git + run: | + git config --global user.name "openhands-bot" + git config --global user.email \ + "openhands-bot@users.noreply.github.com" -### 1. Local Development + - name: Process TODO + env: + LLM_MODEL: + LLM_BASE_URL: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_REPOSITORY: ${{ github.repository }} + TODO_FILE: ${{ matrix.todo.file }} + TODO_LINE: ${{ matrix.todo.line }} + TODO_DESCRIPTION: ${{ matrix.todo.description }} + PYTHONPATH: '' + run: | + echo "Processing TODO: $TODO_DESCRIPTION" + echo "File: $TODO_FILE:$TODO_LINE" -Run server locally for testing: -```bash -# Start server -openhands-agent-server --port 8000 + # Create a unique branch name for this TODO + BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ + sed 's/[^a-zA-Z0-9]/-/g' | \ + sed 's/--*/-/g' | \ + sed 's/^-\|-$//g' | \ + tr '[:upper:]' '[:lower:]' | \ + cut -c1-50)" + echo "Branch name: $BRANCH_NAME" -# Or with Docker -docker run -p 8000:8000 \ - -v /var/run/docker.sock:/var/run/docker.sock \ - ghcr.io/all-hands-ai/agent-server:latest -``` + # Create and switch to new branch (force create if exists) + git checkout -B "$BRANCH_NAME" -**Use case**: Development and testing + # Run the agent to process the TODO + # Stay in repository directory for git operations -### 2. Single-Server Deployment + # Create JSON payload for the agent + TODO_JSON=$(cat <&1 | tee agent_output.log + AGENT_EXIT_CODE=$? + set -e -### 3. 
Multi-Server Deployment + echo "Agent exit code: $AGENT_EXIT_CODE" + echo "Agent output log:" + cat agent_output.log -Scale horizontally with load balancer: -``` - Load Balancer - | - +-------------+-------------+ - | | | - Server 1 Server 2 Server 3 - (Agents) (Agents) (Agents) - | | | - +-------------+-------------+ - | - Shared State Store - (Database, Redis, etc.) -``` + # Show files in working directory + echo "Files in working directory:" + ls -la -**Use case**: Production SaaS, high traffic, need redundancy + # If agent failed, show more details + if [ $AGENT_EXIT_CODE -ne 0 ]; then + echo "Agent failed with exit code $AGENT_EXIT_CODE" + echo "Last 50 lines of agent output:" + tail -50 agent_output.log + exit $AGENT_EXIT_CODE + fi -### 4. Kubernetes Deployment + # Check if any changes were made + cd "$GITHUB_WORKSPACE" + if git diff --quiet; then + echo "No changes made by agent, skipping PR creation" + exit 0 + fi -Container orchestration with Kubernetes: -```yaml -apiVersion: apps/v1 -kind: Deployment -metadata: - name: agent-server -spec: - replicas: 3 - template: - spec: - containers: - - name: agent-server - image: ghcr.io/all-hands-ai/agent-server:latest - ports: - - containerPort: 8000 -``` + # Commit changes + git add -A + git commit -m "Implement TODO: $TODO_DESCRIPTION -**Use case**: Enterprise deployments, auto-scaling, high availability + Automatically implemented by OpenHands agent. 
-## Resource Management + Co-authored-by: openhands " -### Container Limits + # Push branch + git push origin "$BRANCH_NAME" -Set per-workspace resource limits: -```python -# In server configuration -WORKSPACE_CONFIG = { - "resource_limits": { - "memory": "2g", # 2GB RAM - "cpus": "2", # 2 CPU cores - "disk": "10g" # 10GB disk - }, - "timeout": 300, # 5 min timeout -} -``` + # Create pull request + PR_TITLE="Implement TODO: $TODO_DESCRIPTION" + PR_BODY="## 🤖 Automated TODO Implementation -**Why limit resources?** -- Prevent one user from consuming all resources -- Fair usage across users -- Protect server from runaway processes -- Cost control + This PR automatically implements the following TODO: -### Cleanup & Garbage Collection + **File:** \`$TODO_FILE:$TODO_LINE\` + **Description:** $TODO_DESCRIPTION -**Container lifecycle**: -- Containers created on first use -- Kept alive between requests (warm) -- Cleaned up after inactivity timeout -- Force cleanup on server shutdown + ### Implementation + The OpenHands agent has analyzed the TODO and implemented the + requested functionality. 
-**Storage management**: -- Old workspaces deleted automatically -- Disk usage monitored -- Alerts when approaching limits + ### Review Notes + - Please review the implementation for correctness + - Test the changes in your development environment + - The original TODO comment will be updated with this PR URL + once merged -## Security Considerations + --- + *This PR was created automatically by the TODO Management workflow.*" -### Multi-Tenant Isolation + # Create PR using GitHub CLI or API + curl -X POST \ + -H "Authorization: token $GITHUB_TOKEN" \ + -H "Accept: application/vnd.github.v3+json" \ + "https://api.github.com/repos/${{ github.repository }}/pulls" \ + -d "{ + \"title\": \"$PR_TITLE\", + \"body\": \"$PR_BODY\", + \"head\": \"$BRANCH_NAME\", + \"base\": \"${{ github.ref_name }}\" + }" -**Container isolation**: -- Each user gets separate container -- Containers can't communicate -- Network isolation (optional) -- File system isolation + summary: + needs: [scan-todos, process-todos] + if: always() + runs-on: ubuntu-latest + steps: + - name: Generate Summary + run: | + echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY -**API isolation**: -- API keys mapped to users -- Users can only access their workspaces -- Server validates all permissions + TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" + echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY -### Input Validation + if [ "$TODO_COUNT" -gt 0 ]; then + echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "Check the pull requests created for each TODO" \ + "implementation." 
>> $GITHUB_STEP_SUMMARY + else + echo "**Status:** ℹ️ No TODOs found to process" \ + >> $GITHUB_STEP_SUMMARY + fi -**Server validates**: -- API request schemas -- Command injection attempts -- Path traversal attempts -- File size limits + echo "" >> $GITHUB_STEP_SUMMARY + echo "---" >> $GITHUB_STEP_SUMMARY + echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY +``` -**Defense in depth**: -- API validation -- Container validation -- Docker security features -- OS-level security +## Related Documentation -### Network Security +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) -**Best practices**: -- HTTPS only (TLS certificates) -- Firewall rules (only port 443/8000) -- Rate limiting -- DDoS protection +### Hello World +Source: https://docs.openhands.dev/sdk/guides/hello-world.md -**Container networking**: -```python -# Disable network for workspace -WORKSPACE_CONFIG = { - "network_mode": "none" # No network access -} +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -# Or allow specific hosts -WORKSPACE_CONFIG = { - "allowed_hosts": ["api.example.com"] -} -``` +> A ready-to-run example is available [here](#ready-to-run-example)! -## Monitoring & Observability +## Your First Agent -### Health Checks +This is the most basic example showing how to set up and run an OpenHands agent. 
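Before running it, the agent needs LLM credentials. The example reads `LLM_MODEL`, `LLM_API_KEY`, and optionally `LLM_BASE_URL` from the environment; a minimal shell setup might look like the following sketch (the key and proxy URL are placeholders — substitute your own values):

```shell
# Placeholder values — substitute your own credentials.
export LLM_API_KEY="your-provider-api-key"
export LLM_MODEL="anthropic/claude-sonnet-4-5-20250929"  # default used by the example
# export LLM_BASE_URL="https://your-proxy.example.com"   # only needed for proxies/gateways

# Then run the example:
# python examples/01_standalone_sdk/01_hello_world.py
echo "Using model: $LLM_MODEL"
```

`LLM_BASE_URL` can be left unset when talking to the provider directly; it is only required when routing requests through a gateway such as a LiteLLM proxy.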
-```bash -# Simple health check -curl https://agent-server.example.com/health + + + ### LLM Configuration -# Response -{ - "status": "healthy", - "docker": "connected", - "workspaces": 15, - "uptime": 86400 -} -``` + Configure the language model that will power your agent: + ```python icon="python" + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, # Optional + service_id="agent" + ) + ``` + + + ### Select an Agent + Use the preset agent with common built-in tools: + ```python icon="python" + agent = get_default_agent(llm=llm, cli_mode=True) + ``` + The default agent includes `BashTool`, `FileEditorTool`, etc. + + For the complete list of available tools see the + [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). + -### Metrics + + + ### Start a Conversation + Start a conversation to manage the agent's lifecycle: + ```python icon="python" + conversation = Conversation(agent=agent, workspace=cwd) + conversation.send_message( + "Write 3 facts about the current project into FACTS.txt." + ) + conversation.run() + ``` + + + ### Expected Behavior + When you run this example: + 1. The agent analyzes the current directory + 2. Gathers information about the project + 3. Creates `FACTS.txt` with 3 relevant facts + 4. Completes and exits -**Prometheus metrics**: -- Request count and latency -- Active workspaces -- Container resource usage -- Error rates + Example output file: -**Logging**: -- Structured JSON logs -- Per-request tracing -- Workspace events -- Error tracking + ```text icon="text" wrap + FACTS.txt + --------- + 1. This is a Python project using the OpenHands Software Agent SDK. + 2. The project includes examples demonstrating various agent capabilities. + 3. The SDK provides tools for file manipulation, bash execution, and more. 
+ ``` + + -### Alerting +## Ready-to-run Example -**Alert on**: -- Server down -- High error rate -- Resource exhaustion -- Container failures + +This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) + -## Client SDK +```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py +import os -Python SDK for interacting with Agent Server: +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -```python -from openhands.client import AgentServerClient -client = AgentServerClient( - url="https://agent-server.example.com", - api_key="your-api-key" +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), ) -# Create conversation -conversation = client.create_conversation() - -# Send message -response = client.send_message( - conversation_id=conversation.id, - message="Hello, agent!" 
+agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], ) -# Stream responses -for event in client.stream_conversation(conversation.id): - if event.type == "message": - print(event.content) +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") ``` -**Client handles**: -- Authentication -- Request/response serialization -- Error handling -- Streaming -- Retries + -## Cost Considerations +## Next Steps -### Server Costs +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs +- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage -**Compute**: CPU and memory for containers -- Each active workspace = 1 container -- Typically 1-2 GB RAM per workspace -- 0.5-1 CPU core per workspace +### Hooks +Source: https://docs.openhands.dev/sdk/guides/hooks.md -**Storage**: Workspace files and conversation state -- ~1-10 GB per workspace (depends on usage) -- Conversation history in database +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -**Network**: API requests and responses -- Minimal (mostly text) -- Streaming adds bandwidth +> A ready-to-run example is available [here](#ready-to-run-example)! -### Cost Optimization +## Overview -**1. Idle timeout**: Shutdown containers after inactivity -```python -WORKSPACE_CONFIG = { - "idle_timeout": 3600 # 1 hour -} -``` +Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. Typical uses include: +- Logging and analytics +- Emitting custom metrics +- Auditing or compliance +- Tracing and debugging -**2. 
Resource limits**: Don't over-provision -```python -WORKSPACE_CONFIG = { - "resource_limits": { - "memory": "1g", # Smaller limit - "cpus": "0.5" # Fractional CPU - } -} -``` +## Hook Types -**3. Shared resources**: Use single server for multiple low-traffic apps +| Hook | When it runs | Can block? | +|------|--------------|------------| +| PreToolUse | Before tool execution | Yes (exit 2) | +| PostToolUse | After tool execution | No | +| UserPromptSubmit | Before processing user message | Yes (exit 2) | +| Stop | When agent tries to finish | Yes (exit 2) | +| SessionStart | When conversation starts | No | +| SessionEnd | When conversation ends | No | -**4. Auto-scaling**: Scale servers based on demand +## Key Concepts -## When to Use Agent Server +- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution +- Isolation: hooks run outside the agent loop logic, avoiding core modifications +- Composition: enable or disable hooks per environment (local vs. prod) -### Use Agent Server When: +## Ready-to-run Example -✅ **Multi-user system**: Web app with many users -✅ **Remote clients**: Mobile app, web frontend -✅ **Centralized management**: Need to monitor all agents -✅ **Workspace isolation**: Users shouldn't interfere -✅ **SaaS product**: Building agent-as-a-service -✅ **Scaling**: Need to handle concurrent users + +This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) + -**Examples**: -- Chatbot platforms -- Code assistant web apps -- Agent marketplaces -- Enterprise agent deployments +```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py +"""OpenHands Agent SDK — Hooks Example -### Use Standalone SDK When: +Demonstrates the OpenHands hooks system. 
+Hooks are shell scripts that run at key lifecycle events: -✅ **Single-user**: Personal tool or script -✅ **Local execution**: Running on your machine -✅ **Full control**: Need programmatic access -✅ **Simpler deployment**: No server management -✅ **Lower latency**: No network overhead +- PreToolUse: Block dangerous commands before execution +- PostToolUse: Log tool usage after execution +- UserPromptSubmit: Inject context into user messages +- Stop: Enforce task completion criteria -**Examples**: -- CLI tools -- Automation scripts -- Local development -- Desktop applications +The hook scripts are in the scripts/ directory alongside this file. +""" -### Hybrid Approach +import os +import signal +import tempfile +from pathlib import Path -Use SDK locally but RemoteAPIWorkspace for execution: -- Agent logic in your Python code -- Execution happens on remote server -- Best of both worlds +from pydantic import SecretStr -## Building Custom Agent Server +from openhands.sdk import LLM, Conversation +from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher +from openhands.tools.preset.default import get_default_agent -The server is extensible for custom needs: -**Custom authentication**: -```python -from openhands.agent_server import AgentServer +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) -class CustomAgentServer(AgentServer): - async def authenticate(self, request): - # Custom auth logic - return await oauth_verify(request) -``` +SCRIPT_DIR = Path(__file__).parent / "hook_scripts" -**Custom workspace configuration**: -```python -server = AgentServer( - workspace_factory=lambda user: DockerWorkspace( - image=f"custom-image-{user.tier}", - resource_limits=user.resource_limits - ) +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), ) -``` -**Custom middleware**: -```python -@server.middleware -async def logging_middleware(request, call_next): - # Custom logging - response = await call_next(request) - return response -``` +# Create temporary workspace with git repo +with tempfile.TemporaryDirectory() as tmpdir: + workspace = Path(tmpdir) + os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") -## Next Steps + log_file = workspace / "tool_usage.log" + summary_file = workspace / "summary.txt" -### For Usage Examples + # Configure hooks using the typed approach (recommended) + # This provides better type safety and IDE support + hook_config = HookConfig( + pre_tool_use=[ + HookMatcher( + matcher="terminal", + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "block_dangerous.sh"), + timeout=10, + ) + ], + ) + ], + post_tool_use=[ + HookMatcher( + matcher="*", + hooks=[ + HookDefinition( + command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), + timeout=5, + ) + ], + ) + ], + user_prompt_submit=[ + HookMatcher( + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "inject_git_context.sh"), + ) + ], + ) + ], + stop=[ + HookMatcher( + hooks=[ + HookDefinition( + command=( + f"SUMMARY_FILE={summary_file} " + f"{SCRIPT_DIR / 'require_summary.sh'}" + ), + ) + ], + ) + ], + ) -- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally -- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup -- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API -- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options + # Alternative: You can also use .from_dict() for loading from JSON config files + # Example with a single hook matcher: + # hook_config = HookConfig.from_dict({ + # "hooks": { + 
# "PreToolUse": [{ + # "matcher": "terminal", + # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] + # }] + # } + # }) -### For Related Architecture + agent = get_default_agent(llm=llm) + conversation = Conversation( + agent=agent, + workspace=str(workspace), + hook_config=hook_config, + ) -- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details -- [SDK Architecture](/sdk/arch/sdk) - Core framework -- [Architecture Overview](/sdk/arch/overview) - System design + # Demo 1: Safe command (PostToolUse logs it) + print("=" * 60) + print("Demo 1: Safe command - logged by PostToolUse") + print("=" * 60) + conversation.send_message("Run: echo 'Hello from hooks!'") + conversation.run() -### For Implementation Details + if log_file.exists(): + print(f"\n[Log: {log_file.read_text().strip()}]") -- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source -- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples + # Demo 2: Dangerous command (PreToolUse blocks it) + print("\n" + "=" * 60) + print("Demo 2: Dangerous command - blocked by PreToolUse") + print("=" * 60) + conversation.send_message("Run: rm -rf /tmp/test") + conversation.run() -### Condenser -Source: https://docs.openhands.dev/sdk/arch/condenser.md + # Demo 3: Context injection + Stop hook enforcement + print("\n" + "=" * 60) + print("Demo 3: Context injection + Stop hook") + print("=" * 60) + print("UserPromptSubmit injects git status; Stop requires summary.txt\n") + conversation.send_message( + "Check what files have changes, then create summary.txt describing the repo." + ) + conversation.run() -The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. 
For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). + if summary_file.exists(): + print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") -**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + print("\n" + "=" * 60) + print("Example Complete!") + print("=" * 60) -## Core Responsibilities + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") +``` + -The Condenser system has four primary responsibilities: -1. **History Compression** - Reduce event lists to fit within context windows -2. **Threshold Detection** - Determine when condensation should trigger -3. **Summary Generation** - Create meaningful summaries via LLM or heuristics -4. **View Management** - Transform event history into LLM-ready views +### Hook Scripts -## Architecture +The example uses external hook scripts in the `hook_scripts/` directory: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% -flowchart TB - subgraph Interface["Abstract Interface"] - Base["CondenserBase
Abstract base"] - end - - subgraph Implementations["Concrete Implementations"] - NoOp["NoOpCondenser
No compression"] - LLM["LLMSummarizingCondenser
LLM-based"] - Pipeline["PipelineCondenser
Multi-stage"] - end - - subgraph Process["Condensation Process"] - View["View
Event history"] - Check["should_condense()?"] - Condense["get_condensation()"] - Result["View | Condensation"] - end - - subgraph Output["Condensation Output"] - CondEvent["Condensation Event
Summary metadata"] - NewView["Condensed View
Reduced tokens"] - end - - Base --> NoOp - Base --> LLM - Base --> Pipeline - - View --> Check - Check -->|Yes| Condense - Check -->|No| Result - Condense --> CondEvent - CondEvent --> NewView - NewView --> Result - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Base primary - class LLM,Pipeline secondary - class Check,Condense tertiary -``` - -### Key Components - -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | -| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | -| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | -| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | -| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | -| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | Represents history for LLM | -| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | - -## Condenser Types - -### 
NoOpCondenser - -Pass-through condenser that performs no compression: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - View["View"] - NoOp["NoOpCondenser"] - Same["Same View"] - - View --> NoOp --> Same - - style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` - -### LLMSummarizingCondenser - -Uses an LLM to generate summaries of conversation history: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - View["Long View
120+ events"] - Check["Threshold
exceeded?"] - Summarize["LLM Summarization"] - Summary["Summary Text"] - Metadata["Condensation Event"] - AddToHistory["Add to History"] - NextStep["Next Step: View.from_events()"] - NewView["Condensed View"] - - View --> Check - Check -->|Yes| Summarize - Summarize --> Summary - Summary --> Metadata - Metadata --> AddToHistory - AddToHistory --> NextStep - NextStep --> NewView - - style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Process:** -1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) -2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) -3. **LLM Call:** Generate summary of middle events using dedicated LLM -4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` -5. **Add to History:** Agent adds `Condensation` to event log and returns early -6. 
**Next Step:** `View.from_events()` filters forgotten events and inserts summary - -**Configuration:** -- **`max_size`:** Event count threshold before condensation triggers (default: 120) -- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) -- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) - -### PipelineCondenser - -Chains multiple condensers in sequence: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - View["Original View"] - C1["Condenser 1"] - C2["Condenser 2"] - C3["Condenser 3"] - Final["Final View"] - - View --> C1 --> C2 --> C3 --> Final - - style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) - -## Condensation Flow - -### Trigger Mechanisms - -Condensers can be triggered in two ways: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Automatic["Automatic Trigger"] - Agent1["Agent Step"] - Build1["View.from_events()"] - Check1["condenser.condense(view)"] - Trigger1["should_condense()?"] - end - - Agent1 --> Build1 --> Check1 --> Trigger1 - - style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` - -**Automatic Trigger:** -- **When:** Threshold exceeded (e.g., event count > `max_size`) -- **Who:** Agent calls `condenser.condense()` each step -- **Purpose:** Proactively keep context within limits - - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Manual["Manual Trigger"] - Error["LLM Context Error"] - Request["CondensationRequest Event"] - NextStep["Next Agent Step"] - Trigger2["condense() detects request"] - end - - Error --> Request --> NextStep --> Trigger2 - - style 
Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` -**Manual Trigger:** -- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) -- **Who:** Agent (on LLM context window error) or application code -- **Purpose:** Force compression when context limit exceeded - -### Condensation Workflow - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Start["Agent calls condense(view)"] - - Decision{"should_condense?"} - - ReturnView["Return View
Agent proceeds"] - - Extract["Select Events to Keep/Forget"] - Generate["LLM Generates Summary"] - Create["Create Condensation Event"] - ReturnCond["Return Condensation"] - AddHistory["Agent adds to history"] - NextStep["Next Step: View.from_events()"] - FilterEvents["Filter forgotten events"] - InsertSummary["Insert summary at offset"] - NewView["New condensed view"] - - Start --> Decision - Decision -->|No| ReturnView - Decision -->|Yes| Extract - Extract --> Generate - Generate --> Create - Create --> ReturnCond - ReturnCond --> AddHistory - AddHistory --> NextStep - NextStep --> FilterEvents - FilterEvents --> InsertSummary - InsertSummary --> NewView - - style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` - -**Key Steps:** - -1. **Threshold Check:** `should_condense()` determines if condensation needed -2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) -3. **Summary Generation:** LLM creates compressed representation of forgotten events -4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary -5. **Return to Agent:** Condenser returns `Condensation` (not `View`) -6. **History Update:** Agent adds `Condensation` to event log and exits step -7. **Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary - -## View and Condensation - -### View Structure - -A `View` represents the conversation history as it will be sent to the LLM: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Events["Full Event List
+ Condensation events"] - FromEvents["View.from_events()"] - Filter["Filter forgotten events"] - Insert["Insert summary"] - View["View
LLMConvertibleEvents"] - Convert["events_to_messages()"] - LLM["LLM Input"] - - Events --> FromEvents - FromEvents --> Filter - Filter --> Insert - Insert --> View - View --> Convert - Convert --> LLM - - style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**View Components:** -- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) -- **`unhandled_condensation_request`:** Flag for pending manual condensation -- **`condensations`:** List of all Condensation events processed -- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics - -### Condensation Event - -When condensation occurs, a `Condensation` event is created: - -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Old["Middle Events
~60 events"] - Summary["Summary Text
LLM-generated"] - Event["Condensation Event
forgotten_event_ids"] - Applied["View.from_events()"] - New["New View
~60 events + summary"] - - Old -.->|Summarized| Summary - Summary --> Event - Event --> Applied - Applied --> New - - style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` - -**Condensation Fields:** -- **`forgotten_event_ids`:** List of event IDs to filter out -- **`summary`:** Compressed text representation of forgotten events -- **`summary_offset`:** Index where summary event should be inserted -- Inherits from `Event`: `id`, `timestamp`, `source` - -## Rolling Window Pattern + +```bash +#!/bin/bash +# PreToolUse hook: Block dangerous rm -rf commands +# Uses jq for JSON parsing (needed for nested fields like tool_input.command) -`RollingCondenser` implements a common pattern for threshold-based condensation: +input=$(cat) +command=$(echo "$input" | jq -r '.tool_input.command // ""') -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - View["Current View
120+ events"] - Check["Count Events"] - - Compare{"Count >
max_size?"} - - Keep["Keep All Events"] - - Split["Split Events"] - Head["Head
First 4 events"] - Middle["Middle
~56 events"] - Tail["Tail
~56 events"] - Summarize["LLM Summarizes Middle"] - Result["Head + Summary + Tail
~60 events total"] - - View --> Check - Check --> Compare - - Compare -->|Under| Keep - Compare -->|Over| Split - - Split --> Head - Split --> Middle - Split --> Tail - - Middle --> Summarize - Head --> Result - Summarize --> Result - Tail --> Result - - style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +# Block rm -rf commands +if [[ "$command" =~ "rm -rf" ]]; then + echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' + exit 2 # Exit code 2 = block the operation +fi -**Rolling Window Strategy:** -1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts -2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context -3. **Summarize Middle:** Compress events between head and tail into summary -4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) +exit 0 # Exit code 0 = allow the operation +``` +
-## Component Relationships + +```bash +#!/bin/bash +# PostToolUse hook: Log all tool usage +# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) -### How Condenser Integrates +# LOG_FILE should be set by the calling script +LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Agent["Agent"] - Condenser["Condenser"] - State["Conversation State"] - Events["Event Log"] - - Agent -->|"View.from_events()"| State - State -->|View| Agent - Agent -->|"condense(view)"| Condenser - Condenser -->|"View | Condensation"| Agent - Agent -->|Adds Condensation| Events - - style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" +exit 0 ``` + -**Relationship Characteristics:** -- **Agent → State**: Calls `View.from_events()` to get current view -- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered -- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) -- **Agent → Events**: Adds `Condensation` event to log when returned + +```bash +#!/bin/bash +# UserPromptSubmit hook: Inject git status when user asks about code changes -## See Also +input=$(cat) -- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning -- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management -- **[Events](/sdk/arch/events)** - Condensation event type and append-only log -- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers +# Check if user is asking about changes, diff, or git +if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then + # Get git status if in a git repo + if git rev-parse --git-dir > /dev/null 2>&1; then + status=$(git status --short 
2>/dev/null | head -10) + if [ -n "$status" ]; then + # Escape for JSON + escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') + echo "{\"additionalContext\": \"Current git status: $escaped\"}" + fi + fi +fi +exit 0 +``` + -### Conversation -Source: https://docs.openhands.dev/sdk/arch/conversation.md + +```bash +#!/bin/bash +# Stop hook: Require a summary.txt file before allowing agent to finish +# SUMMARY_FILE should be set by the calling script -The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. +SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" -**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) +if [ ! -f "$SUMMARY_FILE" ]; then + echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' + exit 2 +fi +exit 0 +``` + -## Core Responsibilities -The Conversation system has four primary responsibilities: +## Next Steps -1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents -2. **State Orchestration** - Maintain conversation history, events, and execution status -3. **Workspace Coordination** - Bridge agent operations with execution environments -4. 
**Runtime Services** - Provide persistence, monitoring, security, and visualization +- See also: [Metrics and Observability](/sdk/guides/metrics) +- Architecture: [Events](/sdk/arch/events) -## Architecture +### Iterative Refinement +Source: https://docs.openhands.dev/sdk/guides/iterative-refinement.md -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% -flowchart LR - User["User Code"] - - subgraph Factory[" "] - Entry["Conversation()"] - end +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; - subgraph Implementations[" "] - Local["LocalConversation
Direct execution"] - Remote["RemoteConversation
Via agent-server API"] - end - - subgraph Core[" "] - State["ConversationState
• agent
workspace • stats • ..."] - EventLog["ConversationState.events
Event storage"] - end - - User --> Entry - Entry -.->|LocalWorkspace| Local - Entry -.->|RemoteWorkspace| Remote - - Local --> State - Remote --> State - - State --> EventLog - - classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px - classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px - - class Entry factory - class Local,Remote impl - class State,EventLog core - class Persist,Stuck,Viz,Secrets service -``` +> The ready-to-run example is available [here](#ready-to-run-example)! -### Key Components +## Overview -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | -| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | -| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | -| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | -| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries | +Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: +1. A **refactoring agent** performs the main task (e.g., code conversion) +2. 
A **critique agent** evaluates the quality and provides detailed feedback +3. If quality is below threshold, the refactoring agent tries again with the feedback -## Factory Pattern +This pattern is useful for: +- Code refactoring and modernization (e.g., COBOL to Java) +- Document translation and localization +- Content generation with quality requirements +- Any task requiring iterative improvement -The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: +## How It Works -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Input["Conversation(agent, workspace)"] - Check{Workspace Type?} - Local["LocalConversation
Agent runs in-process"] - Remote["RemoteConversation
Agent runs via API"] - - Input --> Check - Check -->|str or LocalWorkspace| Local - Check -->|RemoteWorkspace| Remote - - style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### The Iteration Loop -**Dispatch Logic:** -- **Local:** String paths or `LocalWorkspace` → in-process execution -- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket +The core workflow runs in a loop until quality threshold is met: -This abstraction enables switching deployment modes without code changes—just swap the workspace type. +```python icon="python" wrap +QUALITY_THRESHOLD = 90.0 +MAX_ITERATIONS = 5 -## State Management +while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + # Phase 1: Refactoring agent converts COBOL to Java + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir) + ) + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() -State updates follow a **two-path pattern** depending on the type of change: + # Phase 2: Critique agent evaluates the conversion + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir) + ) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Start["State Update Request"] - Lock["Acquire FIFO Lock"] - Decision{New Event?} - - StateOnly["Update State Fields
stats, status, metadata"] - EventPath["Append to Event Log
messages, actions, observations"] - - Callback["Trigger Callbacks"] - Release["Release Lock"] - - Start --> Lock - Lock --> Decision - Decision -->|No| StateOnly - Decision -->|Yes| EventPath - StateOnly --> Callback - EventPath --> Callback - Callback --> Release - - style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px - style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + # Parse score and decide whether to continue + current_score = parse_critique_score(critique_file) + + iteration += 1 ``` -**Two Update Patterns:** +### Critique Scoring -1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) -2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur +The critique agent evaluates each file on four dimensions (0-25 pts each): +- **Correctness**: Does the Java code preserve the original business logic? +- **Code Quality**: Is the code clean and following Java conventions? +- **Completeness**: Are all COBOL features properly converted? +- **Best Practices**: Does it use proper OOP, error handling, and documentation? -**Thread Safety:** -- FIFO Lock ensures ordered, atomic updates -- Callbacks fire after successful commit -- Read operations never block writes +### Feedback Loop -## Execution Models +When the score is below threshold, the refactoring agent receives the critique file location: -The conversation system supports two execution models with identical APIs: +```python icon="python" wrap +if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. 
+""" +``` -### Local vs Remote Execution +## Customization -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Local["LocalConversation"] - L1["User sends message"] - L2["Agent executes in-process"] - L3["Direct tool calls"] - L4["Events via callbacks"] - L1 --> L2 --> L3 --> L4 - end - style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### Adjusting Thresholds -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Remote["RemoteConversation"] - R1["User sends message"] - R2["HTTP → Agent Server"] - R3["Isolated container execution"] - R4["WebSocket event stream"] - R1 --> R2 --> R3 --> R4 - end - style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +```python icon="python" wrap +QUALITY_THRESHOLD = 95.0 # Require higher quality +MAX_ITERATIONS = 10 # Allow more iterations ``` -| Aspect | LocalConversation | RemoteConversation | -|--------|-------------------|-------------------| -| **Execution** | In-process | Remote container/server | -| **Communication** | Direct function calls | HTTP + WebSocket | -| **State Sync** | Immediate | Network serialized | -| **Use Case** | Development, CLI tools | Production, web apps | -| **Isolation** | Process-level | Container-level | - -**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. +### Using Real COBOL Files -## Auxiliary Services +The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). 
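
### Parsing the Critique Score

The iteration loop shown earlier calls a `parse_critique_score` helper to pull the average score out of the critique report. A minimal sketch of such a parser — its implementation here is an assumption, not the example's exact code — that looks for a line matching the `- **Average Score**: ...` format from the critique template, and treats a missing score as 0.0 so the loop keeps iterating:

```python icon="python" wrap
import re
from pathlib import Path


def parse_critique_score(critique_file: Path) -> float:
    """Extract the average score from a critique report.

    Assumes the report contains a line like `- **Average Score**: 87.5`
    (a trailing `/100` is tolerated). Returns 0.0 when no score is found,
    which forces another refinement iteration rather than passing silently.
    """
    text = critique_file.read_text()
    match = re.search(r"\*\*Average Score\*\*:\s*([0-9]+(?:\.[0-9]+)?)", text)
    return float(match.group(1)) if match else 0.0
```

If you customize the critique prompt's report format, keep the score parser in sync with it — otherwise every run will parse as a failing score and the loop will always run until `MAX_ITERATIONS`.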
-The conversation system provides pluggable services that operate independently on the event stream: +## Ready-to-run Example -| Service | Purpose | Architecture Pattern | -|---------|---------|---------------------| -| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | -| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | -| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | -| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | -| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | + +This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) + -**Design Principle:** Services read from the event log but never mutate state directly. This enables: -- Services can be enabled/disabled independently -- Easy to add new services without changing core orchestration -- Event stream acts as the integration point +```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py +#!/usr/bin/env python3 +""" +Iterative Refinement Example: COBOL to Java Refactoring -## Component Relationships +This example demonstrates an iterative refinement workflow where: +1. 
A refactoring agent converts COBOL files to Java files +2. A critique agent evaluates the quality of each conversion and provides scores +3. If the average score is below 90%, the process repeats with feedback -### How Conversation Interacts +The workflow continues until the refactoring meets the quality threshold. -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Conv["Conversation"] - Agent["Agent"] - WS["Workspace"] - Tools["Tools"] - LLM["LLM"] - - Conv -->|Delegates to| Agent - Conv -->|Configures| WS - Agent -.->|Updates| Conv - Agent -->|Uses| Tools - Agent -->|Queries| LLM - - style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +Source COBOL files can be obtained from: +https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl +""" -**Relationship Characteristics:** -- **Conversation → Agent**: One-way orchestration, agent reports back via state updates -- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation -- **Agent → Conversation**: Indirect via state events +import os +import re +import tempfile +from pathlib import Path -## See Also +from pydantic import SecretStr -- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design -- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design -- **[Event System](/sdk/arch/events)** - Event types and flow -- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples +from openhands.sdk import LLM, Conversation +from openhands.tools.preset.default import get_default_agent -### Design Principles -Source: https://docs.openhands.dev/sdk/arch/design.md -The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on 
lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. +QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) -[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1. -## Optional Isolation over Mandatory Sandboxing +def setup_workspace() -> tuple[Path, Path, Path]: + """Create workspace directories for the refactoring workflow.""" + workspace_dir = Path(tempfile.mkdtemp()) + cobol_dir = workspace_dir / "cobol" + java_dir = workspace_dir / "java" + critique_dir = workspace_dir / "critiques" - -**V0 Challenge:** -Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other. -Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible. - + cobol_dir.mkdir(parents=True, exist_ok=True) + java_dir.mkdir(parents=True, exist_ok=True) + critique_dir.mkdir(parents=True, exist_ok=True) -**V1 Principle:** -**Sandboxing should be opt-in, not universal.** -V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model. 
-When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity. + return workspace_dir, cobol_dir, java_dir -## Stateless by Default, One Source of Truth for State - -**V0 Challenge:** -V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful. - +def create_sample_cobol_files(cobol_dir: Path) -> list[str]: + """Create sample COBOL files for demonstration. -**V1 Principle:** -**Keep everything stateless, with exactly one mutable state.** -All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction. -The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems. + In a real scenario, you would clone files from: + https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl + """ + sample_files = { + "CBACT01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBACT01C. + ***************************************************************** + * Program: CBACT01C - Account Display Program + * Purpose: Display account information for a given account number + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-ACCOUNT-ID PIC 9(11). + 01 WS-ACCOUNT-STATUS PIC X(1). + 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. + 01 WS-CUSTOMER-NAME PIC X(50). + 01 WS-ERROR-MSG PIC X(80). -## Clear Boundaries between Agent and Applications + PROCEDURE DIVISION. + PERFORM 1000-INIT. + PERFORM 2000-PROCESS. + PERFORM 3000-TERMINATE. + STOP RUN. 
- -**V0 Challenge:** -The same codebase powered the CLI, web interface, and integrations (e.g., Github, Gitlab, etc). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle. -Heavy research dependencies and benchmark integrations further bloated production builds. - + 1000-INIT. + INITIALIZE WS-ACCOUNT-ID + INITIALIZE WS-ACCOUNT-STATUS + INITIALIZE WS-ACCOUNT-BALANCE + INITIALIZE WS-CUSTOMER-NAME. -**V1 Principle:** -**Maintain strict separation of concerns.** -V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server). -Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently. + 2000-PROCESS. + DISPLAY "ENTER ACCOUNT NUMBER: " + ACCEPT WS-ACCOUNT-ID + IF WS-ACCOUNT-ID = ZEROS + MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG + DISPLAY WS-ERROR-MSG + ELSE + DISPLAY "ACCOUNT: " WS-ACCOUNT-ID + DISPLAY "STATUS: " WS-ACCOUNT-STATUS + DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE + END-IF. + 3000-TERMINATE. + DISPLAY "PROGRAM COMPLETE". +""", + "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBCUS01C. + ***************************************************************** + * Program: CBCUS01C - Customer Information Program + * Purpose: Manage customer data operations + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-CUSTOMER-ID PIC 9(9). + 01 WS-FIRST-NAME PIC X(25). + 01 WS-LAST-NAME PIC X(25). + 01 WS-ADDRESS PIC X(100). + 01 WS-PHONE PIC X(15). + 01 WS-EMAIL PIC X(50). + 01 WS-OPERATION PIC X(1). 
+ 88 OP-ADD VALUE 'A'. + 88 OP-UPDATE VALUE 'U'. + 88 OP-DELETE VALUE 'D'. + 88 OP-DISPLAY VALUE 'V'. -## Composable Components for Extensibility + PROCEDURE DIVISION. + PERFORM 1000-MAIN-PROCESS. + STOP RUN. - -**V0 Challenge:** -Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for different entrypoints. This rigidity limited experimentation and discouraged contributions. - + 1000-MAIN-PROCESS. + DISPLAY "CUSTOMER MANAGEMENT SYSTEM" + DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" + ACCEPT WS-OPERATION + EVALUATE TRUE + WHEN OP-ADD + PERFORM 2000-ADD-CUSTOMER + WHEN OP-UPDATE + PERFORM 3000-UPDATE-CUSTOMER + WHEN OP-DELETE + PERFORM 4000-DELETE-CUSTOMER + WHEN OP-DISPLAY + PERFORM 5000-DISPLAY-CUSTOMER + WHEN OTHER + DISPLAY "INVALID OPERATION" + END-EVALUATE. -**V1 Principle:** -**Everything should be composable and safe to extend.** -Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing. -Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. + 2000-ADD-CUSTOMER. + DISPLAY "ADDING NEW CUSTOMER" + ACCEPT WS-CUSTOMER-ID + ACCEPT WS-FIRST-NAME + ACCEPT WS-LAST-NAME + DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. -### Events -Source: https://docs.openhands.dev/sdk/arch/events.md + 3000-UPDATE-CUSTOMER. + DISPLAY "UPDATING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. -The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. + 4000-DELETE-CUSTOMER. + DISPLAY "DELETING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. 
-**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + 5000-DISPLAY-CUSTOMER. + DISPLAY "DISPLAYING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "ID: " WS-CUSTOMER-ID + DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. +""", + "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBTRN01C. + ***************************************************************** + * Program: CBTRN01C - Transaction Processing Program + * Purpose: Process financial transactions + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-TRANS-ID PIC 9(16). + 01 WS-TRANS-TYPE PIC X(2). + 88 TRANS-CREDIT VALUE 'CR'. + 88 TRANS-DEBIT VALUE 'DB'. + 88 TRANS-TRANSFER VALUE 'TR'. + 01 WS-TRANS-AMOUNT PIC S9(13)V99. + 01 WS-FROM-ACCOUNT PIC 9(11). + 01 WS-TO-ACCOUNT PIC 9(11). + 01 WS-TRANS-DATE PIC 9(8). + 01 WS-TRANS-STATUS PIC X(10). -## Core Responsibilities + PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE. + PERFORM 2000-PROCESS-TRANSACTION. + PERFORM 3000-FINALIZE. + STOP RUN. -The Event System has four primary responsibilities: + 1000-INITIALIZE. + MOVE ZEROS TO WS-TRANS-ID + MOVE SPACES TO WS-TRANS-TYPE + MOVE ZEROS TO WS-TRANS-AMOUNT + MOVE "PENDING" TO WS-TRANS-STATUS. -1. **Type Safety** - Enforce event schemas through Pydantic models -2. **LLM Integration** - Convert events to/from LLM message formats -3. **Append-Only Log** - Maintain immutable event history -4. **Service Integration** - Enable observers to react to event streams + 2000-PROCESS-TRANSACTION. + DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " + ACCEPT WS-TRANS-TYPE + DISPLAY "ENTER AMOUNT: " + ACCEPT WS-TRANS-AMOUNT + EVALUATE TRUE + WHEN TRANS-CREDIT + PERFORM 2100-PROCESS-CREDIT + WHEN TRANS-DEBIT + PERFORM 2200-PROCESS-DEBIT + WHEN TRANS-TRANSFER + PERFORM 2300-PROCESS-TRANSFER + WHEN OTHER + MOVE "INVALID" TO WS-TRANS-STATUS + END-EVALUATE. 
-## Architecture + 2100-PROCESS-CREDIT. + DISPLAY "PROCESSING CREDIT" + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% -flowchart TB - Base["Event
Base class"] - LLMBase["LLMConvertibleEvent
Abstract base"] - - subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] - Message["MessageEvent
User/assistant text"] - Action["ActionEvent
Tool calls"] - System["SystemPromptEvent
Initial system prompt"] - CondSummary["CondensationSummaryEvent
Condenser summary"] - - ObsBase["ObservationBaseEvent
Base for tool responses"] - Observation["ObservationEvent
Tool results"] - UserReject["UserRejectObservation
User rejected action"] - AgentError["AgentErrorEvent
Agent error"] - end - - subgraph Internals["Internal Events
NOT visible to the LLM"] - ConvState["ConversationStateUpdateEvent
State updates"] - CondReq["CondensationRequest
Request compression"] - Cond["Condensation
Compression result"] - Pause["PauseEvent
User pause"] - end - - Base --> LLMBase - Base --> Internals - LLMBase --> LLMTypes - ObsBase --> Observation - ObsBase --> UserReject - ObsBase --> AgentError - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Base,LLMBase,Message,Action,SystemPromptEvent primary - class ObsBase,Observation,UserReject,AgentError secondary - class ConvState,CondReq,Cond,Pause tertiary -``` + 2200-PROCESS-DEBIT. + DISPLAY "PROCESSING DEBIT" + ACCEPT WS-FROM-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. -### Key Components + 2300-PROCESS-TRANSFER. + DISPLAY "PROCESSING TRANSFER" + ACCEPT WS-FROM-ACCOUNT + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source | -| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method | -| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills | -| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk | -| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base 
| Base for all tool call responses | -| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes | -| **[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | -| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | -| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | -| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | -| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | -| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | -| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | -| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | + 3000-FINALIZE. + DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. 
+""", + } -## Event Types + created_files = [] + for filename, content in sample_files.items(): + file_path = cobol_dir / filename + file_path.write_text(content) + created_files.append(filename) -### LLM-Convertible Events + return created_files -Events that participate in agent reasoning and can be converted to LLM messages: +def get_refactoring_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], + critique_file: Path | None = None, +) -> str: + """Generate the prompt for the refactoring agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) -| Event Type | Source | Content | LLM Role | -|------------|--------|---------|----------| -| **MessageEvent (user)** | user | Text, images | `user` | -| **MessageEvent (agent)** | agent | Text reasoning, skills | `assistant` | -| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | -| **ObservationEvent** | environment | Tool execution result | `tool` | -| **UserRejectObservation** | environment | Rejection reason | `tool` | -| **AgentErrorEvent** | agent | Error details | `tool` | -| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | -| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | + base_prompt = f"""Convert the following COBOL files to Java: -The event system bridges agent events to LLM messages: +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Events["Event List"] - Filter["Filter LLMConvertibleEvent"] - Group["Group ActionEvents
by llm_response_id"] - Convert["Convert to Messages"] - LLM["LLM Input"] - - Events --> Filter - Filter --> Group - Group --> Convert - Convert --> LLM - - style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px - style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +Files to convert: +{files_list} -**Special Handling - Parallel Function Calling:** +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices -When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling): -1. Group all ActionEvents by `llm_response_id` -2. Combine into single Message with multiple `tool_calls` -3. Only first event's `thought`, `reasoning_content`, and `thinking_blocks` are included -4. All subsequent events in the batch have empty thought fields +Read each COBOL file and create the corresponding Java file in the target directory. +""" -**Example:** -``` -ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1) -ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2) -→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2]) -``` + if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. 
+Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" -### Internal Events + return base_prompt -Events for metadata, control flow, and user actions (not sent to LLM): -| Event Type | Source | Purpose | Key Fields | -|------------|--------|---------|------------| -| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) | -| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded | -| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` | -| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user | +def get_critique_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], +) -> str: + """Generate the prompt for the critique agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) -**Source Types:** -- **user**: Event originated from user input -- **agent**: Event generated by agent logic -- **environment**: Event from system/framework/tools + return f"""Evaluate the quality of COBOL to Java refactoring. -## Component Relationships +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} -### How Events Integrate +Original COBOL files: +{files_list} -## `source` vs LLM `role` +Please evaluate each converted Java file against its original COBOL source. -Events often carry **two different concepts** that are easy to confuse: +For each file, assess: +1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) +2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) +3. Completeness: Are all COBOL features properly converted? (0-25 pts) +4. Best Practices: Does it use proper OOP, error handling, documentation? 
(0-25 pts) -- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution. -- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting. +Create a critique report in the following EXACT format: -These fields are **intentionally independent**. +# COBOL to Java Refactoring Critique Report -Common examples include: +## Summary +[Brief overall assessment] -- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`. -- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction. +## File Evaluations -**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role. +### [Original COBOL filename] +- **Java File**: [corresponding Java filename or "NOT FOUND"] +- **Correctness**: [score]/25 - [brief explanation] +- **Code Quality**: [score]/25 - [brief explanation] +- **Completeness**: [score]/25 - [brief explanation] +- **Best Practices**: [score]/25 - [brief explanation] +- **File Score**: [total]/100 +- **Issues to Address**: + - [specific issue 1] + - [specific issue 2] + ... 
+[Repeat for each file] -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Events["Event System"] - Agent["Agent"] - Conversation["Conversation"] - Tools["Tools"] - Services["Auxiliary Services"] - - Agent -->|Reads| Events - Agent -->|Writes| Events - Conversation -->|Manages| Events - Tools -->|Creates| Events - Events -.->|Stream| Services - - style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +## Overall Score +- **Average Score**: [calculated average of all file scores] +- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] -**Relationship Characteristics:** -- **Agent → Events**: Reads history for context, writes actions/messages -- **Conversation → Events**: Owns and persists event log -- **Tools → Events**: Create ObservationEvents after execution -- **Services → Events**: Read-only observers for monitoring, visualization +## Priority Improvements +1. [Most critical improvement needed] +2. [Second priority] +3. 
[Third priority] -## Error Events: Agent vs Conversation +Save this report to: {java_dir.parent}/critiques/critique_report.md +""" -Two distinct error events exist in the SDK, with different purpose and visibility: -- AgentErrorEvent - - Type: ObservationBaseEvent (LLM-convertible) - - Scope: Error for a specific tool call (has tool_name and tool_call_id) - - Source: "agent" - - LLM visibility: Sent as a tool message so the model can react/recover - - Effect: Conversation continues; not a terminal state - - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py +def parse_critique_score(critique_file: Path) -> float: + """Parse the average score from the critique report.""" + if not critique_file.exists(): + return 0.0 -- ConversationErrorEvent - - Type: Event (not LLM-convertible) - - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) - - Source: typically "environment" - - LLM visibility: Not sent to the model - - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications - - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py + content = critique_file.read_text() -## See Also + # Look for "Average Score: X" pattern + patterns = [ + r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)", + r"Average Score:\s*(\d+(?:\.\d+)?)", + r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)", + ] -- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events -- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management -- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation -- **[Condenser](/sdk/arch/condenser)** - Event history compression + for pattern in patterns: + match = re.search(pattern, content, re.IGNORECASE) + if match: + return float(match.group(1)) -### LLM -Source: 
https://docs.openhands.dev/sdk/arch/llm.md + return 0.0 -The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. -**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) +def run_iterative_refinement() -> None: + """Run the iterative refinement workflow.""" + # Setup + api_key = os.getenv("LLM_API_KEY") + assert api_key is not None, "LLM_API_KEY environment variable is not set." + model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") -## Core Responsibilities + llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="iterative_refinement", + ) -The LLM system has five primary responsibilities: + workspace_dir, cobol_dir, java_dir = setup_workspace() + critique_dir = workspace_dir / "critiques" -1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers -2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) -3. **Configuration Management** - Load from environment, JSON, or programmatic configuration -4. **Telemetry & Cost** - Track usage, latency, and costs across providers -5. 
**Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries + print(f"Workspace: {workspace_dir}") + print(f"COBOL Directory: {cobol_dir}") + print(f"Java Directory: {java_dir}") + print(f"Critique Directory: {critique_dir}") + print() -## Architecture + # Create sample COBOL files + cobol_files = create_sample_cobol_files(cobol_dir) + print(f"Created {len(cobol_files)} sample COBOL files:") + for f in cobol_files: + print(f" - {f}") + print() -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% -flowchart TB - subgraph Configuration["Configuration Sources"] - Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] - JSON["JSON Files
config/llm.json"] - Code["Programmatic
LLM(...)"] - end - - subgraph Core["Core LLM"] - Model["LLM Model
Pydantic configuration"] - Pipeline["Request Pipeline
Retry, timeout, telemetry"] - end - - subgraph Backend["LiteLLM Backend"] - Providers["100+ Providers
OpenAI, Anthropic, etc."] - end - - subgraph Output["Telemetry"] - Usage["Token Usage"] - Cost["Cost Tracking"] - Latency["Latency Metrics"] - end - - Env --> Model - JSON --> Model - Code --> Model - - Model --> Pipeline - Pipeline --> Providers - - Pipeline --> Usage - Pipeline --> Cost - Pipeline --> Latency - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Model primary - class Pipeline secondary - class LiteLLM tertiary -``` + critique_file = critique_dir / "critique_report.md" + current_score = 0.0 + iteration = 0 -### Key Components + while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + iteration += 1 + print("=" * 80) + print(f"ITERATION {iteration}") + print("=" * 80) -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings | -| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming | -| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking | -| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers | -| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` | -| **Telemetry** | Usage tracking | Token counts, costs, latency | + # Phase 1: Refactoring + print("\n--- Phase 1: Refactoring Agent ---") + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir), + ) -## 
Configuration + previous_critique = critique_file if iteration > 1 else None + refactoring_prompt = get_refactoring_prompt( + cobol_dir, java_dir, cobol_files, previous_critique + ) -See [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for complete list of supported fields. + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + print("Refactoring phase complete.") -### Programmatic Configuration + # Phase 2: Critique + print("\n--- Phase 2: Critique Agent ---") + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir), + ) -Create LLM instances directly in code: + critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + print("Critique phase complete.") -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Code["Python Code"] - LLM["LLM(model=...)"] - Agent["Agent"] - - Code --> LLM - LLM --> Agent - - style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` + # Parse the score + current_score = parse_critique_score(critique_file) + print(f"\nCurrent Score: {current_score:.1f}%") -**Example:** -```python -from pydantic import SecretStr -from openhands.sdk import LLM + if current_score >= QUALITY_THRESHOLD: + print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") + else: + print( + f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " + "Continuing refinement..." 
+ ) -llm = LLM( - model="anthropic/claude-sonnet-4.1", - api_key=SecretStr("sk-ant-123"), - temperature=0.1, - timeout=120, -) -``` + # Final summary + print("\n" + "=" * 80) + print("ITERATIVE REFINEMENT COMPLETE") + print("=" * 80) + print(f"Total iterations: {iteration}") + print(f"Final score: {current_score:.1f}%") + print(f"Workspace: {workspace_dir}") -### Environment Variable Configuration + # List created Java files + print("\nCreated Java files:") + for java_file in java_dir.glob("*.java"): + print(f" - {java_file.name}") -Load from environment using naming convention: + # Show critique file location + if critique_file.exists(): + print(f"\nFinal critique report: {critique_file}") -**Environment Variable Pattern:** -- **Prefix:** All variables start with `LLM_` -- **Mapping:** `LLM_FIELD` → `field` (lowercased) -- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr + # Report cost + cost = llm.metrics.accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") -**Common Variables:** -```bash -export LLM_MODEL="anthropic/claude-sonnet-4.1" -export LLM_API_KEY="sk-ant-123" -export LLM_USAGE_ID="primary" -export LLM_TIMEOUT="120" -export LLM_NUM_RETRIES="5" + +if __name__ == "__main__": + run_iterative_refinement() ``` -### JSON Configuration + -Serialize and load from JSON files: +## Next Steps -**Example:** -```python -# Save -llm.model_dump_json(exclude_none=True, indent=2) +- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents +- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow -# Load -llm = LLM.load_from_json("config/llm.json") -``` +### Exception Handling +Source: https://docs.openhands.dev/sdk/guides/llm-error-handling.md -**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). -If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. 
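The masking comes from Pydantic's `SecretStr`: dumping a model that stores one writes a placeholder instead of the raw value. A minimal sketch of that mechanism with a stand-in config model (not the SDK's actual `LLM` class):

```python
from pydantic import BaseModel, SecretStr

class MiniLLMConfig(BaseModel):
    model: str
    api_key: SecretStr

cfg = MiniLLMConfig(
    model="anthropic/claude-sonnet-4.1",
    api_key=SecretStr("sk-ant-123"),
)

# The serialized JSON masks the secret, so it is safe to write to disk.
dumped = cfg.model_dump_json()
assert "sk-ant-123" not in dumped

# The raw value stays available in memory via get_secret_value().
assert cfg.api_key.get_secret_value() == "sk-ant-123"
```

This is also why pairing a committed JSON config with an `LLM_API_KEY` environment variable works well: the file on disk never needs to contain the real key.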
+The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. +This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. -## Request Pipeline +## Why typed exceptions? -### Completion Flow +LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. Typical benefits: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% -flowchart TB - Request["completion() or responses() call"] - Validate["Validate Config"] - - Attempt["LiteLLM Request"] - Success{"Success?"} - - Retry{"Retries
remaining?"} - Wait["Exponential Backoff"] - - Telemetry["Record Telemetry"] - Response["Return Response"] - Error["Raise Error"] - - Request --> Validate - Validate --> Attempt - Attempt --> Success - - Success -->|Yes| Telemetry - Success -->|No| Retry - - Retry -->|Yes| Wait - Retry -->|No| Error - - Wait --> Attempt - Telemetry --> Response - - style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +- One code path to handle auth, rate limits, timeouts, service issues, and bad requests +- Clear behavior when conversation history exceeds the context window +- Backward compatibility when you switch providers or SDK versions -**Pipeline Stages:** +## Quick start: Using agents and conversations -1. **Validation:** Check required fields (model, messages) -2. **Request:** Call LiteLLM with provider-specific formatting -3. **Retry Logic:** Exponential backoff on failures (configurable) -4. **Telemetry:** Record tokens, cost, latency -5. **Response:** Return completion or raise error +Agent-driven conversations are the common entry point. Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured. -### Responses API Support +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import Agent, Conversation, LLM +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) -In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). 
The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) +agent = Agent(llm=llm, tools=[]) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) -#### Architecture +try: + conversation.send_message( + "Continue the long analysis we started earlier…" + ) + conversation.run() -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Check{"Model supports
Responses API?"} - - subgraph Standard["Standard Path"] - ChatFormat["Format as
Chat Messages"] - ChatCall["litellm.completion()"] - end - - subgraph ResponsesPath["Responses Path"] - RespFormat["Format as
instructions + input[]"] - RespCall["litellm.responses()"] - end - - ChatResponse["ModelResponse"] - RespResponse["ResponsesAPIResponse"] - - Parse["Parse to Message"] - Return["LLMResponse"] - - Check -->|No| ChatFormat - Check -->|Yes| RespFormat - - ChatFormat --> ChatCall - RespFormat --> RespCall - - ChatCall --> ChatResponse - RespCall --> RespResponse - - ChatResponse --> Parse - RespResponse --> Parse - - Parse --> Return - - style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +except LLMContextWindowExceedError: + # Conversation is longer than the model’s context window + # Options: + # 1) Enable a condenser (recommended for long sessions) + # 2) Shorten inputs or reset conversation + print("Hit the context limit. Consider enabling a condenser.") -#### Supported Models +except LLMAuthenticationError: + print( + "Invalid or missing API credentials." + "Check your API key or auth setup." + ) -Models that automatically use the Responses API path: +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") -| Pattern | Examples | Documentation | -|---------|----------|---------------| -| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") -**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. Validate inputs and arguments.") -## Provider Integration +except LLMError as e: + # Fallback for other SDK LLM errors (parsing/validation, etc.) 
+ print(f"Unhandled LLM error: {e}") +``` -### LiteLLM Abstraction -Software Agent SDK uses LiteLLM for provider abstraction: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - SDK["Software Agent SDK"] - LiteLLM["LiteLLM"] - - subgraph Providers["100+ Providers"] - OpenAI["OpenAI"] - Anthropic["Anthropic"] - Google["Google"] - Azure["Azure"] - Others["..."] - end - - SDK --> LiteLLM - LiteLLM --> OpenAI - LiteLLM --> Anthropic - LiteLLM --> Google - LiteLLM --> Azure - LiteLLM --> Others - - style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### Avoiding context‑window errors with a condenser -**Benefits:** -- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. -- **Unified API:** Same interface regardless of provider -- **Format Translation:** Provider-specific request/response formatting -- **Error Handling:** Normalized error codes and messages +If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue. -### LLM Providers +```python icon="python" focus={5-6, 9-14} wrap +from openhands.sdk.context.condenser import LLMSummarizingCondenser -Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. -The pages linked below live under the OpenHands app section but apply -verbatim to SDK applications because both layers wrap the same -`openhands.sdk.llm.LLM` interface. 
+condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), + max_size=10, + keep_first=2, +) -| Provider / scenario | Documentation | -| --- | --- | -| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | -| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | -| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | -| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | -| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | -| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | -| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | -| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | -| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | -| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | +agent = Agent(llm=llm, tools=[], condenser=condenser) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) +``` -When you follow any of those guides while building with the SDK, create an -`LLM` object using the documented parameters (for example, API keys, base URLs, -or custom headers) and pass it into your agent or registry. The OpenHands UI -surfacing is simply a convenience layer on top of the same configuration model. + + See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). + +## Handling errors with direct LLM calls -## Telemetry and Cost Tracking +The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. 
-### Telemetry Collection +### Example: Using `.completion()` -LLM requests automatically collect metrics: +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Request["LLM Request"] - - subgraph Metrics - Tokens["Token Counts
Input/Output"] - Cost["Cost
USD"] - Latency["Latency
ms"] - end - - Events["Event Log"] - - Request --> Tokens - Request --> Cost - Request --> Latency - - Tokens --> Events - Cost --> Events - Latency --> Events - - style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + response = llm.completion([ + Message.user([TextContent(text="Summarize our design doc")]) + ]) + print(response.message) + +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMAuthenticationError: + print("Invalid or missing API credentials.") +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. 
Validate inputs and arguments.") +except LLMError as e: + print(f"Unhandled LLM error: {e}") ``` -**Tracked Metrics:** -- **Token Usage:** Input tokens, output tokens, total -- **Cost:** Per-request cost using configured rates -- **Latency:** Request duration in milliseconds -- **Errors:** Failure types and retry counts +### Example: Using `.responses()` -### Cost Configuration +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError -Configure per-token costs for custom models: +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) -```python -llm = LLM( - model="custom/my-model", - input_cost_per_token=0.00001, # $0.01 per 1K tokens - output_cost_per_token=0.00003, # $0.03 per 1K tokens -) +try: + resp = llm.responses([ + Message.user( + [TextContent(text="Write a one-line haiku about code.")] + ) + ]) + print(resp.message) +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMError as e: + print(f"LLM error: {e}") ``` -**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) +## Exception reference -**Custom Costs:** Override for: -- Internal models -- Custom pricing agreements -- Cost estimation for budgeting +All exceptions live under `openhands.sdk.llm.exceptions` unless noted. -## Component Relationships +| Category | Error | Description | +|--------|------|-------------| +| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | +| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). 
| +| | `LLMRateLimitError` | Provider rate limit exceeded. | +| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | +| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | +| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | +| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | +| | `LLMNoActionError` | Model did not return an action when one was expected. | +| | `LLMResponseError` | Could not extract an action from the response. | +| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | +| | `FunctionCallValidationError` | Tool/function call arguments failed validation. | +| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | +| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | +| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | +| | `OperationCancelled` | A running operation was cancelled programmatically. | -### How LLM Integrates + + All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all + for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. 
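One practical consequence of this layout: because the cancellation types sit outside the `LLMError` hierarchy, a broad `except LLMError` will not accidentally swallow a user abort. A small sketch with stand-in classes mirroring that structure (not the real SDK types):

```python
class LLMError(Exception): ...
class LLMTimeoutError(LLMError): ...
class UserCancelledError(Exception): ...  # deliberately NOT an LLMError

def classify(exc):
    try:
        raise exc
    except LLMError:
        return "llm-failure"
    except UserCancelledError:
        return "cancelled"

assert classify(LLMTimeoutError()) == "llm-failure"
assert classify(UserCancelledError()) == "cancelled"
```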
+ -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - LLM["LLM"] - Agent["Agent"] - Conversation["Conversation"] - Events["Events"] - Security["Security Analyzer"] - Condenser["Context Condenser"] - - Agent -->|Uses| LLM - LLM -->|Records| Events - Security -.->|Optional| LLM - Condenser -.->|Optional| LLM - Conversation -->|Provides context| Agent - - style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +### LLM Fallback Strategy +Source: https://docs.openhands.dev/sdk/guides/llm-fallback.md -**Relationship Characteristics:** -- **Agent → LLM**: Agent uses LLM for reasoning and tool calls -- **LLM → Events**: LLM requests/responses recorded as events -- **Security → LLM**: Optional security analyzer can use separate LLM -- **Condenser → LLM**: Optional context condenser can use separate LLM -- **Configuration**: LLM configured independently, passed to agent -- **Telemetry**: LLM metrics flow through event system to UI/logging +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -## See Also +> A ready-to-run example is available [here](#ready-to-run-example)! -- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions -- **[Events](/sdk/arch/events)** - LLM request/response event types -- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis -- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration +`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model. 
-### MCP Integration -Source: https://docs.openhands.dev/sdk/arch/mcp.md +## Basic Usage -The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. +Attach a `FallbackStrategy` to your primary `LLM`. The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store): -**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) +```python icon="python" wrap focus={16, 17, 21, 22, 23} +from pydantic import SecretStr +from openhands.sdk import LLM, LLMProfileStore +from openhands.sdk.llm import FallbackStrategy -## Core Responsibilities +# Menage persisted LLM profiles +# default store directory: .openhands/profiles +store = LLMProfileStore() -The MCP Integration system has four primary responsibilities: +fallback_llm = LLM( + usage_id="fallback-1", + model="openai/gpt-4o", + api_key=SecretStr("your-openai-key"), +) +store.save("fallback-1", fallback_llm, include_secrets=True) -1. **MCP Client Management** - Connect to and communicate with MCP servers -2. **Tool Discovery** - Enumerate available tools from MCP servers -3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions -4. **Execution Bridge** - Execute MCP tool calls from agent actions +# Configure an LLM with a fallback strategy +primary_llm = LLM( + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1"], + ), +) +``` -## Architecture +## How It Works -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% -flowchart TB - subgraph Client["MCP Client"] - Sync["MCPClient
Sync/Async bridge"] - Async["AsyncMCPClient
FastMCP base"] - end - - subgraph Bridge["Tool Bridge"] - Def["MCPToolDefinition
Schema conversion"] - Exec["MCPToolExecutor
Execution handler"] - end - - subgraph Integration["Agent Integration"] - Action["MCPToolAction
Dynamic model"] - Obs["MCPToolObservation
Result wrapper"] - end - - subgraph External["External"] - Server["MCP Server
stdio/HTTP"] - Tools["External Tools"] - end - - Sync --> Async - Async --> Server - - Server --> Def - Def --> Exec - - Exec --> Action - Action --> Server - Server --> Obs - - Server -.->|Spawns| Tools - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Sync,Async primary - class Def,Exec secondary - class Action,Obs tertiary +1. The primary LLM handles the request as normal +2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order +3. The first successful fallback response is returned to the caller +4. If all fallbacks fail, the original primary error is raised +5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model + + +Only transient errors trigger fallback. +Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks. +For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29) + + +## Multiple Fallback Levels + +Chain as many fallback LLMs as you need. They are tried in list order: + +```python icon="python" wrap focus={5-7} +llm = LLM( + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + ), +) ``` -### Key Components +If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. 
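The raise-the-primary-error contract can be sketched in a few lines (a simplified stand-in, not the SDK's actual `FallbackStrategy` implementation):

```python
class LLMRateLimitError(Exception): ...

def call_with_fallbacks(primary, fallbacks):
    """Try the primary first, then each fallback in order."""
    try:
        return primary()
    except LLMRateLimitError as primary_err:
        for fallback in fallbacks:
            try:
                return fallback()
            except LLMRateLimitError:
                continue
        # Every fallback failed too: surface the PRIMARY error to the caller.
        raise primary_err

def fail(msg):
    def _f():
        raise LLMRateLimitError(msg)
    return _f

# A succeeding fallback's result is returned transparently.
assert call_with_fallbacks(fail("primary-429"), [lambda: "from-fallback"]) == "from-fallback"

# When everything fails, the caller sees the primary error, not the last fallback's.
try:
    call_with_fallbacks(fail("primary-429"), [fail("fb1-429"), fail("fb2-429")])
except LLMRateLimitError as e:
    assert str(e) == "primary-429"
```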
-| Component | Purpose | Design | -|-----------|---------|--------| -| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | -| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | -| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | -| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | -| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | +## Custom Profile Store Directory -## MCP Client +By default, fallback profiles are loaded from `.openhands/profiles`. You can point to a different directory: -### Sync/Async Bridge +```python icon="python" wrap focus={3} +FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir="/path/to/my/profiles", +) +``` -The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: +## Metrics -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Sync["Sync Code
Agent execution"] - Bridge["call_async_from_sync()"] - Executor["AsyncExecutor
Background loop"] - Async["Async MCP Call"] - Server["MCP Server"] - Result["Result"] - - Sync --> Bridge - Bridge --> Executor - Executor --> Async - Async --> Server - Server --> Result - Result --> Sync - - style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px +Fallback costs are automatically merged into the primary LLM's metrics. After a conversation, you can inspect exactly which models were used: + +```python icon="python" wrap +# After running a conversation +metrics = llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") + +for usage in metrics.token_usages: + print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") ``` -**Bridge Pattern:** -- **Problem:** MCP protocol is async, but agent tools run synchronously -- **Solution:** Background event loop that executes async code from sync contexts -- **Benefit:** Agents use MCP tools without async/await in tool definitions +Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. -**Client Features:** -- **Lifecycle Management:** `__enter__`/`__exit__` for context manager -- **Timeout Support:** Configurable timeouts for MCP operations -- **Error Handling:** Wraps MCP errors in observations -- **Connection Pooling:** Reuses connections across tool calls +## Use Cases -### MCP Server Configuration +- **Rate limit handling** — When one provider throttles you, seamlessly switch to another +- **High availability** — Keep your agent running during provider outages +- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure +- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
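As a concrete sketch of the cost-optimization pattern, you could make an inexpensive model the primary and keep a more capable model as the fallback. The model choice and the `strong` profile name below are hypothetical, and the `strong` profile is assumed to already exist in your profile store:

```python
import os

from pydantic import SecretStr

from openhands.sdk import LLM
from openhands.sdk.llm import FallbackStrategy

# Cheap-first setup (sketch): the inexpensive model handles every request,
# and the more capable "strong" profile is only tried when a transient
# error occurs.
llm = LLM(
    usage_id="agent-primary",
    model="openai/gpt-4o-mini",
    api_key=SecretStr(os.environ["LLM_API_KEY"]),
    fallback_strategy=FallbackStrategy(fallback_llms=["strong"]),
)
```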
-MCP servers are configured using the FastMCP format: +## Ready-to-run Example -```python -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "filesystem": { - "command": "npx", - "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] - } - } -} -``` + +This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) + -**Configuration Fields:** -- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) -- **args:** Arguments to pass to command -- **env:** Environment variables (optional) +```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py +"""Example: Using FallbackStrategy for LLM resilience. -## Tool Discovery and Conversion +When the primary LLM fails with a transient error (rate limit, timeout, etc.), +FallbackStrategy automatically tries alternate LLMs in order. Fallback is +per-call: each new request starts with the primary model. Token usage and +cost from fallback calls are merged into the primary LLM's metrics. -### Discovery Flow +This example: + 1. Saves two fallback LLM profiles to a temporary store. + 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. + 3. Runs a conversation — if the primary model is unavailable, the agent + transparently falls back to the next available model. 
+""" -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Config["MCP Config"] - Spawn["Spawn Server"] - List["List Tools"] - - subgraph Convert["Convert Each Tool"] - Schema["MCP Schema"] - Action["Generate Action Model"] - Def["Create ToolDefinition"] - end - - Register["Register in ToolRegistry"] - - Config --> Spawn - Spawn --> List - List --> Schema - - Schema --> Action - Action --> Def - Def --> Register - - style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +import os +import tempfile -**Discovery Steps:** +from pydantic import SecretStr -1. **Spawn Server:** Launch MCP server via stdio -2. **List Tools:** Call `tools/list` MCP endpoint -3. **Parse Schemas:** Extract tool names, descriptions, parameters -4. **Generate Models:** Dynamically create Pydantic models for actions -5. **Create Definitions:** Wrap in `ToolDefinition` objects -6. **Register:** Add to agent's tool registry +from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool +from openhands.sdk.llm import FallbackStrategy +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -### Schema Conversion -MCP tool schemas are converted to SDK tool definitions: +# Read configuration from environment +api_key = os.getenv("LLM_API_KEY", None) +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - MCP["MCP Tool Schema
JSON Schema"] - Parse["Parse Parameters"] - Model["Dynamic Pydantic Model
MCPToolAction"] - Def["ToolDefinition
SDK format"] - - MCP --> Parse - Parse --> Model - Model --> Def - - style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +profile_store_dir = tempfile.mkdtemp() +store = LLMProfileStore(base_dir=profile_store_dir) -**Conversion Rules:** +fallback_1 = LLM( + usage_id="fallback-1", + model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), +) +store.save("fallback-1", fallback_1, include_secrets=True) -| MCP Schema | SDK Action Model | -|------------|------------------| -| **name** | Class name (camelCase) | -| **description** | Docstring | -| **inputSchema** | Pydantic fields | -| **required** | Field(required=True) | -| **type** | Python type hints | +fallback_2 = LLM( + usage_id="fallback-2", + model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), +) +store.save("fallback-2", fallback_2, include_secrets=True) -**Example:** +print(f"Saved fallback profiles: {store.list()}") -```python -# MCP Schema -{ - "name": "fetch_url", - "description": "Fetch content from URL", - "inputSchema": { - "type": "object", - "properties": { - "url": {"type": "string"}, - "timeout": {"type": "number"} - }, - "required": ["url"] - } -} -# Generated Action Model -class FetchUrl(MCPToolAction): - """Fetch content from URL""" - url: str - timeout: float | None = None +# Configure the primary LLM with a FallbackStrategy +primary_llm = LLM( + usage_id="agent-primary", + model=primary_model, + api_key=SecretStr(api_key), + base_url=base_url, + fallback_strategy=FallbackStrategy( + 
fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir=profile_store_dir, + ), +) + + +# Run a conversation +agent = Agent( + llm=primary_llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +conversation = Conversation(agent=agent, workspace=os.getcwd()) +conversation.send_message("Write a haiku about resilience into HAIKU.txt.") +conversation.run() + + +# Inspect metrics (includes any fallback usage) +metrics = primary_llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +print(f"Token usage records: {len(metrics.token_usages)}") +for usage in metrics.token_usages: + print( + f" model={usage.model}" + f" prompt={usage.prompt_tokens}" + f" completion={usage.completion_tokens}" + ) + +print(f"EXAMPLE_COST: {metrics.accumulated_cost}") ``` -## Tool Execution + -### Execution Flow +## Next Steps -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Agent["Agent generates action"] - Action["MCPToolAction"] - Executor["MCPToolExecutor"] - - Convert["Convert to MCP format"] - Call["MCP call_tool"] - Server["MCP Server"] - - Result["MCP Result"] - Obs["MCPToolObservation"] - Return["Return to Agent"] - - Agent --> Action - Action --> Executor - Executor --> Convert - Convert --> Call - Call --> Server - Server --> Result - Result --> Obs - Obs --> Return - - style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles +- **[Model Routing](/sdk/guides/llm-routing)** — Route requests based on content (e.g., multimodal vs text-only) +- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application +- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs 
across models

### Image Input
Source: https://docs.openhands.dev/sdk/guides/llm-image-input.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

> A ready-to-run example is available [here](#ready-to-run-example)!


### Sending Images

The LLM you use must support image inputs (`llm.vision_is_active()` must be `True`).
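If you would rather fail fast than discover mid-conversation that the model is text-only, you can check the capability up front. A minimal sketch, assuming `llm` is an already-configured `LLM` instance:

```python
# Fail fast if the configured model cannot accept image input.
if not llm.vision_is_active():
    raise RuntimeError(f"Model {llm.model} does not support image input")
```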
-## MCP Tool Lifecycle +Pass images along with text in the message content: -### From Configuration to Execution +```python focus={14} icon="python" wrap +from openhands.sdk import ImageContent -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Load["Load MCP Config"] - Start["Start Conversation"] - Spawn["Spawn MCP Servers"] - Discover["Discover Tools"] - Register["Register Tools"] - - Ready["Agent Ready"] - - Step["Agent Step"] - LLM["LLM Tool Call"] - Execute["Execute MCP Tool"] - Result["Return Observation"] - - End["End Conversation"] - Cleanup["Close MCP Clients"] - - Load --> Start - Start --> Spawn - Spawn --> Discover - Discover --> Register - Register --> Ready - - Ready --> Step - Step --> LLM - LLM --> Execute - Execute --> Result - Result --> Step - - Step --> End - End --> Cleanup - - style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px +IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) ``` -**Lifecycle Phases:** +Works with multimodal LLMs like `GPT-4 Vision` and `Claude` with vision capabilities. 
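Because `image_urls` takes a list, you can also attach several images to one message using the same `Message`/`TextContent`/`ImageContent` classes shown above. A sketch, where `before_url` and `after_url` are hypothetical screenshot URLs:

```python
conversation.send_message(
    Message(
        role="user",
        content=[
            TextContent(text="Compare these two screenshots and list the differences."),
            # image_urls accepts multiple URLs in a single content block
            ImageContent(image_urls=[before_url, after_url]),
        ],
    )
)
```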
-| Phase | Operations | Components | -|-------|-----------|------------| -| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | -| **Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | -| **Execution** | Handle tool calls | Agent, MCPToolAction | -| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | +## Ready-to-run Example -## MCP Annotations + +This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) + -MCP tools can include metadata hints for agents: +You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Tool["MCP Tool"] - - subgraph Annotations - ReadOnly["readOnlyHint"] - Destructive["destructiveHint"] - Progress["progressEnabled"] - end - - Security["Security Analysis"] - - Tool --> ReadOnly - Tool --> Destructive - Tool --> Progress - - ReadOnly --> Security - Destructive --> Security - - style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py +"""OpenHands Agent SDK — Image Input Example. -**Annotation Types:** +This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds +vision support by sending an image to the agent alongside text instructions. 
+""" -| Annotation | Meaning | Use Case | -|------------|---------|----------| -| **readOnlyHint** | Tool doesn't modify state | Lower security risk | -| **destructiveHint** | Tool modifies/deletes data | Require confirmation | -| **progressEnabled** | Tool reports progress | Show progress UI | +import os -These annotations feed into the security analyzer for risk assessment. +from pydantic import SecretStr -## Component Relationships +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool -### How MCP Integrates -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - MCP["MCP System"] - Skills["Skills"] - Tools["Tool Registry"] - Agent["Agent"] - Security["Security"] - - Skills -->|Configures| MCP - MCP -->|Registers| Tools - Agent -->|Uses| Tools - MCP -->|Provides hints| Security - - style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +logger = get_logger(__name__) + +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." 
+ +cwd = os.getcwd() + +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) -**Relationship Characteristics:** -- **Skills → MCP**: Repository skills can embed MCP configurations -- **MCP → Tools**: MCP tools registered alongside native tools -- **Agent → Tools**: Agents use MCP tools like any other tool -- **MCP → Security**: Annotations inform security risk assessment -- **Transparent Integration**: Agent doesn't distinguish MCP from native tools +llm_messages = [] # collect raw LLM messages for inspection -## Design Rationale -**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. -**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. +IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" -**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. 
+conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() -**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping. +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() -## See Also +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework -- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills -- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment -- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications -- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -### Overview -Source: https://docs.openhands.dev/sdk/arch/overview.md + -The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment. +## Next Steps -Check [this document](/sdk/arch/design) for the core design principles that guided its architecture. 
+- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently -## Relationship with OpenHands Applications +### LLM Profile Store +Source: https://docs.openhands.dev/sdk/guides/llm-profile-store.md -The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns. -- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies. -- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs. -- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% -graph TB - subgraph Interfaces["OpenHands Interfaces"] - UI[OpenHands GUI
React frontend] - CLI[OpenHands CLI
Command-line interface] - Custom[Your Custom Client
Automations & workflows] - end +> A ready-to-run example is available [here](#ready-to-run-example)! - SDK[Software Agent SDK
openhands.sdk + tools + workspace] - - subgraph External["External Services"] - LLM[LLM Providers
OpenAI, Anthropic, etc.] - Runtime[Runtime Services
Docker, Remote API, etc.] - end +The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. +Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. - UI --> SDK - CLI --> SDK - Custom --> SDK - - SDK --> LLM - SDK --> Runtime - - classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class UI,CLI,Custom interface - class SDK sdk - class LLM,Runtime external -``` +## Benefits +- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. +- **Reusability:** Import a defined profile into any script or session with a single identifier. +- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. +## How It Works -## Four-Package Architecture + + + ### Create a Store -The agent-sdk is organized into four distinct Python packages: + The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, + but you can point it anywhere. -| Package | What It Does | When You Need It | -|---------|-------------|------------------| -| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | -| **openhands.tools** | Pre-built tools (bash, file editing, etc.) 
| Optional - provides common tools | -| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | -| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | + ```python icon="python" focus={3, 4, 6, 7} + from openhands.sdk import LLMProfileStore -### Two Deployment Modes + # Default location: ~/.openhands/profiles + store = LLMProfileStore() -The SDK supports two deployment architectures depending on your needs: + # Or bring your own directory + store = LLMProfileStore(base_dir="./my-profiles") + ``` + + + ### Save a Profile -#### Mode 1: Local Development + Got an LLM configured just right? Save it for later. -**Installation:** Just install `openhands-sdk` + `openhands-tools` + ```python icon="python" focus={11, 12} + from pydantic import SecretStr + from openhands.sdk import LLM, LLMProfileStore -```bash -pip install openhands-sdk openhands-tools -``` + fast_llm = LLM( + usage_id="fast", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("sk-..."), + temperature=0.0, + ) -**Architecture:** + store = LLMProfileStore() + store.save("fast", fast_llm) + ``` -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk - Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools - - SDK -->|uses| Tools - - classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 - classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 -``` + + API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to + persist them; otherwise, they will be read from the environment at load time. + +
+ + ### Load a Profile -- `LocalWorkspace` included in SDK (no extra install) -- Everything runs in one process -- Perfect for prototyping and simple use cases -- Quick setup, no Docker required + Next time you need that LLM, just load it: -#### Mode 2: Production / Sandboxed + ```python icon="python" + # Same model, ready to go. + llm = store.load("fast") + ``` + + + ### List and Clean Up -**Installation:** Install all 4 packages + See what you've got, delete what you don't need: -```bash -pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server -``` + ```python icon="python" focus={1, 3, 4} + print(store.list()) # ['fast.json', 'creative.json'] -**Architecture:** + store.delete("creative") + print(store.list()) # ['fast.json'] + ``` + +
-```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% -flowchart LR - - WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk - - subgraph WS[" "] - direction LR - Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws - Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws - end - - Server["openhands.agent_server
FastAPI + WebSocket"]:::server - Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk - Tools["openhands.tools
BashTool · FileEditor · …"]:::tools - - WSBase -.->|extended by| Docker - WSBase -.->|extended by| Remote - Docker -->|spawns container with| Server - Remote -->|connects via HTTP to| Server - Server -->|runs| Agent - Agent -->|uses| Tools - - classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 - classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 - classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 - classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 - - style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none -``` +## Good to Know -- `RemoteWorkspace` auto-spawns agent-server in containers -- Sandboxed execution for security -- Multi-user deployments -- Distributed systems (e.g., Kubernetes) support +Profile names must be simple filenames (no slashes, no dots at the start). - -**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). - +## Ready-to-run Example -### SDK Package (`openhands.sdk`) + +This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) + -**Purpose:** Core components and base classes for OpenHands agent. +```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py +"""Example: Using LLMProfileStore to save and reuse LLM configurations. 
-**Key Components:** -- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop -- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle -- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry -- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration -- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) -- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) -- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation -- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management -- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution +LLMProfileStore persists LLM configurations as JSON files, so you can define +a profile once and reload it across sessions without repeating setup code. +""" + +import os +import tempfile -**Design:** Stateless, immutable components with type-safe Pydantic models. +from pydantic import SecretStr -**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. +from openhands.sdk import LLM, LLMProfileStore -**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) -### Tools Package (`openhands.tools`) +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +store = LLMProfileStore(base_dir=tempfile.mkdtemp()) - -**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. - +# 1. 
Create two LLM profiles with different usage -**Purpose:** Pre-built tools following consistent patterns. +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. +fast_llm = LLM( + usage_id="fast", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.0, +) - -For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. - +creative_llm = LLM( + usage_id="creative", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.9, +) +# 2. Save profiles -### Workspace Package (`openhands.workspace`) +# Note that secrets are excluded by default for safety. +store.save("fast", fast_llm) +store.save("creative", creative_llm) -**Purpose:** Workspace implementations extending SDK base classes. +# To persist the API key as well, pass `include_secrets=True`: +# store.save("fast", fast_llm, include_secrets=True) -**Key Components:** Docker Workspace, Remote API Workspace, and more. +# 3. List available persisted profiles -**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. +print(f"Stored profiles: {store.list()}") -**Use Cases:** Sandboxed execution, multi-user deployments, production environments. +# 4. Load a profile - -For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). - +loaded = store.load("fast") +assert isinstance(loaded, LLM) +print( + "Loaded profile. " + f"usage:{loaded.usage_id}, " + f"model: {loaded.model}, " + f"temperature: {loaded.temperature}." 
+) -### Agent Server Package (`openhands.agent_server`) +# 5. Delete a profile -**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. +store.delete("creative") +print(f"After deletion: {store.list()}") -**Features:** -- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode -- Service management with isolated per-user sessions -- API key authentication and health checking +print("EXAMPLE_COST: 0") +``` -**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). + -**Use Cases:** Multi-user web apps, SaaS products, distributed systems. +## Next Steps - -For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). - +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully -## How Components Work Together +### Reasoning +Source: https://docs.openhands.dev/sdk/guides/llm-reasoning.md -### Basic Execution Flow (Local) +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -When you send a message to an agent, here's what happens: +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. -```mermaid -sequenceDiagram - participant You - participant Conversation - participant Agent - participant LLM - participant Tool - - You->>Conversation: "Create hello.txt" - Conversation->>Agent: Process message - Agent->>LLM: What should I do? - LLM-->>Agent: Use BashTool("touch hello.txt") - Agent->>Tool: Execute action - Note over Tool: Runs in same environment
as Agent (local/container/remote) - Tool-->>Agent: Observation - Agent->>LLM: Got result, continue? - LLM-->>Agent: Done - Agent-->>Conversation: Update state - Conversation-->>You: "File created!" -``` +This guide demonstrates two provider-specific approaches: +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. **OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter -**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. +## Anthropic Extended Thinking -### Deployment Flexibility +> A ready-to-run example is available [here](#ready-to-run-example-antrophic)! -The same agent code runs in different environments by swapping workspace configuration: +Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process +through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step. -```mermaid -graph TB - subgraph "Your Code (Unchanged)" - Code["Agent + Tools + LLM"] - end - - subgraph "Deployment Options" - Local["Local
Direct execution"] - Docker["Docker<br/>
Containerized"] - Remote["Remote<br/>
Multi-user server"] - end - - Code -->|LocalWorkspace| Local - Code -->|DockerWorkspace| Docker - Code -->|RemoteAPIWorkspace| Remote - - style Code fill:#e1f5fe - style Local fill:#e8f5e8 - style Docker fill:#e8f5e8 - style Remote fill:#e8f5e8 -``` +### How It Works -## Next Steps +The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages: -### Get Started -- [Getting Started](/sdk/getting-started) – Build your first agent -- [Hello World](/sdk/guides/hello-world) – Minimal example +```python focus={6-11} icon="python" wrap +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") + for block in message.thinking_blocks: + if isinstance(block, RedactedThinkingBlock): + print(f"Redacted: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f"Thinking: {block.thinking}") -### Explore Components +conversation = Conversation(agent=agent, callbacks=[show_thinking]) +``` -**SDK Package:** -- [Agent](/sdk/arch/agent) – Core reasoning-action loop -- [Conversation](/sdk/arch/conversation) – State management and lifecycle -- [LLM](/sdk/arch/llm) – Language model integration -- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern -- [Events](/sdk/arch/events) – Typed event framework -- [Workspace](/sdk/arch/workspace) – Base workspace architecture +### Understanding Thinking Blocks -**Tools Package:** -- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details +Claude uses thinking blocks to reason through complex problems step-by-step. 
There are two types: -**Workspace Package:** -- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details +- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process +- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data -**Agent Server:** -- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details +By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time, +giving you insight into how Claude is approaching the problem. -### Deploy -- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely -- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup -- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service -- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server +### Ready-to-run Example Antrophic -### Source Code -- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework -- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools -- [`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces -- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server -- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – 
Working examples + +This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py) + -### SDK Package -Source: https://docs.openhands.dev/sdk/arch/sdk.md +```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py +"""Example demonstrating Anthropic's extended thinking feature with thinking blocks.""" -The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. +import os -**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) +from pydantic import SecretStr -## Purpose +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + RedactedThinkingBlock, + ThinkingBlock, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool -The SDK package handles: -- **Agent reasoning loop**: How agents process messages and make decisions -- **State management**: Conversation lifecycle and persistence -- **LLM integration**: Provider-agnostic language model access -- **Tool system**: Typed actions and observations -- **Workspace abstraction**: Where code executes -- **Extensibility**: Skills, condensers, MCP, security -## Core Components +# Configure LLM for Anthropic Claude with extended thinking +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -```mermaid -graph TB - Conv[Conversation
Lifecycle Manager] --> Agent[Agent<br/>
Reasoning Loop] - - Agent --> LLM[LLM<br/>
Language Model] - Agent --> Tools[Tool System<br/>
Capabilities] - Agent --> Micro[Skills<br/>
Behavior Modules] - Agent --> Cond[Condenser<br/>
Memory Manager] - - Tools --> Workspace[Workspace<br/>
Execution] - - Conv --> Events[Events<br/>
Communication] - Tools --> MCP[MCP<br/>
External Tools] - Workspace --> Security[Security<br/>
Validation] - - style Conv fill:#e1f5fe - style Agent fill:#f3e5f5 - style LLM fill:#e8f5e8 - style Tools fill:#fff3e0 - style Workspace fill:#fce4ec -``` +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -### 1. Conversation - State & Lifecycle +# Setup agent with bash tool +agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)]) -**What it does**: Manages the entire conversation lifecycle and state. -**Key responsibilities**: -- Maintains conversation state (immutable) -- Handles message flow between user and agent -- Manages turn-taking and async execution -- Persists and restores conversation state -- Emits events for monitoring +# Callback to display thinking blocks +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") + for i, block in enumerate(message.thinking_blocks): + if isinstance(block, RedactedThinkingBlock): + print(f" Block {i + 1}: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f" Block {i + 1}: {block.thinking}") -**Design decisions**: -- **Immutable state**: Each operation returns a new Conversation instance -- **Serializable**: Can be saved to disk or database and restored -- **Async-first**: Built for streaming and concurrent execution -**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. 
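The persistence and undo/redo use cases above all rest on the same two design decisions: state is immutable and serializable. A rough stdlib-only sketch of that pattern (illustrative only, not the SDK's actual `ConversationState` API):

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchState:
    """Toy stand-in for conversation state; every update returns a new value."""

    messages: tuple = ()

    def add_message(self, text: str) -> "SketchState":
        # The original instance is never mutated; a new one is returned.
        return replace(self, messages=self.messages + (text,))

    def to_json(self) -> str:
        return json.dumps({"messages": list(self.messages)})

    @classmethod
    def from_json(cls, raw: str) -> "SketchState":
        return cls(messages=tuple(json.loads(raw)["messages"]))


s0 = SketchState()
s1 = s0.add_message("Create hello.txt")
restored = SketchState.from_json(s1.to_json())
```

Because each turn produces a fresh, serializable value, saving to a database after every turn or replaying history for time-travel debugging needs no extra machinery.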
+conversation = Conversation( + agent=agent, callbacks=[show_thinking], workspace=os.getcwd() +) -**Example use cases**: -- Saving conversation to database after each turn -- Implementing undo/redo functionality -- Building multi-session chatbots -- Time-travel debugging +conversation.send_message( + "Calculate compound interest for $10,000 at 5% annually, " + "compounded quarterly for 3 years. Show your work.", +) +conversation.run() -**Learn more**: -- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) -- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) -- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) +conversation.send_message( + "Now, write that number to RESULTs.txt.", +) +conversation.run() +print("✅ Done!") ---- +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -### 2. Agent - The Reasoning Loop + -**What it does**: The core reasoning engine that processes messages and decides what to do. +## OpenAI Reasoning via Responses API -**Key responsibilities**: -- Receives messages and current state -- Consults LLM to reason about next action -- Validates and executes tool calls -- Processes observations and loops until completion -- Integrates with skills for specialized behavior +> A ready-to-run example is available [here](#ready-to-run-example-openai)! -**Design decisions**: -- **Stateless**: Agent doesn't hold state, operates on Conversation -- **Extensible**: Behavior can be modified via skills -- **Provider-agnostic**: Works with any LLM through unified interface +OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) +that provides access to the model's reasoning process. +By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. -**The reasoning loop**: -1. 
Receive message from Conversation -2. Add message to context -3. Consult LLM with full conversation history -4. If LLM returns tool call → validate and execute tool -5. If tool returns observation → add to context, go to step 3 -6. If LLM returns response → done, return to user +### How It Works -**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. +Configure the LLM with the `reasoning_effort` parameter to enable reasoning: -**Example use cases**: -- Planning agents that break tasks into steps -- Code review agents with specific checks -- Agents with domain-specific reasoning patterns +```python focus={5} icon="python" wrap +llm = LLM( + model="openhands/gpt-5-codex", + api_key=SecretStr(api_key), + base_url=base_url, + # Enable reasoning with effort level + reasoning_effort="high", +) +``` -**Learn more**: -- Guide: [Custom Agents](/sdk/guides/agent-custom) -- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) -- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) +The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of +reasoning performed by the model. ---- +Then capture reasoning traces in your callback: -### 3. LLM - Language Model Integration +```python focus={3-4} icon="python" wrap +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + llm_messages.append(msg) +``` -**What it does**: Provides a provider-agnostic interface to language models. +### Understanding Reasoning Traces -**Key responsibilities**: -- Abstracts different LLM providers (OpenAI, Anthropic, etc.) 
-- Handles message formatting and conversion -- Manages streaming responses -- Supports tool calling and reasoning modes -- Handles retries and error recovery +The OpenAI Responses API provides reasoning traces that show how the model approached the problem. +These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. +Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. -**Design decisions**: -- **Provider-agnostic**: Same API works with any provider -- **Streaming-first**: Built for real-time responses -- **Type-safe**: Pydantic models for all messages -- **Extensible**: Easy to add new providers +### Ready-to-run Example OpenAI -**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: -- Cost optimization (switch to cheaper models) -- Testing with different models -- Avoiding vendor lock-in -- Supporting customer choice + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + -**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. 
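One concrete payoff of a provider-agnostic interface is routing requests to different models without touching agent code. A minimal sketch; the length heuristic is made up for illustration, and the model names are simply ones that appear elsewhere in these docs:

```python
def pick_llm_config(task: str) -> dict:
    """Return LLM keyword arguments for a task; agent code stays unchanged."""
    # Hypothetical heuristic: long or refactoring-heavy prompts go to the
    # stronger (pricier) model, everything else to the cheaper one.
    if len(task) > 200 or "refactor" in task.lower():
        return {"model": "anthropic/claude-sonnet-4-5-20250929", "temperature": 0.0}
    return {"model": "openhands/devstral-small-2507", "temperature": 0.0}


cheap = pick_llm_config("fix the typo in README")
strong = pick_llm_config("Please refactor the auth module end to end")
```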
+```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation -**Example use cases**: -- Routing requests to different models based on complexity -- Implementing custom caching strategies -- Adding observability hooks +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" -**Learn more**: -- Guide: [LLM Registry](/sdk/guides/llm-registry) -- Guide: [LLM Routing](/sdk/guides/llm-routing) -- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) -- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) +from __future__ import annotations ---- +import os -### 4. Tool System - Typed Capabilities +from pydantic import SecretStr -**What it does**: Defines what agents can do through a typed action/observation pattern. +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent -**Key responsibilities**: -- Defines tool schemas (inputs and outputs) -- Validates actions before execution -- Executes tools and returns typed observations -- Generates JSON schemas for LLM tool calling -- Registers tools with the agent -**Design decisions**: -- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs -- **Schema generation**: Pydantic models auto-generate JSON schemas -- **Executor pattern**: Separation of tool definition and execution -- **Composable**: Tools can call other tools +logger = get_logger(__name__) -**The three components**: -1. **Action**: Input schema (what the tool accepts) -2. **Observation**: Output schema (what the tool returns) -3. 
**ToolExecutor**: Logic that transforms Action → Observation +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." -**Why this pattern?** -- Type safety catches errors early -- LLMs get accurate schemas for tool calling -- Tools are testable in isolation -- Easy to compose tools +model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +base_url = os.getenv("LLM_BASE_URL") -**When to customize**: When you need domain-specific capabilities not covered by built-in tools. +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", +) -**Example use cases**: -- Database query tools -- API integration tools -- Custom file format parsers -- Domain-specific calculators +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) -**Learn more**: -- Guide: [Custom Tools](/sdk/guides/custom-tools) -- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) +llm_messages = [] # collect raw LLM-convertible messages for inspection ---- -### 5. Workspace - Execution Abstraction +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -**What it does**: Abstracts *where* code executes (local, Docker, remote). 
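The Action/Observation/Executor triad described above can be mimicked with plain dataclasses to see why the separation helps. This is a sketch rather than the SDK's API (the real system uses Pydantic models and a tool registry, and `QueryAction` is a hypothetical database tool):

```python
from dataclasses import dataclass


@dataclass
class QueryAction:
    """Input schema: what the (hypothetical) database tool accepts."""

    sql: str


@dataclass
class QueryObservation:
    """Output schema: what the tool returns to the agent."""

    rows: list
    success: bool


def query_executor(action: QueryAction) -> QueryObservation:
    # Validate before executing, mirroring the validate-then-run flow above;
    # a real executor would talk to an actual database.
    if not action.sql.strip().lower().startswith("select"):
        return QueryObservation(rows=[], success=False)
    return QueryObservation(rows=[("stub-row",)], success=True)


ok = query_executor(QueryAction(sql="SELECT 1"))
rejected = query_executor(QueryAction(sql="DROP TABLE users"))
```

Because the executor is a pure Action to Observation function, it can be unit-tested in isolation, with no LLM in the loop.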
-**Key responsibilities**: -- Provides unified interface for code execution -- Handles file operations across environments -- Manages working directories -- Supports different isolation levels +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) -**Design decisions**: -- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package -- **Environment-agnostic**: Code works the same locally or remotely -- **Lazy initialization**: Workspace setup happens on first use +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() -**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. +conversation.send_message("Now delete FACTS.txt.") +conversation.run() -**When to use directly**: Rarely - usually configured when creating an agent. Use advanced workspaces for production. +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") -**Learn more**: -- Architecture: [Workspace Architecture](/sdk/arch/workspace) -- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) -- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + ---- +## Use Cases -### 6. Events - Component Communication +**Debugging**: Understand why the agent made specific decisions or took certain actions. -**What it does**: Enables observability and debugging through event emissions. +**Transparency**: Show users how the AI arrived at its conclusions. 
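The timeline that events provide can be sketched with a small callback that filters by kind. The dict shape below is a stand-in for the SDK's typed Pydantic events, purely for illustration:

```python
timeline: list[tuple[str, str]] = []


def on_event(event: dict) -> None:
    # Keep only the kinds useful for a debugging timeline; ignore chatter.
    if event["kind"] in {"action", "observation", "error"}:
        timeline.append((event["kind"], event["payload"]))


for e in [
    {"kind": "message", "payload": "Create hello.txt"},
    {"kind": "action", "payload": "touch hello.txt"},
    {"kind": "observation", "payload": "exit code 0"},
]:
    on_event(e)
```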
-**Key responsibilities**: -- Defines event types (messages, actions, observations, errors) -- Emitted by Conversation, Agent, Tools -- Enables logging, debugging, and monitoring -- Supports custom event handlers +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. -**Design decisions**: -- **Immutable**: Events are snapshots, not mutable objects -- **Serializable**: Can be logged, stored, replayed -- **Type-safe**: Pydantic models for all events +**Learning**: Study how models approach complex problems. -**Why events?** They provide a timeline of what happened during agent execution. Essential for: -- Debugging agent behavior -- Understanding decision-making -- Building observability dashboards -- Implementing custom logging +## Next Steps -**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities -**Learn more**: -- Guide: [Metrics and Observability](/sdk/guides/metrics) -- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) +### LLM Registry +Source: https://docs.openhands.dev/sdk/guides/llm-registry.md ---- +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### 7. Condenser - Memory Management +> A ready-to-run example is available [here](#ready-to-run-example)! -**What it does**: Compresses conversation history when it gets too long. +Use the LLM registry to manage multiple LLM providers and dynamically switch between models. 
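The registry contract, one LLM per usage ID with `.get()` handing back the very same instance that was added, can be illustrated with a toy stand-in (the duplicate-ID error is this sketch's choice, not necessarily the SDK's behavior):

```python
class ToyRegistry:
    """Minimal stand-in for LLMRegistry's add/get/list semantics."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def add(self, usage_id: str, item: object) -> None:
        if usage_id in self._items:
            raise ValueError(f"usage_id {usage_id!r} already registered")
        self._items[usage_id] = item

    def get(self, usage_id: str) -> object:
        return self._items[usage_id]

    def list_usage_ids(self) -> list[str]:
        return list(self._items)


registry = ToyRegistry()
llm_marker = object()
registry.add("agent", llm_marker)
```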
-**Key responsibilities**: -- Monitors conversation length -- Summarizes older messages -- Preserves important context -- Keeps conversation within token limits +## Using the Registry -**Design decisions**: -- **Pluggable**: Different condensing strategies -- **Automatic**: Triggered when context gets large -- **Preserves semantics**: Important information retained +You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. -**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. +```python icon="python" focus={9,10,13} +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. +# define the registry and add an LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +... +# retrieve the LLM by its usage ID +llm = llm_registry.get("agent") +``` -**Example strategies**: -- Summarize old messages -- Keep only last N turns -- Preserve task-related messages +## Ready-to-run Example -**Learn more**: -- Guide: [Context Condenser](/sdk/guides/context-condenser) -- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + ---- +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os -### 8. MCP - Model Context Protocol +from pydantic import SecretStr -**What it does**: Integrates external tool servers via Model Context Protocol. 
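Of the condensing strategies listed above, keeping only the last N turns is the simplest to picture. A sketch that swaps dropped history for a placeholder summary line (a real condenser would ask an LLM to write the summary):

```python
def condense(history: list[str], max_turns: int = 4) -> list[str]:
    """Keep the newest turns; stand in a summary marker for everything older."""
    if len(history) <= max_turns:
        return history
    dropped = len(history) - max_turns
    return [f"[summary of {dropped} earlier messages]"] + history[-max_turns:]


msgs = [f"turn {i}" for i in range(10)]
condensed = condense(msgs)
```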
+from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool -**Key responsibilities**: -- Connects to MCP-compatible tool servers -- Translates MCP tools to SDK tool format -- Manages server lifecycle -- Handles server communication -**Design decisions**: -- **Standard protocol**: Uses MCP specification -- **Transparent integration**: MCP tools look like regular tools to agents -- **Process management**: Handles server startup/shutdown +logger = get_logger(__name__) -**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -**When to use**: When you need tools that: -- Already have MCP servers (fetch, filesystem, etc.) -- Are too complex to rewrite as SDK tools -- Need to run in separate processes -- Are provided by third parties +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -**Learn more**: -- Guide: [MCP Integration](/sdk/guides/mcp) -- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) -- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) ---- +# Get LLM from registry +llm = llm_registry.get("agent") -### 9. 
Skills (formerly Microagents) - Behavior Modules +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] -**What it does**: Specialized modules that modify agent behavior for specific tasks. +# Agent +agent = Agent(llm=llm, tools=tools) -**Key responsibilities**: -- Provide domain-specific instructions -- Modify system prompts -- Guide agent decision-making -- Compose to create specialized agents +llm_messages = [] # collect raw LLM messages -**Design decisions**: -- **Composable**: Multiple skills can work together -- **Declarative**: Defined as configuration, not code -- **Reusable**: Share skills across agents -**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -**Example skills**: -- GitHub operations (issue creation, PRs) -- Code review guidelines -- Documentation style enforcement -- Project-specific conventions -**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -**Learn more**: -- Guide: [Agent Skills & Context](/sdk/guides/skill) -- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) +conversation.send_message("Please echo 'Hello!'") +conversation.run() ---- +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -### 10. Security - Validation & Sandboxing +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") -**What it does**: Validates inputs and enforces security constraints. 
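A classic example of the validation this layer performs is path traversal prevention. A simplified stdlib sketch of the fail-safe check (not the SDK's actual validator):

```python
from pathlib import Path


def is_safe_path(workspace: str, requested: str) -> bool:
    """Allow a requested path only if it stays inside the workspace root."""
    root = Path(workspace).resolve()
    target = (root / requested).resolve()
    # Fail-safe default: anything that escapes the root is rejected.
    return target == root or root in target.parents


inside = is_safe_path("/tmp/sketch-ws", "src/main.py")
escape = is_safe_path("/tmp/sketch-ws", "../../etc/passwd")
```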
+# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") -**Key responsibilities**: -- Input validation -- Command sanitization -- Path traversal prevention -- Resource limits +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") -**Design decisions**: -- **Defense in depth**: Multiple validation layers -- **Fail-safe**: Rejects suspicious inputs by default -- **Configurable**: Adjust security levels as needed +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -**Why needed?** Agents execute arbitrary code and file operations. Security prevents: -- Malicious prompts escaping sandboxes -- Path traversal attacks -- Resource exhaustion -- Unintended system access + -**When to customize**: When you need domain-specific validation rules or want to adjust security policies. -**Learn more**: -- Guide: [Security and Secrets](/sdk/guides/security) -- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) +## Next Steps ---- +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs -## How Components Work Together +### Model Routing +Source: https://docs.openhands.dev/sdk/guides/llm-routing.md -### Example: User asks agent to create a file +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -``` -1. 
User → Conversation: "Create a file called hello.txt with 'Hello World'" +This feature is under active development and more default routers will be available in future releases. -2. Conversation → Agent: New message event +> A ready-to-run example is available [here](#ready-to-run-example)! -3. Agent → LLM: Full conversation history + available tools +### Using the built-in MultimodalRouter -4. LLM → Agent: Tool call for FileEditorTool.create() +Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: -5. Agent → Tool System: Validate FileEditorAction +```python icon="python" wrap focus={13-16} +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) +``` -6. Tool System → Tool Executor: Execute action +You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. -7. Tool Executor → Workspace: Create file (local/docker/remote) +## Ready-to-run Example -8. Workspace → Tool Executor: Success + +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) + -9. Tool Executor → Tool System: FileEditorObservation (success=true) +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: -10. 
Tool System → Agent: Observation +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py +import os -11. Agent → LLM: Updated history with observation +from pydantic import SecretStr -12. LLM → Agent: "File created successfully" +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools -13. Agent → Conversation: Done, final response -14. Conversation → User: "File created successfully" -``` +logger = get_logger(__name__) -Throughout this flow: -- **Events** are emitted for observability -- **Condenser** may trigger if history gets long -- **Skills** influence LLM's decision-making -- **Security** validates file paths and operations -- **MCP** could provide additional tools if configured +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -## Design Patterns +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="openhands/devstral-small-2507", + base_url=base_url, + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) -### Immutability +# Tools +tools = get_default_tools() # Use our default openhands experience -All core objects are immutable. Operations return new instances: +# Agent +agent = Agent(llm=multimodal_router, tools=tools) -```python -conversation = Conversation(...) 
-new_conversation = conversation.add_message(message) -# conversation is unchanged, new_conversation has the message -``` +llm_messages = [] # collect raw LLM messages -**Why?** Makes debugging easier, enables time-travel, ensures serializability. -### Composition Over Inheritance +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -Agents are composed from: -- LLM provider -- Tool list -- Skill list -- Condenser strategy -- Security policy -You don't subclass Agent - you configure it. +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() +) -**Why?** More flexible, easier to test, enables runtime configuration. +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) +conversation.run() -### Type Safety +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() -Everything uses Pydantic models: -- Messages, actions, observations are typed -- Validation happens automatically -- Schemas generate from types +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], + ) +) +conversation.run() -**Why?** Catches errors early, provides IDE support, self-documenting. +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -## Next Steps +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -### For Usage Examples + -- [Getting Started](/sdk/getting-started) - Build your first agent -- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities -- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers -- [Conversation Management](/sdk/guides/convo-persistence) - State handling -### For Related Architecture +## Next Steps -- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations -- [Workspace Architecture](/sdk/arch/workspace) - Execution environments -- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs -### For Implementation Details +### LLM Streaming +Source: https://docs.openhands.dev/sdk/guides/llm-streaming.md -- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code -- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code -- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code -- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -### Security -Source: https://docs.openhands.dev/sdk/arch/security.md + +This is currently only supported for the chat completion endpoint. + -The **Security** system evaluates agent actions for potential risks before execution. 
It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

-**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

-## Core Responsibilities

+Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use
+streaming callbacks to process and display tokens as they arrive from the language model.

-The Security system has four primary responsibilities:

-1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions
-2. **Confirmation Policy** - Determine when user approval is required based on risk
-3. **Action Validation** - Enforce security policies before execution
-4. **Audit Trail** - Record security decisions in event history

+## How It Works

-## Architecture

+Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the
+complete response. This creates a more responsive user experience, especially for long-form content generation.

-```mermaid
-%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%%
-flowchart TB
-    subgraph Interface["Abstract Interface"]
-        Base["SecurityAnalyzerBase
Abstract analyzer"] - end - - subgraph Implementations["Concrete Analyzers"] - LLM["LLMSecurityAnalyzer
Inline risk prediction"] - NoOp["NoOpSecurityAnalyzer
No analysis"] - end - - subgraph Risk["Risk Levels"] - Low["LOW
Safe operations"] - Medium["MEDIUM
Moderate risk"] - High["HIGH
Dangerous ops"] - Unknown["UNKNOWN
Unanalyzed"] - end - - subgraph Policy["Confirmation Policy"] - Check["should_require_confirmation()"] - Mode["Confirmation Mode"] - Decision["Require / Allow"] - end - - Base --> LLM - Base --> NoOp - - Implementations --> Low - Implementations --> Medium - Implementations --> High - Implementations --> Unknown - - Low --> Check - Medium --> Check - High --> Check - Unknown --> Check - - Check --> Mode - Mode --> Decision - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - - class Base primary - class LLM secondary - class High danger - class Check tertiary -``` + + + ### Enable Streaming on LLM + Configure the LLM with streaming enabled: -### Key Components + ```python focus={6} icon="python" wrap + llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, # Enable streaming + ) + ``` + + + ### Define Token Callback + Create a callback function that processes streaming chunks as they arrive: -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | -| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | -| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | -| 
**[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | -| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | + ```python icon="python" wrap + def on_token(chunk: ModelResponseStream) -> None: + """Process each streaming chunk as it arrives.""" + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + content = getattr(delta, "content", None) + if isinstance(content, str): + sys.stdout.write(content) + sys.stdout.flush() + ``` -## Risk Levels + The callback receives a `ModelResponseStream` object containing: + - **`choices`**: List of response choices from the model + - **`delta`**: Incremental content changes for each choice + - **`content`**: The actual text tokens being streamed + + + ### Register Callback with Conversation -Security analyzers return one of four risk levels: + Pass your token callback to the conversation: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - Action["ActionEvent"] - Analyze["Security Analyzer"] - - subgraph Levels["Risk Levels"] - Low["LOW
Read-only, safe"] - Medium["MEDIUM
Modify files"] - High["HIGH
Delete, execute"] - Unknown["UNKNOWN
Not analyzed"] - end - - Action --> Analyze - Analyze --> Low - Analyze --> Medium - Analyze --> High - Analyze --> Unknown - - style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px - style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px - style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px -``` + ```python focus={3} icon="python" wrap + conversation = Conversation( + agent=agent, + token_callbacks=[on_token], # Register streaming callback + workspace=os.getcwd(), + ) + ``` -### Risk Level Definitions + The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers + if needed (e.g., one for display, another for logging). +
+
-| Level | Characteristics | Examples | -|-------|----------------|----------| -| **LOW** | Read-only, no state changes | File reading, directory listing, search | -| **MEDIUM** | Modifies user data | File editing, creating files, API calls | -| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | -| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | +## Ready-to-run Example -## Security Analyzers + +This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) + -### LLMSecurityAnalyzer +```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py +import os +import sys +from typing import Literal -Leverages the LLM's inline risk assessment during action generation: +from pydantic import SecretStr -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Schema["Tool Schema
+ security_risk param"] - LLM["LLM generates action
with security_risk"] - ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] - Extract["Extract security_risk
from arguments"] - ActionEvent["ActionEvent
with security_risk set"] - Analyzer["LLMSecurityAnalyzer
returns security_risk"] - - Schema --> LLM - LLM --> ToolCall - ToolCall --> Extract - Extract --> ActionEvent - ActionEvent --> Analyzer - - style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +from openhands.sdk import ( + Conversation, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.sdk.llm.streaming import ModelResponseStream +from openhands.tools.preset.default import get_default_agent -**Analysis Process:** -1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema -2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments -3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments -4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` -5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level -6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step +logger = get_logger(__name__) -**Example Tool Call:** -```json -{ - "name": "execute_bash", - "arguments": { - "command": "rm -rf /tmp/cache", - "security_risk": "HIGH" - } -} -``` -The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. 
+api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +if not api_key: + raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") -**Configuration:** -- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent -- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools -- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, +) -### NoOpSecurityAnalyzer +agent = get_default_agent(llm=llm, cli_mode=True) -Passthrough analyzer that skips analysis: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Action["ActionEvent"] - NoOp["NoOpSecurityAnalyzer"] - Unknown["SecurityRisk.UNKNOWN"] - - Action --> NoOp --> Unknown - - style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px -``` +# Define streaming states +StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] +# Track state across on_token calls for boundary detection +_current_state: StreamingState | None = None -**Use Case:** Development, trusted environments, or when confirmation mode handles all actions -## Confirmation Policy +def on_token(chunk: ModelResponseStream) -> None: + """ + Handle all types of streaming tokens including content, + tool calls, and thinking blocks with dynamic boundary detection. + """ + global _current_state -The confirmation policy determines when user approval is required. 
There are three policy implementations: + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + # Handle thinking blocks (reasoning content) + reasoning_content = getattr(delta, "reasoning_content", None) + if isinstance(reasoning_content, str) and reasoning_content: + if _current_state != "thinking": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("THINKING: ") + _current_state = "thinking" + sys.stdout.write(reasoning_content) + sys.stdout.flush() -**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) + # Handle regular content + content = getattr(delta, "content", None) + if isinstance(content, str) and content: + if _current_state != "content": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("CONTENT: ") + _current_state = "content" + sys.stdout.write(content) + sys.stdout.flush() -### Policy Types + # Handle tool calls + tool_calls = getattr(delta, "tool_calls", None) + if tool_calls: + for tool_call in tool_calls: + tool_name = ( + tool_call.function.name if tool_call.function.name else "" + ) + tool_args = ( + tool_call.function.arguments + if tool_call.function.arguments + else "" + ) + if tool_name: + if _current_state != "tool_name": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL NAME: ") + _current_state = "tool_name" + sys.stdout.write(tool_name) + sys.stdout.flush() + if tool_args: + if _current_state != "tool_args": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL ARGS: ") + _current_state = "tool_args" + sys.stdout.write(tool_args) + sys.stdout.flush() -| Policy | Behavior | Use Case | -|--------|----------|----------| -| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires 
confirmation for **all** actions | Maximum safety, interactive workflows | -| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | -| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | -### ConfirmRisky (Default Policy) +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + token_callbacks=[on_token], +) -The most flexible policy with configurable thresholds: +story_prompt = ( + "Tell me a long story about LLM streaming, write it a file, " + "make sure it has multiple paragraphs. " +) +conversation.send_message(story_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Risk["SecurityRisk"] - CheckUnknown{"Risk ==
UNKNOWN?"} - UseConfirmUnknown{"confirm_unknown
setting?"} - CheckThreshold{"risk.is_riskier
(threshold)?"} - - Confirm["Require Confirmation"] - Allow["Allow Execution"] - - Risk --> CheckUnknown - CheckUnknown -->|Yes| UseConfirmUnknown - CheckUnknown -->|No| CheckThreshold - - UseConfirmUnknown -->|True| Confirm - UseConfirmUnknown -->|False| Allow - - CheckThreshold -->|Yes| Confirm - CheckThreshold -->|No| Allow - - style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px - style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px +cleanup_prompt = ( + "Thank you. Please delete the streaming story file now that I've read it, " + "then confirm the deletion." +) +conversation.send_message(cleanup_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -**Configuration:** -- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required - - Cannot be set to `UNKNOWN` - - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` -- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation + -### Confirmation Rules by Policy +## Next Steps -#### ConfirmRisky with threshold=HIGH (Default) +- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully +- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI -| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | -|------------|----------------------------------|-------------------------| -| **LOW** | ✅ Allow | ✅ Allow | -| **MEDIUM** | ✅ Allow | ✅ Allow | -| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | +### LLM Subscriptions +Source: 
https://docs.openhands.dev/sdk/guides/llm-subscriptions.md -#### ConfirmRisky with threshold=MEDIUM +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | -|------------|------------------------|-------------------------| -| **LOW** | ✅ Allow | ✅ Allow | -| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | -| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. + -#### ConfirmRisky with threshold=LOW +> A ready-to-run example is available [here](#ready-to-run-example)! -| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | -|------------|------------------------|-------------------------| -| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | -| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | -| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | -| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | +Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. 
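At a high level, the credential handling described above follows a cache, refresh, or fresh-login decision. The sketch below illustrates that decision flow only; the helper functions and the in-memory cache are hypothetical stand-ins, not the SDK's implementation (which runs a real OAuth flow and persists credentials under `~/.openhands/auth/`).

```python
# Illustrative sketch of the cache -> refresh -> fresh-login decision.
# All helpers here are hypothetical; the real SDK performs an OAuth
# flow and stores credentials under ~/.openhands/auth/.
import time

_cache: dict[str, dict] = {}  # stand-in for the on-disk credential cache


def fresh_login(vendor: str) -> dict:
    """Pretend interactive OAuth login returning a short-lived token."""
    return {"access_token": f"{vendor}-token", "expires_at": time.time() + 3600}


def refresh(creds: dict) -> dict:
    """Pretend silent token refresh extending the expiry."""
    return {**creds, "expires_at": time.time() + 3600}


def get_credentials(vendor: str, force_login: bool = False) -> dict:
    creds = _cache.get(vendor)
    if force_login or creds is None:
        creds = fresh_login(vendor)  # first run (or force_login=True)
    elif creds["expires_at"] <= time.time():
        creds = refresh(creds)  # cached but expired: silent refresh
    _cache[vendor] = creds  # reused on subsequent calls
    return creds


first = get_credentials("openai")
second = get_credentials("openai")
print(first is second)  # cached credentials are reused -> True
```

This is why subsequent calls in the same environment skip the browser round-trip: cached credentials are reused (and silently refreshed when expired), and a fresh interactive login only happens when there is nothing to reuse or `force_login=True` is passed.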
-**Key Rules:** -- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` -- **UNKNOWN handling** is configurable via `confirm_unknown` flag -- **Threshold cannot be UNKNOWN** - validated at policy creation time +## How It Works + + + ### Call subscription_login() -## Component Relationships + The `LLM.subscription_login()` class method handles the entire authentication flow: -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Security["Security Analyzer"] - Agent["Agent"] - Conversation["Conversation"] - Tools["Tools"] - MCP["MCP Tools"] - - Agent -->|Validates actions| Security - Security -->|Checks| Tools - Security -->|Uses hints| MCP - Conversation -->|Pauses for confirmation| Agent - - style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` + ```python icon="python" + from openhands.sdk import LLM -**Relationship Characteristics:** -- **Agent → Security**: Validates actions before execution -- **Security → Tools**: Examines tool characteristics (annotations) -- **Security → MCP**: Uses MCP hints for risk assessment -- **Conversation → Agent**: Pauses for user confirmation when required -- **Optional Component**: Security analyzer can be disabled for trusted environments + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") + ``` -## See Also + On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. 
+ + + ### Use the LLM -- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers -- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints -- **[Security Guide](/sdk/guides/security)** - Configuring security policies + Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. + + -### Skill -Source: https://docs.openhands.dev/sdk/arch/skill.md +## Supported Models -The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. +The following models are available via ChatGPT subscription: -**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) +| Model | Description | +|-------|-------------| +| `gpt-5.2-codex` | Latest Codex model (default) | +| `gpt-5.2` | GPT-5.2 base model | +| `gpt-5.1-codex-max` | High-capacity Codex model | +| `gpt-5.1-codex-mini` | Lightweight Codex model | -## Core Responsibilities +## Configuration Options -The Skill system has four primary responsibilities: +### Force Fresh Login -1. **Context Injection** - Add specialized prompts to agent context based on triggers -2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) -3. **MCP Integration** - Load MCP tools associated with repository skills -4. 
**Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats +If your cached credentials become stale or you want to switch accounts: -## Architecture +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + force_login=True, # Always perform fresh OAuth login +) +``` -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% -flowchart TB - subgraph Types["Skill Types"] - Repo["Repository Skill
trigger: None"] - Knowledge["Knowledge Skill
trigger: KeywordTrigger"] - Task["Task Skill
trigger: TaskTrigger"] - end - - subgraph Triggers["Trigger Evaluation"] - Always["Always Active
Repository guidelines"] - Keyword["Keyword Match
String matching on user messages"] - TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] - end - - subgraph Content["Skill Content"] - Markdown["Markdown with Frontmatter"] - MCPTools["MCP Tools Config
Repo skills only"] - Inputs["Input Metadata
Task skills only"] - end - - subgraph Integration["Agent Integration"] - Context["Agent Context"] - Prompt["System Prompt"] - end - - Repo --> Always - Knowledge --> Keyword - Task --> TaskMatch - - Always --> Markdown - Keyword --> Markdown - TaskMatch --> Markdown - - Repo -.->|Optional| MCPTools - Task -.->|Requires| Inputs - - Markdown --> Context - MCPTools --> Context - Context --> Prompt - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Repo,Knowledge,Task primary - class Always,Keyword,TaskMatch secondary - class Context tertiary +### Disable Browser Auto-Open + +For headless environments or when you prefer to manually open the URL: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + open_browser=False, # Prints URL to console instead +) ``` -### Key Components +### Check Subscription Mode -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | -| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | -| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | -| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | -| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | 
+Verify that the LLM is using subscription-based authentication: -## Skill Types +```python icon="python" +llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") +print(f"Using subscription: {llm.is_subscription}") # True +``` -### Repository Skills +## Credential Storage -Always-active, repository-specific guidelines. +Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. -**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. +## Ready-to-run Example -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - File["AGENTS.md"] - Parse["Parse Frontmatter"] - Skill["Skill(trigger=None)"] - Context["Always in Context"] - - File --> Parse - Parse --> Skill - Skill --> Context - - style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` + +This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) + -**Characteristics:** -- **Trigger:** `None` (always active) -- **Purpose:** Project conventions, coding standards, architecture rules -- **MCP Tools:** Can include MCP tool configuration -- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) +```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py +"""Example: Using ChatGPT subscription for Codex models. -**Example Files (permanent context):** -- `AGENTS.md` - General agent instructions -- `GEMINI.md` - Gemini-specific instructions -- `CLAUDE.md` - Claude-specific instructions +This example demonstrates how to use your ChatGPT Plus/Pro subscription +to access OpenAI's Codex models without consuming API credits. 
-**Other supported formats:** -- `.cursorrules` - Cursor IDE guidelines -- `agents.md` / `agent.md` - General agent instructions +The subscription_login() method handles: +- OAuth PKCE authentication flow +- Credential caching (~/.openhands/auth/) +- Automatic token refresh + +Supported models: +- gpt-5.2-codex +- gpt-5.2 +- gpt-5.1-codex-max +- gpt-5.1-codex-mini + +Requirements: +- Active ChatGPT Plus or Pro subscription +- Browser access for initial OAuth login +""" + +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -### Knowledge Skills -Keyword-triggered skills for specialized domains: +# First time: Opens browser for OAuth login +# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" +) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - User["User Message"] - Check["Check Keywords"] - Match{"Match?"} - Activate["Activate Skill"] - Skip["Skip Skill"] - Context["Add to Context"] - - User --> Check - Check --> Match - Match -->|Yes| Activate - Match -->|No| Skip - Activate --> Context - - style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +# Alternative: Force a fresh login (useful if credentials are stale) +# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) -**Characteristics:** -- **Trigger:** `KeywordTrigger` with regex patterns -- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") -- **Activation:** Keywords detected in user messages -- **Location:** System or user-defined knowledge base +# Alternative: Disable auto-opening browser (prints URL to console instead) +# llm = 
LLM.subscription_login( +# vendor="openai", model="gpt-5.2-codex", open_browser=False +# ) -**Trigger Example:** -```yaml ---- -name: kubernetes -trigger: - type: keyword - keywords: ["kubernetes", "k8s", "kubectl"] ---- -``` +# Verify subscription mode is active +print(f"Using subscription mode: {llm.is_subscription}") -### Task Skills +# Use the LLM with an agent as usual +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) -Keyword-triggered skills with structured inputs for guided workflows: +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - User["User Message"] - Match{"Keyword
Match?"} - Inputs["Collect User Inputs"] - Template["Apply Template"] - Context["Add to Context"] - Skip["Skip Skill"] - - User --> Match - Match -->|Yes| Inputs - Match -->|No| Skip - Inputs --> Template - Template --> Context - - style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +conversation.send_message("List the files in the current directory.") +conversation.run() +print("Done!") ``` -**Characteristics:** -- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) -- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) -- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) -- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) -- **Location:** System-defined or custom task templates + -**Trigger Example:** -```yaml ---- -name: bug_fix -triggers: ["/bug_fix", "fix bug", "bug report"] -inputs: - - name: bug_description - description: "Describe the bug" - required: true ---- -``` +## Next Steps -**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. 
+- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations
+- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token
+- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces
-## Trigger Evaluation
+### Model Context Protocol
+Source: https://docs.openhands.dev/sdk/guides/mcp.md
-Skills are evaluated at different points in the agent lifecycle:
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
-```mermaid
-%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%%
-flowchart TB
-    Start["Agent Step Start"]
-
-    Repo["Check Repository Skills
trigger: None"]
-    AddRepo["Always Add to Context"]
-
-    Message["Check User Message"]
-    Keyword["Match Keyword Triggers"]
-    AddKeyword["Add Matched Skills"]
-
-    TaskType["Check Task Type"]
-    TaskMatch["Match Task Triggers"]
-    AddTask["Add Task Skill"]
-
-    Build["Build Agent Context"]
-
-    Start --> Repo
-    Repo --> AddRepo
-
-    Start --> Message
-    Message --> Keyword
-    Keyword --> AddKeyword
-
-    Start --> TaskType
-    TaskType --> TaskMatch
-    TaskMatch --> AddTask
-
-    AddRepo --> Build
-    AddKeyword --> Build
-    AddTask --> Build
-
-    style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
-    style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
-    style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px
-```
+
+  ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents.
+  Read more about MCP [here](https://modelcontextprotocol.io/).
+
-**Evaluation Rules:**
-
-| Trigger Type | Evaluation Point | Activation Condition |
-|--------------|------------------|----------------------|
-| **None** | Every step | Always active |
-| **KeywordTrigger** | On user message | Keyword/string match in message |
-| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) |
-
-**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters.
+## Basic MCP Usage
-## MCP Tool Integration
+
+> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)!
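One part of the setup below that can be previewed offline is the tool-filtering regex. The sketch that follows shows how such a pattern admits or blocks tool names; the tool names here are hypothetical examples, and the SDK's exact matching semantics may differ from plain `re.match`:

```python
import re

# Same pattern as the filter_tools_regex used in this guide: allow every
# tool except repomix ones, but keep repomix's pack_codebase tool.
pattern = re.compile(r"^(?!repomix)(.*)|^repomix.*pack_codebase.*$")

# Hypothetical MCP tool names, for illustration only.
tool_names = [
    "fetch_fetch",
    "repomix_pack_codebase",
    "repomix_read_repomix_output",
]

allowed = [name for name in tool_names if pattern.match(name)]
print(allowed)  # ['fetch_fetch', 'repomix_pack_codebase']
```

The first alternative (`^(?!repomix)(.*)`) admits anything not starting with `repomix`; the second re-admits only the `pack_codebase` tool from that server.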
-Repository skills can include MCP tool configurations: + + + ### MCP Configuration + Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Skill["Repository Skill"] - MCPConfig["mcp_tools Config"] - Client["MCP Client"] - Tools["Tool Registry"] - - Skill -->|Contains| MCPConfig - MCPConfig -->|Spawns| Client - Client -->|Registers| Tools - - style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` + ```python mcp_config icon="python" wrap focus={3-10} + mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "repomix": { + "command": "npx", + "args": ["-y", "repomix@1.4.2", "--mcp"] + }, + } + } + ``` + + + ### Tool Filtering + Use `filter_tools_regex` to control which MCP tools are available to the agent -**MCP Configuration Format:** + ```python filter_tools_regex focus={4-5} icon="python" + agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", + ) + ``` + + -Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): +## MCP with OAuth -```yaml ---- -name: repo_skill -mcp_tools: - mcpServers: - filesystem: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] ---- +> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
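Compared with the command-based (stdio) servers configured above, an OAuth-protected server is declared by URL, and the only OAuth-specific piece is an extra `"auth"` field. A minimal sketch of the config shape, reusing the Notion server from this guide:

```python
# Sketch: an OAuth-protected entry is a regular remote-server entry
# (URL only) plus an "auth": "oauth" field.
base_entry = {"url": "https://mcp.notion.com/mcp"}
oauth_entry = {**base_entry, "auth": "oauth"}

mcp_config = {"mcpServers": {"Notion": oauth_entry}}
print(mcp_config["mcpServers"]["Notion"]["auth"])  # oauth
```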
+
+For MCP servers requiring OAuth authentication:
+- Configure OAuth-enabled MCP servers by specifying the URL and auth type
+- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK automatically initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication)
+- The user will be prompted to authenticate
+- Access tokens are securely stored and automatically refreshed by FastMCP as needed
+
+```python mcp_config focus={5} icon="python" wrap
+mcp_config = {
+    "mcpServers": {
+        "Notion": {
+            "url": "https://mcp.notion.com/mcp",
+            "auth": "oauth"
+        }
+    }
+}
+```
-**Workflow:**
-
-1. **Load Skill:** Parse markdown file with frontmatter
-2. **Extract MCP Config:** Read `mcp_tools` field
-3. **Spawn MCP Servers:** Create MCP clients for each server
-4. **Register Tools:** Add MCP tools to agent's tool registry
-5. **Inject Context:** Add skill content to agent prompt
+## Ready-to-Run Basic MCP Usage Example
-## Skill File Format
+
+This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py)
+
-Skills are defined in markdown files with YAML frontmatter:
+Here's an example integrating MCP servers with an agent:
+
+```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py
+import os
-```markdown
----
-name: skill_name
-trigger:
-  type: keyword
-  keywords: ["pattern1", "pattern2"]
----
+from pydantic import SecretStr
-# Skill Content
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    get_logger,
+)
+from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.terminal import TerminalTool
-This is the instruction text that will be added to the agent's
context. -``` -**Frontmatter Fields:** +logger = get_logger(__name__) -| Field | Required | Description | -|-------|----------|-------------| -| **name** | Yes | Unique skill identifier | -| **trigger** | Yes* | Activation trigger (`null` for always active) | -| **mcp_tools** | No | MCP server configuration (repo skills only) | -| **inputs** | No | User input metadata (task skills only) | +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -*Repository skills use `trigger: null` (or omit trigger field) +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -## Component Relationships +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, + } +} +# Agent +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + # This regex filters out all repomix tools except pack_codebase + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", +) -### How Skills Integrate +llm_messages = [] # collect raw LLM messages -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Skills["Skill System"] - Context["Agent Context"] - Agent["Agent"] - MCP["MCP Client"] - - Skills -->|Injects content| Context - Skills -.->|Spawns tools| MCP - Context -->|System prompt| Agent - MCP -->|Tool| Agent - - style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` -**Relationship Characteristics:** -- **Skills → Agent Context**: Active skills contribute their 
content to system prompt -- **Skills → MCP**: Repository skills can spawn MCP servers and register tools -- **Context → Agent**: Combined skill content becomes part of agent's instructions -- **Skills Lifecycle**: Loaded at conversation start, evaluated each step +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -## See Also -- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context -- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management -- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) -### Tool System & MCP -Source: https://docs.openhands.dev/sdk/arch/tool-system.md +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() -The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. +conversation.send_message("Great! Now delete that file.") +conversation.run() -**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -## Core Responsibilities +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -The Tool System has four primary responsibilities: + -1. 
**Type Safety** - Enforce action/observation schemas via Pydantic models -2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas -3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs -4. **Tool Registry** - Discover and resolve tools by name or pattern +## Ready-to-Run MCP with OAuth Example -## Tool System + +This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) + -### Architecture Overview +```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py +import os -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% -flowchart TB - subgraph Definition["Tool Definition"] - Action["Action
Input schema"] - Observation["Observation
Output schema"] - Executor["Executor
Business logic"] - end - - subgraph Framework["Tool Framework"] - Base["ToolBase
Abstract base"] - Impl["Tool Implementation
Concrete tool"] - Registry["Tool Registry
Spec → Tool"] - end +from pydantic import SecretStr - Agent["Agent"] - LLM["LLM"] - ToolSpec["Tool Spec
name + params"] +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool - Base -.->|Extends| Impl - - ToolSpec -->|resolve_tool| Registry - Registry -->|Create instances| Impl - Impl -->|Available in| Agent - Impl -->|Generate schema| LLM - LLM -->|Generate tool call| Agent - Agent -->|Parse & validate| Action - Agent -->|Execute via Tool.\_\_call\_\_| Executor - Executor -->|Return| Observation - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Base primary - class Action,Observation,Executor secondary - class Registry tertiary -``` -### Key Components +logger = get_logger(__name__) -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | -| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | -| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | -| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | -| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, 
optional `close()` | -| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | -| **[`Tool` (spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | -| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -### Action-Observation Pattern +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] -The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. +mcp_config = { + "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} +} +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - subgraph Agent["Agent Layer"] - ToolCall["MessageToolCall
from LLM"] - ParseJSON["Parse JSON
arguments"] - CreateAction["tool.action_from_arguments()
Pydantic validation"] - WrapAction["ActionEvent
wraps Action"] - WrapObs["ObservationEvent
wraps Observation"] - Error["AgentErrorEvent"] - end - - subgraph ToolSystem["Tool System"] - ActionType["Action
Pydantic model"] - ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] - Execute["ToolExecutor
business logic"] - ObsType["Observation
Pydantic model"] - end - - ToolCall --> ParseJSON - ParseJSON -->|Valid JSON| CreateAction - ParseJSON -->|Invalid JSON| Error - CreateAction -->|Valid| ActionType - CreateAction -->|Invalid| Error - ActionType --> WrapAction - ActionType --> ToolCall2 - ToolCall2 --> Execute - Execute --> ObsType - ObsType --> WrapObs - - style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px - style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px -``` +llm_messages = [] # collect raw LLM messages -**Tool System Boundary:** -- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance -- **Output**: `Observation` instance with structured result -- **No knowledge of**: Events, LLM messages, conversation state -### Tool Definition +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -Tools are defined using two patterns depending on complexity: -#### Pattern 1: Direct Instantiation (Simple Tools) +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], +) -For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): +logger.info("Starting conversation with MCP integration...") +conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") +conversation.run() -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% -flowchart LR - Action["Define Action
with visualize"] - Obs["Define Observation
with to_llm_content"] - Exec["Define Executor
stateless logic"] - Tool["ToolDefinition(...,
executor=Executor())"] - - Action --> Tool - Obs --> Tool - Exec --> Tool - - style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") ``` -**Components:** -1. **Action** - Pydantic model with `visualize` property for display -2. **Observation** - Pydantic model with `to_llm_content` property for LLM -3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` -4. **ToolDefinition** - Direct instantiation with executor instance + -#### Pattern 2: Subclass with Factory (Stateful Tools) +## Next Steps -For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): +- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage +- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% -flowchart LR - Action["Define Action
with visualize"] - Obs["Define Observation
with to_llm_content"] - Exec["Define Executor
with \_\_init\_\_ and state"] - Subclass["class MyTool(ToolDefinition)
with create() method"] - Instance["Return [MyTool(...,
executor=instance)]"] - - Action --> Subclass - Obs --> Subclass - Exec --> Subclass - Subclass --> Instance - - style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### Metrics Tracking +Source: https://docs.openhands.dev/sdk/guides/metrics.md -**Components:** -1. **Action/Observation** - Same as Pattern 1 -2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup -3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method -4. **Factory Method** - Returns sequence of configured tool instances +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - subgraph Pattern1["Pattern 1: Direct Instantiation"] - P1A["Define Action/Observation
with visualize/to_llm_content"] - P1E["Define ToolExecutor
with \_\_call\_\_()"] - P1T["ToolDefinition(...,
executor=Executor())"] - end - - subgraph Pattern2["Pattern 2: Subclass with Factory"] - P2A["Define Action/Observation
with visualize/to_llm_content"] - P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] - P2C["class MyTool(ToolDefinition)
@classmethod create()"] - P2I["Return [MyTool(...,
executor=instance)]"] - end - - P1A --> P1E - P1E --> P1T - - P2A --> P2E - P2E --> P2C - P2C --> P2I - - style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +## Overview -**Key Design Elements:** +The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. +- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). -| Component | Purpose | Requirements | -|-----------|---------|--------------| -| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text | -| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list | -| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method | -| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) | +## Getting Metrics from Individual LLMs -**When to Use Each Pattern:** +> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! 
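Before diving into the API, here is a toy illustration of what "accumulated" means in this section: per-call records summing into running totals. This is plain Python with made-up numbers, not the SDK's `Metrics` class:

```python
# Toy illustration (not the SDK's Metrics class): per-call records
# roll up into totals analogous to accumulated_cost and
# accumulated_token_usage.
calls = [
    {"cost": 0.012, "prompt_tokens": 900, "completion_tokens": 150},
    {"cost": 0.027, "prompt_tokens": 2100, "completion_tokens": 400},
]

accumulated_cost = sum(c["cost"] for c in calls)
accumulated_tokens = {
    "prompt_tokens": sum(c["prompt_tokens"] for c in calls),
    "completion_tokens": sum(c["completion_tokens"] for c in calls),
}
print(round(accumulated_cost, 3))  # 0.039
print(accumulated_tokens)
```

The real `Metrics` object keeps the per-call records too (`costs`, `token_usages`, `response_latencies`), so both the totals and the breakdown remain available.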
-| Pattern | Use Case | Examples | -|---------|----------|----------| -| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities | -| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` | +Track token usage, costs, and performance metrics from LLM interactions: -### Tool Annotations +### Accessing Individual LLM Metrics -Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs: +Access metrics directly from the LLM object after running the conversation: -| Field | Meaning | Examples | -|-------|---------|----------| -| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) | -| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) | -| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) | -| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) | +```python icon="python" focus={3-4} +conversation.run() -**Key Behaviors:** -- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with `readOnlyHint=False` -- Annotations help LLMs reason about tool safety and side effects +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` -### Tool Registry +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: -The registry enables **dynamic tool discovery** and instantiation from tool specifications: +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - 
`prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - ToolSpec["Tool Spec
name + params"] - - subgraph Registry["Tool Registry"] - Resolver["Resolver
name → factory"] - Factory["Factory
create(params)"] - end - - Instance["Tool Instance
with executor"] - Agent["Agent"] - - ToolSpec -->|"resolve_tool(spec)"| Resolver - Resolver -->|Lookup factory| Factory - Factory -->|"create(**params)"| Instance - Instance -->|Used by| Agent - - style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` + + For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). + -**Resolution Workflow:** +### Ready-to-run Example (LLM metrics) + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + -1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) -2. **Resolver Lookup** - Registry finds the registered resolver for the tool name -3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state -4. **Instance Creation** - Tool instance(s) are created with configured executors -5. 
**Agent Usage** - Instances are added to the agent's tools_map for execution +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os -**Registration Types:** +from pydantic import SecretStr -| Type | Registration | Resolver Behavior | -|------|-------------|-------------------| -| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | -| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | -| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -### File Organization -Tools follow a consistent file structure for maintainability: +logger = get_logger(__name__) -``` -openhands-tools/openhands/tools/my_tool/ -├── __init__.py # Export MyTool -├── definition.py # Action, Observation, MyTool(ToolDefinition) -├── impl.py # MyExecutor(ToolExecutor) -└── [other modules] # Tool-specific utilities -``` +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -**File Responsibilities:** +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] -| File | Contains | Purpose | -|------|----------|---------| -| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | -| `impl.py` | ToolExecutor implementation | Business logic, state management, execution | -| `__init__.py` | Tool exports | Package interface | +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} -**Benefits:** -- **Separation of Concerns** - Public API separate from implementation -- **Avoid Circular Imports** - Import `impl` only inside `create()` method -- **Consistency** - All tools follow same structure for discoverability +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) -**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for complete implementation +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) -## MCP Integration +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() -The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). 
MCP tools are **configured separately from the tool registry** via the `mcp_config` field in `Agent` class and are automatically discovered from MCP servers during agent initialization. +conversation.send_message("Great! Now delete that file.") +conversation.run() -**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -### Architecture Overview +assert llm.metrics is not None +print( + f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" +) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% -flowchart TB - subgraph External["External MCP Server"] - Server["MCP Server
stdio/HTTP"] - ExtTools["External Tools"] - end - - subgraph Bridge["MCP Integration Layer"] - MCPClient["MCPClient
Sync/Async bridge"] - Convert["Schema Conversion
MCP → MCPToolDefinition"] - MCPExec["MCPToolExecutor
Bridges to MCP calls"] - end - - subgraph Agent["Agent System"] - ToolsMap["tools_map
str -> ToolDefinition"] - AgentLogic["Agent Execution"] - end - - Server -.->|Spawns| ExtTools - MCPClient --> Server - Server --> Convert - Convert -->|create_mcp_tools| MCPExec - MCPExec -->|Added during
agent.initialize| ToolsMap - ToolsMap --> AgentLogic - AgentLogic -->|Tool call| MCPExec - MCPExec --> MCPClient - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class MCPClient primary - class Convert,MCPExec secondary - class Server,ExtTools external +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` -### Key Components + -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | MCP server connection | Extends FastMCP with sync/async bridge | -| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Tool wrapper | Wraps MCP tools as SDK `ToolDefinition` with dynamic validation | -| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP tool calls via MCPClient | -| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Generic action wrapper | Simple `dict[str, Any]` wrapper for MCP tool arguments | -| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results as observations with content blocks | -| **[`_create_mcp_action_type()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Dynamic schema | Runtime Pydantic model generated from MCP `inputSchema` for validation | +## Using LLM Registry for Cost Tracking -### Sync/Async Bridge +> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! 
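
The registry pattern itself is simple: one instance per `usage_id`, retrieved on demand. A plain-Python sketch of the idea (illustrative only — this is not the SDK's actual `LLMRegistry` implementation):

```python
class SimpleRegistry:
    """Plain-Python illustration of the usage_id -> instance registry pattern."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def add(self, usage_id: str, item: object) -> None:
        # Each usage ID may only be registered once.
        if usage_id in self._items:
            raise ValueError(f"usage_id {usage_id!r} is already registered")
        self._items[usage_id] = item

    def get(self, usage_id: str) -> object:
        return self._items[usage_id]

    def list_usage_ids(self) -> list[str]:
        return list(self._items)


registry = SimpleRegistry()
registry.add("agent", object())      # e.g. the main agent LLM
registry.add("condenser", object())  # e.g. a separate condenser LLM

assert registry.get("agent") is registry.get("agent")  # same instance every time
print(registry.list_usage_ids())  # ['agent', 'condenser']
```

Because each usage ID maps to exactly one instance, any metrics accumulated on that instance can be attributed unambiguously to that usage ID.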
-MCP protocol is asynchronous, but SDK tools execute synchronously. The bridge pattern in [client.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py) solves this: +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Sync["Sync Tool Execution"] - Bridge["call_async_from_sync()"] - Loop["Background Event Loop"] - Async["Async MCP Call"] - Result["Return Result"] - - Sync --> Bridge - Bridge --> Loop - Loop --> Async - Async --> Result - Result --> Sync - - style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Loop fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +### How the LLM Registry Works -**Bridge Features:** -- **Background Event Loop** - Executes async code from sync contexts -- **Timeout Support** - Configurable timeouts for MCP operations -- **Error Handling** - Wraps MCP errors in observations -- **Connection Pooling** - Reuses connections across tool calls +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: -### Tool Discovery Flow +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. 
**Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID -**Source:** [`create_mcp_tools()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/utils.py) | [`agent._initialize()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py) +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart TB - Config["MCP Server Config
command + args"] - Spawn["Spawn Server Process
MCPClient"] - List["List Available Tools
client.list_tools()"] - - subgraph Convert["For Each MCP Tool"] - Store["Store MCP metadata
name, description, inputSchema"] - CreateExec["Create MCPToolExecutor
bound to tool + client"] - Def["Create MCPToolDefinition
generic MCPToolAction type"] - end - - Register["Add to Agent's tools_map
bypasses ToolRegistry"] - Ready["Tools Available
Dynamic models created on-demand"] - - Config --> Spawn - Spawn --> List - List --> Store - Store --> CreateExec - CreateExec --> Def - Def --> Register - Register --> Ready - - style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Def fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +### Ready-to-run Example (LLM Registry) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + -**Discovery Steps:** -1. **Spawn Server** - Launch MCP server via stdio protocol (using `MCPClient`) -2. **List Tools** - Call MCP `tools/list` endpoint to retrieve available tools -3. **Parse Schemas** - Extract tool names, descriptions, and `inputSchema` from MCP response -4. **Create Definitions** - For each tool, call `MCPToolDefinition.create()` which: - - Creates an `MCPToolExecutor` instance bound to the tool name and client - - Wraps the MCP tool metadata in `MCPToolDefinition` - - Uses generic `MCPToolAction` as the action type (NOT dynamic models yet) -5. **Add to Agent** - All `MCPToolDefinition` instances are added to agent's `tools_map` during `initialize()` (bypasses ToolRegistry) -6. 
**Lazy Validation** - Dynamic Pydantic models are generated lazily when: - - `action_from_arguments()` is called (argument validation) - - `to_openai_tool()` is called (schema export to LLM) -**Schema Handling:** +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os -| MCP Schema | SDK Integration | When Used | -|------------|----------------|-----------| -| `name` | Tool name (stored in `MCPToolDefinition`) | Discovery, execution | -| `description` | Tool description for LLM | Discovery, LLM prompt | -| `inputSchema` | Stored in `mcp_tool.inputSchema` | Lazy model generation | -| `inputSchema` fields | Converted to Pydantic fields via `Schema.from_mcp_schema()` | Validation, schema export | -| `annotations` | Mapped to `ToolAnnotations` | Security analysis, LLM hints | +from pydantic import SecretStr -### MCP Server Configuration +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool -MCP servers are configured via the `mcp_config` field on the `Agent` class. Configuration follows [FastMCP config format](https://gofastmcp.com/clients/client#configuration-format): -```python -from openhands.sdk import Agent +logger = get_logger(__name__) -agent = Agent( - mcp_config={ - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "filesystem": { - "command": "npx", - "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] - } - } - } +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), ) -``` -## Component Relationships +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart TB - subgraph Sources["Tool Sources"] - Native["Native Tools"] - MCP["MCP Tools"] - end - - Registry["Tool Registry
resolve_tool"] - ToolsMap["Agent.tools_map
Merged tool dict"] - - subgraph AgentSystem["Agent System"] - Agent["Agent Logic"] - LLM["LLM"] - end - - Security["Security Analyzer"] - Conversation["Conversation State"] - - Native -->|register_tool| Registry - Registry --> ToolsMap - MCP -->|create_mcp_tools| ToolsMap - ToolsMap -->|Provide schemas| LLM - Agent -->|Execute tools| ToolsMap - ToolsMap -.->|Action risk| Security - ToolsMap -.->|Read state| Conversation - - style ToolsMap fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +# Get LLM from registry +llm = llm_registry.get("agent") -**Relationship Characteristics:** -- **Native → Registry → tools_map**: Native tools resolved via `ToolRegistry` -- **MCP → tools_map**: MCP tools bypass registry, added directly during `initialize()` -- **tools_map → LLM**: Generate schemas describing all available capabilities -- **Agent → tools_map**: Execute actions, receive observations -- **tools_map → Conversation**: Read state for context-aware execution -- **tools_map → Security**: Tool annotations inform risk assessment +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) -## See Also +llm_messages = [] # collect raw LLM messages -- **[Agent Architecture](/sdk/arch/agent)** - How agents select and execute tools -- **[Events](/sdk/arch/events)** - ActionEvent and ObservationEvent structures -- **[Security Analyzer](/sdk/arch/security)** - Action risk assessment -- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills -- **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools -- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library -### Workspace -Source: https://docs.openhands.dev/sdk/arch/workspace.md +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + 
llm_messages.append(event.to_llm_message()) -The **Workspace** component abstracts execution environments for agent operations. It provides a unified interface for command execution and file operations across local processes, containers, and remote servers. -**Source:** [`openhands/sdk/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) -## Core Responsibilities +conversation.send_message("Please echo 'Hello!'") +conversation.run() -The Workspace system has four primary responsibilities: +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") -1. **Execution Abstraction** - Unified interface for command execution across environments -2. **File Operations** - Upload, download, and manipulate files in workspace -3. **Resource Management** - Context manager protocol for setup/teardown -4. **Environment Isolation** - Separate agent execution from host system +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") -## Architecture +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 60}} }%% -flowchart TB - subgraph Interface["Abstract Interface"] - Base["BaseWorkspace
Abstract base class"] - end - - subgraph Implementations["Concrete Implementations"] - Local["LocalWorkspace
Direct subprocess"] - Remote["RemoteWorkspace
HTTP API calls"] - end - - subgraph Operations["Core Operations"] - Command["execute_command()"] - Upload["file_upload()"] - Download["file_download()"] - Context["__enter__ / __exit__"] - end - - subgraph Targets["Execution Targets"] - Process["Local Process"] - Container["Docker Container"] - Server["Remote Server"] - end - - Base --> Local - Base --> Remote - - Base -.->|Defines| Operations - - Local --> Process - Remote --> Container - Remote --> Server - - classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px - - class Base primary - class Local,Remote secondary - class Command,Upload tertiary +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` + -### Key Components +### Getting Aggregated Conversation Costs -| Component | Purpose | Design | -|-----------|---------|--------| -| **[`BaseWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)** | Abstract interface | Defines execution and file operation contracts | -| **[`LocalWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/local.py)** | Local execution | Subprocess-based command execution | -| **[`RemoteWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/remote/base.py)** | Remote execution | HTTP API-based execution via agent-server | -| 
**[`CommandResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | Execution output | Structured result with stdout, stderr, exit_code | -| **[`FileOperationResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | File op outcome | Success status and metadata | + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + -## Workspace Types +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. -### Local vs Remote Execution +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os +from pydantic import SecretStr +from tabulate import tabulate -| Aspect | LocalWorkspace | RemoteWorkspace | -|--------|----------------|-----------------| -| **Execution** | Direct subprocess | HTTP → agent-server | -| **Isolation** | Process-level | Container/VM-level | -| **Performance** | Fast (no network) | Network overhead | -| **Security** | Host system access | Sandboxed | -| **Use Case** | Development, CLI | Production, web apps | +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.terminal import TerminalTool -## Core Operations -### Command Execution +logger = get_logger(__name__) -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% -flowchart LR - Tool["Tool invokes
execute_command()"] - - Decision{"Workspace
type?"} - - LocalExec["subprocess.run()
Direct execution"] - RemoteExec["POST /command
HTTP API"] - - Result["CommandResult
stdout, stderr, exit_code"] - - Tool --> Decision - Decision -->|Local| LocalExec - Decision -->|Remote| RemoteExec - - LocalExec --> Result - RemoteExec --> Result - - style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style LocalExec fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style RemoteExec fill:#fff4df,stroke:#b7791f,stroke-width:2px -``` +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") -**Command Result Structure:** +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -| Field | Type | Description | -|-------|------|-------------| -| **stdout** | str | Standard output stream | -| **stderr** | str | Standard error stream | -| **exit_code** | int | Process exit code (0 = success) | -| **timeout** | bool | Whether command timed out | -| **duration** | float | Execution time in seconds | +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model=model, + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] 
+) + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` -### File Operations + -| Operation | Local Implementation | Remote Implementation | -|-----------|---------------------|----------------------| -| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | -| **Download** | `shutil.copy()` | `GET /file/download` stream | -| **Result** | `FileOperationResult` | `FileOperationResult` | +### Understanding Conversation Stats -## Resource Management +The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. 
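
Conceptually, the combined view is just a sum over the per-usage metrics. A plain-Python sketch with invented numbers (illustrative only — the SDK's `ConversationStats` performs the real aggregation internally):

```python
# Invented per-usage metrics; in the SDK these come from each LLM's Metrics object.
usage_to_metrics = {
    "agent": {"accumulated_cost": 0.0123, "prompt_tokens": 900, "completion_tokens": 120},
    "condenser": {"accumulated_cost": 0.0011, "prompt_tokens": 400, "completion_tokens": 60},
}

# Sum every field across all usage IDs to get the conversation-wide totals.
combined = {"accumulated_cost": 0.0, "prompt_tokens": 0, "completion_tokens": 0}
for metrics in usage_to_metrics.values():
    for key, value in metrics.items():
        combined[key] += value

print(f"Total cost: ${combined['accumulated_cost']:.6f}")  # Total cost: $0.013400
print(f"Prompt tokens: {combined['prompt_tokens']}")       # Prompt tokens: 1300
```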
It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: -Workspaces use **context manager** for safe resource handling: +#### Key Methods and Properties -**Lifecycle Hooks:** +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. -| Phase | LocalWorkspace | RemoteWorkspace | -|-------|----------------|-----------------| -| **Enter** | Create working directory | Connect to agent-server, verify | -| **Use** | Execute commands | Proxy commands via HTTP | -| **Exit** | No cleanup (persistent) | Disconnect, optionally stop container | +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. -## Remote Workspace Extensions +```python icon="python" focus={2, 6, 10} +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") -The SDK provides remote workspace implementations in `openhands-workspace` package: +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 50}} }%% -flowchart TB - Base["RemoteWorkspace
SDK base class"] - - Docker["DockerWorkspace
Auto-spawn containers"] - API["RemoteAPIWorkspace
Connect to existing server"] - - Base -.->|Extended by| Docker - Base -.->|Extended by| API - - Docker -->|Creates| Container["Docker Container
with agent-server"] - API -->|Connects| Server["Remote Agent Server"] - - style Base fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Docker fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px - style API fill:#fff4df,stroke:#b7791f,stroke-width:2px +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") ``` -**Implementation Comparison:** +## Next Steps -| Type | Setup | Isolation | Use Case | -|------|-------|-----------|----------| -| **LocalWorkspace** | Immediate | Process | Development, trusted code | -| **DockerWorkspace** | Spawn container | Container | Multi-user, untrusted code | -| **RemoteAPIWorkspace** | Connect to URL | Remote server | Distributed systems, cloud | +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models -**Source:** -- **DockerWorkspace**: [`openhands-workspace/openhands/workspace/docker`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/docker) -- **RemoteAPIWorkspace**: [`openhands-workspace/openhands/workspace/remote_api`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/remote_api) +### Observability & Tracing +Source: https://docs.openhands.dev/sdk/guides/observability.md -## Component Relationships +> A full setup example is available [here](#example:-full-setup)! 
-### How Workspace Integrates +## Overview -```mermaid -%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% -flowchart LR - Workspace["Workspace"] - Conversation["Conversation"] - AgentServer["Agent Server"] - - Conversation -->|Configures| Workspace - Workspace -.->|Remote type| AgentServer - - style Workspace fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px - style Conversation fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px -``` +The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: -**Relationship Characteristics:** -- **Conversation → Workspace**: Conversation factory uses workspace type to select LocalConversation or RemoteConversation -- **Workspace → Agent Server**: RemoteWorkspace delegates operations to agent-server API -- **Tools Independence**: Tools run in the same environment as workspace +- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support +- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing +- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more -## See Also +The SDK automatically traces: +- Agent execution steps +- Tool calls and executions +- LLM API calls (via LiteLLM integration) +- Browser automation sessions (when using browser-use) +- Conversation lifecycle events -- **[Conversation Architecture](/sdk/arch/conversation)** - How workspace type determines conversation implementation -- **[Agent Server](/sdk/arch/agent-server)** - Remote execution API -- **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution +## Quick Start -### FAQ -Source: https://docs.openhands.dev/sdk/faq.md +Tracing is automatically enabled when you set the appropriate environment variables. 
The SDK detects the configuration on startup and initializes tracing without requiring code changes. -## How do I use AWS Bedrock with the SDK? +### Using Laminar -**Yes, the OpenHands SDK supports AWS Bedrock through LiteLLM.** +[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: -Since LiteLLM requires `boto3` for Bedrock requests, you need to install it alongside the SDK. +```bash icon="terminal" wrap +# Set your Laminar project API key +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` - +That's it! Run your agent code normally and traces will be sent to Laminar automatically. -### Step 1: Install boto3 +### Using Honeycomb or Other OTLP Backends -Install the SDK with boto3: +For Honeycomb, Jaeger, or any other OTLP-compatible backend: -```bash -# Using pip -pip install openhands-sdk boto3 +```bash icon="terminal" wrap +# Required: Set the OTLP endpoint +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" -# Using uv -uv pip install openhands-sdk boto3 +# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" -# Or when installing as a CLI tool -uv tool install openhands --with boto3 +# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it ``` -### Step 2: Configure Authentication +### Alternative Configuration Methods -You have two authentication options: +You can also use these alternative environment variable formats: -**Option A: API Key Authentication (Recommended)** +```bash icon="terminal" wrap +# Short form for endpoint +export OTEL_ENDPOINT="http://localhost:4317" -Use the `AWS_BEARER_TOKEN_BEDROCK` environment variable: +# Alternative header format +export 
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" -```bash -export AWS_BEARER_TOKEN_BEDROCK="your-bedrock-api-key" +# Alternative protocol specification +export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" ``` -**Option B: AWS Credentials** +## How It Works -Use traditional AWS credentials: +The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: -```bash -export AWS_ACCESS_KEY_ID="your-access-key" -export AWS_SECRET_ACCESS_KEY="your-secret-key" -export AWS_REGION_NAME="us-west-2" -``` +1. **Detects Configuration**: Checks for OTEL environment variables on startup +2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter +3. **Instruments Code**: Automatically wraps key functions with tracing decorators +4. **Captures Context**: Associates traces with conversation IDs for session grouping +5. **Exports Spans**: Sends trace data to your configured backend -### Step 3: Configure the Model +### What Gets Traced -Use the `bedrock/` prefix for your model name: +The SDK automatically instruments these components: -```python -from openhands.sdk import LLM, Agent +- **`agent.step`** - Each iteration of the agent's execution loop +- **Tool Executions** - Individual tool calls with input/output capture +- **LLM Calls** - API requests to language models via LiteLLM +- **Conversation Lifecycle** - Message sending, conversation runs, and title generation +- **Browser Sessions** - When using browser-use, captures session replays (Laminar only) -llm = LLM( - model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", - # api_key is read from AWS_BEARER_TOKEN_BEDROCK automatically -) -``` +### Trace Hierarchy -For cross-region inference profiles, include the region prefix: +Traces are organized hierarchically: -```python -llm = LLM( - model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0", # US region - # or - 
model="bedrock/apac.anthropic.claude-sonnet-4-20250514-v1:0", # APAC region -) -``` + + + + + + + + + + + + + - +Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single +conversation together in your observability platform. -For more details on Bedrock configuration options, see the [LiteLLM Bedrock documentation](https://docs.litellm.ai/docs/providers/bedrock). +Note that in `tool.execute` the tool calls are traced, e.g., `bash`, `file_editor`. -## Does the agent SDK support parallel tool calling? +## Configuration Reference -**Yes, the OpenHands SDK supports parallel tool calling by default.** +### Environment Variables -The SDK automatically handles parallel tool calls when the underlying LLM (like Claude or GPT-4) returns multiple tool calls in a single response. This allows agents to execute multiple independent actions before the next LLM call. +The SDK checks for these environment variables (in order of precedence): - -When the LLM generates multiple tool calls in parallel, the SDK groups them using a shared `llm_response_id`: +| Variable | Description | Example | +|----------|-------------|---------| +| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` | +| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` | +| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` | +| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` | +| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` | +| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` | +| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` | -```python -ActionEvent(llm_response_id="abc123", thought="Let me 
check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
# Combined into: Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```

Multiple `ActionEvent`s with the same `llm_response_id` are grouped together and combined into a single LLM message with multiple `tool_calls`. Only the first event's thought/reasoning is included. The key pieces of the implementation:

- The **[Events Architecture](/sdk/arch/events#event-types)** explains in detail how parallel function calling works.
- [`prepare_llm_messages` in utils.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/utils.py) groups `ActionEvent`s by `llm_response_id` when converting events to LLM messages.
- The [agent step method](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py#L200-L300) creates actions with a shared `llm_response_id`.
- The [`ActionEvent` class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py) includes the `llm_response_id` field.

For more details, see the **[Events Architecture](/sdk/arch/events)** for a deep dive into the event system and parallel function calling, the **[Tool System](/sdk/arch/tool-system)** for understanding how tools work with the agent, and the **[Agent Architecture](/sdk/arch/agent)** for how agents process and execute actions.

## Does the agent SDK support image content?
+The SDK supports both HTTP and gRPC protocols: + +- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) +- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) + +## Platform-Specific Configuration + +### Laminar Setup + +1. Sign up at [laminar.sh](https://laminar.sh/) +2. Create a project and copy your API key +3. Set the environment variable: + +```bash icon="terminal" wrap +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` -**Yes, the OpenHands SDK fully supports image content for vision-capable LLMs.** +**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did. -The SDK supports both HTTP/HTTPS URLs and base64-encoded images through the `ImageContent` class. +### Honeycomb Setup - +1. Sign up at [honeycomb.io](https://www.honeycomb.io/) +2. Get your API key from the account settings +3. 
Configure the environment: -### Check Vision Support +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` -Before sending images, verify your LLM supports vision: +### Jaeger Setup -```python -from openhands.sdk import LLM -from pydantic import SecretStr +For local development with Jaeger: -llm = LLM( - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr("your-api-key"), - usage_id="my-agent" -) +```bash icon="terminal" wrap +# Start Jaeger all-in-one container +docker run -d --name jaeger \ + -p 4317:4317 \ + -p 16686:16686 \ + jaegertracing/all-in-one:latest -# Check if vision is active -assert llm.vision_is_active(), "Model does not support vision" +# Configure SDK +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" ``` -### Using HTTP URLs +Access the Jaeger UI at http://localhost:16686 -```python -from openhands.sdk import ImageContent, Message, TextContent +### Generic OTLP Collector -message = Message( - role="user", - content=[ - TextContent(text="What do you see in this image?"), - ImageContent(image_urls=["https://example.com/image.png"]), - ], -) -``` +For other backends, use their OTLP endpoint: -### Using Base64 Images +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` -Base64 images are supported using data URLs: +## Advanced Usage -```python -import base64 -from openhands.sdk import ImageContent, Message, TextContent +### Disabling Observability -# Read and encode an image file -with open("my_image.png", "rb") as f: - image_base64 = 
base64.b64encode(f.read()).decode("utf-8") +To disable tracing, simply unset all OTEL environment variables: -# Create message with base64 image -message = Message( - role="user", - content=[ - TextContent(text="Describe this image"), - ImageContent(image_urls=[f"data:image/png;base64,{image_base64}"]), - ], -) +```bash icon="terminal" wrap +unset LMNR_PROJECT_API_KEY +unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT +unset OTEL_EXPORTER_OTLP_ENDPOINT +unset OTEL_ENDPOINT ``` -### Supported Image Formats +The SDK will automatically skip all tracing instrumentation with minimal overhead. -The data URL format is: `data:;base64,` +### Custom Span Attributes -Supported MIME types: -- `image/png` -- `image/jpeg` -- `image/gif` -- `image/webp` -- `image/bmp` +The SDK automatically adds these attributes to spans: -### Built-in Image Support +- **`conversation_id`** - UUID of the conversation +- **`tool_name`** - Name of the tool being executed +- **`action.kind`** - Type of action being performed +- **`session_id`** - Groups all traces from one conversation -Several SDK tools automatically handle images: +### Debugging Tracing Issues -- **FileEditorTool**: When viewing image files (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`), they're automatically converted to base64 and sent to the LLM -- **BrowserUseTool**: Screenshots are captured and sent as base64 images -- **MCP Tools**: Image content from MCP tool results is automatically converted to base64 data URLs +If traces aren't appearing in your observability platform: -### Disabling Vision +1. 
**Verify Environment Variables**: + ```python icon="python" wrap + import os -To disable vision for cost reduction (even on vision-capable models): + otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') + otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') -```python -llm = LLM( - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr("your-api-key"), - usage_id="my-agent", - disable_vision=True, # Images will be filtered out -) -``` + print(f"OTEL Endpoint: {otel_endpoint}") + print(f"OTEL Headers: {otel_headers}") + ``` - +2. **Check SDK Logs**: The SDK logs observability initialization at debug level: + ```python icon="python" wrap + import logging -For a complete example, see the [image input example](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) in the SDK repository. + logging.basicConfig(level=logging.DEBUG) + ``` -## How do I handle MessageEvent in one-off tasks? +3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: + ```bash icon="terminal" wrap + curl -v https://api.honeycomb.io:443/v1/traces + ``` -**The SDK provides utilities to automatically respond to agent messages when running tasks end-to-end.** +4. **Validate Headers**: Check that authentication headers are properly URL-encoded -When running one-off tasks, some models may send a `MessageEvent` (proposing an action or asking for confirmation) instead of directly using tools. This causes `conversation.run()` to return, even though the agent hasn't finished the task. +## Troubleshooting - +### Traces Not Appearing -When an agent sends a message (via `MessageEvent`) instead of using the `finish` tool, the conversation ends because it's waiting for user input. In automated pipelines, there's no human to respond, so the task appears incomplete. 
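On the header-validation step above: values placed in `OTEL_EXPORTER_OTLP_TRACES_HEADERS` must be percent-encoded. A minimal way to build a correctly encoded `key=value` pair using only the standard library (the `otlp_header` helper is illustrative, not part of the SDK):

```python
from urllib.parse import quote


def otlp_header(key: str, value: str) -> str:
    # Build one key=value pair for OTEL_EXPORTER_OTLP_TRACES_HEADERS,
    # percent-encoding the value so spaces and commas survive parsing.
    return f"{key}={quote(value, safe='')}"


# A space must become %20, matching the Bearer%20TOKEN form shown earlier
print(otlp_header("Authorization", "Bearer abc123"))
# -> Authorization=Bearer%20abc123
```

Join multiple pairs with commas before exporting the variable.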
+**Problem**: No traces showing up in observability platform -**Key event types:** -- `ActionEvent`: Agent uses a tool (terminal, file editor, etc.) -- `MessageEvent`: Agent sends a text message (waiting for user response) -- `FinishAction`: Agent explicitly signals task completion +**Solutions**: +- Verify environment variables are set correctly +- Check network connectivity to OTLP endpoint +- Ensure authentication headers are valid +- Look for SDK initialization logs at debug level -The solution is to automatically send a "fake user response" when the agent sends a message, prompting it to continue. +### High Trace Volume - +**Problem**: Too many spans being generated - +**Solutions**: +- Configure sampling at the collector level +- For Laminar with non-browser tools, browser instrumentation is automatically disabled +- Use backend-specific filtering rules -The [`run_conversation_with_fake_user_response`](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) function wraps your conversation and automatically handles agent messages: +### Performance Impact -```python -from openhands.sdk.conversation.state import ConversationExecutionStatus -from openhands.sdk.event import ActionEvent, MessageEvent -from openhands.sdk.tool.builtins.finish import FinishAction +**Problem**: Concerned about tracing overhead -def run_conversation_with_fake_user_response(conversation, max_responses: int = 10): - """Run conversation, auto-responding to agent messages until finish or limit.""" - for _ in range(max_responses): - conversation.run() - if conversation.state.execution_status != ConversationExecutionStatus.FINISHED: - break - events = list(conversation.state.events) - # Check if agent used finish tool - if any(isinstance(e, ActionEvent) and isinstance(e.action, FinishAction) for e in reversed(events)): - break - # Check if agent sent a message (needs response) - if not any(isinstance(e, MessageEvent) and e.source == "agent" for e in 
reversed(events)): - break - # Send continuation prompt - conversation.send_message( - "Please continue. Use the finish tool when done. DO NOT ask for human help." - ) -``` +**Solutions**: +- Tracing has minimal overhead when properly configured +- Disable tracing in development by unsetting environment variables +- Use asynchronous exporters (default in most OTLP configurations) - +## Example: Full Setup - + +This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) + -```python -from openhands.sdk import Agent, Conversation, LLM -from openhands.workspace import DockerWorkspace -from openhands.tools.preset.default import get_default_tools +```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py +""" +Observability & Laminar example -llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key="...") -agent = Agent(llm=llm, tools=get_default_tools()) -workspace = DockerWorkspace() -conversation = Conversation(agent=agent, workspace=workspace, max_iteration_per_run=100) +This example demonstrates enabling OpenTelemetry tracing with Laminar in the +OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. +""" -conversation.send_message("Fix the bug in src/utils.py") -run_conversation_with_fake_user_response(conversation, max_responses=10) -# Results available in conversation.state.events -``` +import os - +from pydantic import SecretStr - -**Pro tip:** Add a hint to your task prompt: -> "If you're 100% done with the task, use the finish action. Otherwise, keep going until you're finished." +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.terminal import TerminalTool -This encourages the agent to use the finish tool rather than asking for confirmation. 
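That finish-tool hint can also be appended programmatically when assembling one-off task prompts. A small illustrative helper (not part of the SDK) sketching the idea:

```python
FINISH_HINT = (
    "If you're 100% done with the task, use the finish action. "
    "Otherwise, keep going until you're finished."
)


def build_task_prompt(task: str) -> str:
    # Illustrative helper (not an SDK function): append the finish-tool
    # hint so one-off runs tend to end with FinishAction rather than a
    # MessageEvent asking for confirmation.
    return f"{task.rstrip()}\n\n{FINISH_HINT}"


print(build_task_prompt("Fix the bug in src/utils.py"))
```

Combined with the auto-response loop above, this makes unattended runs far less likely to stall waiting for a human reply.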
- -For the full implementation used in OpenHands benchmarks, see the [fake_user_response.py](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) module. +# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: +# export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# For non-Laminar OTLP backends, set OTEL_* variables instead. -## More questions? +# Configure LLM and Agent +api_key = os.getenv("LLM_API_KEY") +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key) if api_key else None, + base_url=base_url, + usage_id="agent", +) -If you have additional questions: +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +# Create conversation and run a simple task +conversation = Conversation(agent=agent, workspace=".") +conversation.send_message("List the files in the current directory and print them.") +conversation.run() +print( + "All done! Check your Laminar dashboard for traces " + "(session is the conversation UUID)." 
+) +``` + +```bash Running the Example +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/27_observability_laminar.py +``` -- **[Join our Slack Community](https://openhands.dev/joinslack)** - Ask questions and get help from the community -- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs, request features, or start a discussion +## Next Steps -### Getting Started -Source: https://docs.openhands.dev/sdk/getting-started.md +- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces +- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application +- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions -The OpenHands SDK is a modular framework for building AI agents that interact with code, files, and system commands. Agents can execute bash commands, edit files, browse the web, and more. +### Plugins +Source: https://docs.openhands.dev/sdk/guides/plugins.md -## Prerequisites +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -Install the **[uv package manager](https://docs.astral.sh/uv/)** (version 0.8.13+): +Plugins provide a way to package and distribute multiple agent components together. A single plugin can include: -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` +- **Skills**: Specialized knowledge and workflows +- **Hooks**: Event handlers for tool lifecycle +- **MCP Config**: External tool server configurations +- **Agents**: Specialized agent definitions +- **Commands**: Slash commands -## Installation +The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins). -### Step 1: Acquire an LLM API Key +## Plugin Structure -The SDK requires an LLM API key from any [LiteLLM-supported provider](https://docs.litellm.ai/docs/providers). 
See our [recommended models](/openhands/usage/llms/llms) for best results. + +See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure. + - - - Bring your own API key from providers like: - - [Anthropic](https://console.anthropic.com/) - - [OpenAI](https://platform.openai.com/) - - [Other LiteLLM-supported providers](https://docs.litellm.ai/docs/providers) +A plugin follows this directory structure: - Example: - ```bash - export LLM_API_KEY="your-api-key" - uv run python examples/01_standalone_sdk/01_hello_world.py - ``` - + + + + + + + + + + + + + + + + + + + + + + + - - Sign up for [OpenHands Cloud](https://app.all-hands.dev) and get an LLM API key from the [API keys page](https://app.all-hands.dev/settings/api-keys). This gives you access to models verified to work well with OpenHands, with no markup. +Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required. - Example: - ```bash - export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" - uv run python examples/01_standalone_sdk/01_hello_world.py - ``` +### Plugin Manifest - [Learn more →](/openhands/usage/llms/openhands-llms) - +The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata: - - If you have a ChatGPT Plus or Pro subscription, you can use `LLM.subscription_login()` to authenticate with your ChatGPT account and access Codex models without consuming API credits. 
+```json icon="file-code" wrap +{ + "name": "code-quality", + "version": "1.0.0", + "description": "Code quality tools and workflows", + "author": "openhands", + "license": "MIT", + "repository": "https://github.com/example/code-quality-plugin" +} +``` - ```python - from openhands.sdk import LLM +### Skills - llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") - ``` +Skills are defined in markdown files with YAML frontmatter: - [Learn more →](/sdk/guides/llm-subscriptions) - - +```markdown icon="file-code" +--- +name: python-linting +description: Instructions for linting Python code +trigger: + type: keyword + keywords: + - lint + - linting + - code quality +--- -> Tip: Model name prefixes depend on your provider -> -> - If you bring your own provider key (Anthropic/OpenAI/etc.), use that provider's model name, e.g. `anthropic/claude-sonnet-4-5-20250929` -OpenHands supports [dozens of models](https://docs.openhands.dev/sdk/arch/llm#llm-providers), you can choose the model you want to try. -> - If you use OpenHands Cloud, use `openhands/`-prefixed models, e.g. `openhands/claude-sonnet-4-5-20250929` -> -> Many examples in the docs read the model from the `LLM_MODEL` environment variable. You can set it like: -> -> ```bash -> export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" # for OpenHands Provider -> ``` +# Python Linting Skill -**Set Your API Key:** +Run ruff to check for issues: -```bash -export LLM_API_KEY=your-api-key-here +\`\`\`bash +ruff check . 
+\`\`\` ``` -### Step 2: Install the SDK +### Hooks - - - ```bash - pip install openhands-sdk # Core SDK (openhands.sdk) - pip install openhands-tools # Built-in tools (openhands.tools) - # Optional: required for sandboxed workspaces in Docker or remote servers - pip install openhands-workspace # Workspace backends (openhands.workspace) - pip install openhands-agent-server # Remote agent server (openhands.agent_server) - ``` - +Hooks are defined in `hooks/hooks.json`: - - ```bash - # Clone the repository - git clone https://github.com/OpenHands/software-agent-sdk.git - cd software-agent-sdk +```json icon="file-code" wrap +{ + "hooks": { + "PostToolUse": [ + { + "matcher": "file_editor", + "hooks": [ + { + "type": "command", + "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", + "timeout": 5 + } + ] + } + ] + } +} +``` - # Install dependencies and setup development environment - make build - ``` - - +### MCP Configuration +MCP servers are configured in `.mcp.json`: -### Step 3: Run Your First Agent +```json wrap icon="file-code" +{ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} +``` -Here's a complete example that creates an agent and asks it to perform a simple task: +## Using Plugin Components -```python icon="python" expandable examples/01_standalone_sdk/01_hello_world.py -import os +> The ready-to-run example is available [here](#ready-to-run-example)! -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +Brief explanation on how to use a plugin with an agent. + + + ### Loading a Plugin + First, load the desired plugins. 
-llm = LLM( - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - api_key=os.getenv("LLM_API_KEY"), - base_url=os.getenv("LLM_BASE_URL", None), -) + ```python icon="python" + from openhands.sdk.plugin import Plugin -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], -) + # Load a single plugin + plugin = Plugin.load("/path/to/plugin") -cwd = os.getcwd() -conversation = Conversation(agent=agent, workspace=cwd) + # Load all plugins from a directory + plugins = Plugin.load_all("/path/to/plugins") + ``` + + + ### Accessing Components + You can access the different plugin components to see which ones are available. -conversation.send_message("Write 3 facts about the current project into FACTS.txt.") -conversation.run() -print("All done!") -``` + ```python icon="python" + # Skills + for skill in plugin.skills: + print(f"Skill: {skill.name}") -Run the example: + # Hooks configuration + if plugin.hooks: + print(f"Hooks configured: {plugin.hooks}") -```bash -# Using a direct provider key (Anthropic/OpenAI/etc.) -uv run python examples/01_standalone_sdk/01_hello_world.py -``` + # MCP servers + if plugin.mcp_config: + servers = plugin.mcp_config.get("mcpServers", {}) + print(f"MCP servers: {list(servers.keys())}") + ``` + + + ### Using with an Agent + You can now feed your agent with your preferred plugin. + + ```python focus={3,10,17} icon="python" + # Create agent context with plugin skills + agent_context = AgentContext( + skills=plugin.skills, + ) -```bash -# Using OpenHands Cloud -export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" -uv run python examples/01_standalone_sdk/01_hello_world.py -``` + # Create agent with plugin MCP config + agent = Agent( + llm=llm, + tools=tools, + mcp_config=plugin.mcp_config or {}, + agent_context=agent_context, + ) -You should see the agent understand your request, explore the project, and create a file with facts about it. 
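When an agent consumes several plugins at once, their MCP server configurations need to be combined into a single mapping before constructing the agent. A hedged sketch of such a merge over plain dicts (assuming each plugin exposes an optional `mcp_config` with an `mcpServers` mapping, as in the `.mcp.json` example earlier; `merge_mcp_configs` is an illustrative helper, not an SDK function):

```python
def merge_mcp_configs(configs):
    # Merge the mcpServers mappings of several plugins into one config;
    # later plugins win on name collisions. Illustrative helper only.
    servers = {}
    for cfg in configs:
        if cfg:
            servers.update(cfg.get("mcpServers", {}))
    return {"mcpServers": servers}


merged = merge_mcp_configs(
    [
        {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}},
        None,  # a plugin without MCP config
        {"mcpServers": {"fetch": {"command": "docker"}, "search": {"command": "npx"}}},
    ]
)
print(sorted(merged["mcpServers"]))  # -> ['fetch', 'search']
print(merged["mcpServers"]["fetch"]["command"])  # -> docker (later plugin wins)
```

Last-one-wins is only one reasonable collision policy; raising on duplicates is an equally valid choice if plugin sets are curated.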
+ # Create conversation with plugin hooks + conversation = Conversation( + agent=agent, + hook_config=plugin.hooks, + ) + ``` + + -## Core Concepts +## Ready-to-run Example -**Agent**: An AI-powered entity that can reason, plan, and execute actions using tools. + +This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) + -**Tools**: Capabilities like executing bash commands, editing files, or browsing the web. +```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py +"""Example: Loading Plugins via Conversation -**Workspace**: The execution environment where agents operate (local, Docker, or remote). +Demonstrates the recommended way to load plugins using the `plugins` parameter +on Conversation. Plugins bundle skills, hooks, and MCP config together. -**Conversation**: Manages the interaction lifecycle between you and the agent. +For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins +""" -## Basic Workflow +import os +import sys +import tempfile +from pathlib import Path -1. **Configure LLM**: Choose model and provide API key -2. **Create Agent**: Use preset or custom configuration -3. **Add Tools**: Enable capabilities (bash, file editing, etc.) -4. **Start Conversation**: Create conversation context -5. **Send Message**: Provide task description -6. **Run Agent**: Agent executes until task completes or stops -7. 
**Get Result**: Review agent's output and actions +from pydantic import SecretStr +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.plugin import PluginSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -## Try More Examples -The repository includes 24+ examples demonstrating various capabilities: +# Locate example plugin directory +script_dir = Path(__file__).parent +plugin_path = script_dir / "example_plugins" / "code-quality" -```bash -# Simple hello world -uv run python examples/01_standalone_sdk/01_hello_world.py +# Define plugins to load +# Supported sources: local path, "github:owner/repo", or git URL +# Optional: ref (branch/tag/commit), repo_path (for monorepos) +plugins = [ + PluginSource(source=str(plugin_path)), + # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), + # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), +] -# Custom tools -uv run python examples/01_standalone_sdk/02_custom_tools.py +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + print("Set LLM_API_KEY to run this example") + print("EXAMPLE_COST: 0") + sys.exit(0) -# With skills -uv run python examples/01_standalone_sdk/03_activate_microagent.py +# Configure LLM and Agent +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="plugin-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) +agent = Agent( + llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] +) -# See all examples -ls examples/01_standalone_sdk/ -``` +# Create conversation with plugins - skills, MCP config, and hooks are merged +# Note: Plugins are loaded lazily on first send_message() or run() call +with tempfile.TemporaryDirectory() as tmpdir: + conversation = Conversation( + agent=agent, + workspace=tmpdir, + plugins=plugins, + ) + # 
Test: The "lint" keyword triggers the python-linting skill + # This first send_message() call triggers lazy plugin loading + conversation.send_message("How do I lint Python code? Brief answer please.") -## Next Steps + # Verify skills were loaded from the plugin (after lazy loading) + skills = ( + conversation.agent.agent_context.skills + if conversation.agent.agent_context + else [] + ) + print(f"Loaded {len(skills)} skill(s) from plugins") -### Explore Documentation + conversation.run() -- **[SDK Architecture](/sdk/arch/sdk)** - Deep dive into components -- **[Tool System](/sdk/arch/tool-system)** - Available tools -- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environments -- **[LLM Configuration](/sdk/arch/llm)** - Deep dive into language model configuration + print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` -### Build Custom Solutions + -- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools to expand agent capabilities -- **[MCP Integration](/sdk/guides/mcp)** - Connect to external tools via Model Context Protocol -- **[Docker Workspaces](/sdk/guides/agent-server/docker-sandbox)** - Sandbox agent execution in containers -### Get Help +## Next Steps -- **[Slack Community](https://openhands.dev/joinslack)** - Ask questions and share projects -- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features -- **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples +- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers +- **[Hooks](/sdk/guides/hooks)** - Understand hook event types +- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers -### Browser Use -Source: https://docs.openhands.dev/sdk/guides/agent-browser-use.md +### Secret Registry +Source: https://docs.openhands.dev/sdk/guides/secrets.md import RunExampleCode from 
"/sdk/shared-snippets/how-to-run-example.mdx"; > A ready-to-run example is available [here](#ready-to-run-example)! -The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built -on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, -and extracting content - all through natural language instructions. +The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. +It automatically detects secret references in bash commands, injects them as environment variables when needed, +and masks secret values in command outputs to prevent accidental exposure. -## How It Works +### Injecting Secrets -The [ready-to-run example](#ready-to-run-example) demonstrates combining multiple tools to create a capable web research agent: +Use the `update_secrets()` method to add secrets to your conversation. -1. **BrowserToolSet**: Provides automated browser control for web interaction -2. **FileEditorTool**: Allows the agent to read and write files if needed -3. **BashTool**: Enables command-line operations for additional functionality -The agent uses these tools to: -- Navigate to specified URLs -- Interact with web page elements (clicking, scrolling, etc.) -- Extract and analyze content from web pages -- Summarize information from multiple sources +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: -In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points. 
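The output masking described above can be pictured as a simple substitution pass over command output. An illustrative sketch (the placeholder format here is an assumption made for the example; the SDK applies its own masking automatically):

```python
def mask_secrets(output: str, secrets: dict) -> str:
    # Replace any known secret value found in command output with a
    # placeholder so it never reaches logs or the LLM. Illustrative only.
    for name, value in secrets.items():
        if value:
            output = output.replace(value, f"<secret-hidden:{name}>")
    return output


print(mask_secrets(
    "token=my-secret-token-value ok",
    {"SECRET_TOKEN": "my-secret-token-value"},
))
# -> token=<secret-hidden:SECRET_TOKEN> ok
```

For callable-backed secrets, the value would be resolved once at injection time and the resolved string masked the same way.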
+```python focus={4,11} icon="python" wrap +from openhands.sdk.conversation.secret_source import SecretSource -## Customization +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) -For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually -register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual -tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. -This gives you fine-grained control over which browser capabilities are exposed to the agent. +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +``` ## Ready-to-run Example -This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) -```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py import os from pydantic import SecretStr @@ -19412,18 +20757,13 @@ from openhands.sdk import ( LLM, Agent, Conversation, - Event, - LLMConvertibleEvent, - get_logger, ) +from openhands.sdk.secret import SecretSource from openhands.sdk.tool import Tool -from openhands.tools.browser_use import BrowserToolSet from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool -logger = get_logger(__name__) - # 
Configure LLM api_key = os.getenv("LLM_API_KEY") assert api_key is not None, "LLM_API_KEY environment variable is not set." @@ -19437,104 +20777,190 @@ llm = LLM( ) # Tools -cwd = os.getcwd() tools = [ - Tool( - name=TerminalTool.name, - ), + Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name), - Tool(name=BrowserToolSet.name), ] -# If you need fine-grained browser control, you can manually register individual browser -# tools by creating a BrowserToolExecutor and providing factories that return customized -# Tool instances before constructing the Agent. - # Agent agent = Agent(llm=llm, tools=tools) - -llm_messages = [] # collect raw LLM messages +conversation = Conversation(agent) -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd +conversation.update_secrets( + {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()} ) -conversation.send_message( - "Could you go to https://openhands.dev/ blog page and summarize main " - "points of the latest blog?" -) +conversation.send_message("just echo $SECRET_TOKEN") + conversation.run() -print("=" * 100) -print("Conversation finished. 
Got the following LLM messages:")
-for i, message in enumerate(llm_messages):
-    print(f"Message {i}: {str(message)[:200]}")
+conversation.send_message("just echo $SECRET_FUNCTION_TOKEN")
+
+conversation.run()
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
+```
+
+
+
+## Next Steps
+
+- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP
+- **[Security Analyzer](/sdk/guides/security)** - Add security validation
+
+### Security & Action Confirmation
+Source: https://docs.openhands.dev/sdk/guides/security.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+Agent actions can be controlled through two complementary mechanisms: a **confirmation policy** that determines when user
+approval is required, and a **security analyzer** that evaluates action risk levels. Together, they provide flexible control over agent behavior while maintaining safety.
+
+## Confirmation Policy
> A ready-to-run example is available [here](#ready-to-run-example-confirmation)!

A confirmation policy controls whether actions require user approval before execution. Policies provide a simple way to ensure safe agent operation by requiring explicit permission for actions.
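The behavior of the built-in policies can be summarized as a small decision table. The sketch below is a plain-Python mimic for illustration only, not the SDK's actual classes (which live in `openhands.sdk.security.confirmation_policy`):

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


def needs_confirmation(policy: str, risk: Risk = Risk.LOW) -> bool:
    # Simplified mimic of the built-in policies' decision table.
    if policy == "AlwaysConfirm":
        return True
    if policy == "NeverConfirm":
        return False
    if policy == "ConfirmRisky":  # needs a security analyzer to supply `risk`
        return risk is Risk.HIGH
    raise ValueError(f"unknown policy: {policy}")


print(needs_confirmation("ConfirmRisky", Risk.HIGH))  # -> True
```

`ConfirmRisky` is the only policy whose answer depends on the risk input, which is why it requires a security analyzer to be configured.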
+
+### Setting Confirmation Policy
+
+Set the confirmation policy on your conversation:
+
+```python icon="python" focus={4}
+from openhands.sdk.security.confirmation_policy import AlwaysConfirm
+
+conversation = Conversation(agent=agent, workspace=".")
+conversation.set_confirmation_policy(AlwaysConfirm())
 ```

-
-## Next Steps
+Available policies:
+- **`AlwaysConfirm()`** - Require approval for all actions
+- **`NeverConfirm()`** - Execute all actions without approval
+- **`ConfirmRisky()`** - Only require approval for risky actions (requires security analyzer)

-- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools
-- **[MCP Integration](/sdk/guides/mcp)** - Connect external services
+### Custom Confirmation Handler

-### Creating Custom Agent
-Source: https://docs.openhands.dev/sdk/guides/agent-custom.md
+Implement your approval logic by checking conversation status:

-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+```python icon="python" focus={2-3,5}
+while conversation.state.execution_status != ConversationExecutionStatus.FINISHED:
+    if conversation.state.execution_status == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION:
+        pending = ConversationState.get_unmatched_actions(conversation.state.events)
+        if not confirm_in_console(pending):
+            conversation.reject_pending_actions("User rejected")
+            continue
+    conversation.run()
+```

-This guide demonstrates how to create custom agents tailored for specific use cases. Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows.
+### Rejecting Actions - -This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) - +Provide feedback when rejecting to help the agent try a different approach: +```python icon="python" focus={2-5} +if not user_approved: + conversation.reject_pending_actions( + "User rejected because actions seem too risky." + "Please try a safer approach." + ) +``` -The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. +### Ready-to-run Example Confirmation -```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py -#!/usr/bin/env python3 -""" -Planning Agent Workflow Example + +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + -This example demonstrates a two-stage workflow: -1. Planning Agent: Analyzes the task and creates a detailed implementation plan -2. Execution Agent: Implements the plan with full editing capabilities +Require user approval before executing agent actions: -The task: Create a Python web scraper that extracts article titles and URLs -from a news website, handles rate limiting, and saves results to JSON. 
-""" +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" import os -import tempfile -from pathlib import Path +import signal +from collections.abc import Callable from pydantic import SecretStr -from openhands.sdk import LLM, Conversation -from openhands.sdk.llm import content_to_str +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer from openhands.tools.preset.default import get_default_agent -from openhands.tools.preset.planning import get_planning_agent -def get_event_content(event): - """Extract content from an event.""" - if hasattr(event, "llm_message"): - return "".join(content_to_str(event.llm_message.content)) - return str(event) +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) -"""Run the planning agent workflow example.""" +def _print_action_preview(pending_actions) -> None: + print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? 
(yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing actions…") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("▶️ Running conversation.run()…") + conversation.run() -# Create a temporary workspace -workspace_dir = Path(tempfile.mkdtemp()) -print(f"Working in: {workspace_dir}") # Configure LLM api_key = os.getenv("LLM_API_KEY") @@ -19542,572 +20968,621 @@ assert api_key is not None, "LLM_API_KEY environment variable is not set." 
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") base_url = os.getenv("LLM_BASE_URL") llm = LLM( + usage_id="agent", model=model, base_url=base_url, api_key=SecretStr(api_key), - usage_id="agent", ) -# Task description -task = """ -Create a Python web scraper with the following requirements: -- Scrape article titles and URLs from a news website -- Handle HTTP errors gracefully with retry logic -- Save results to a JSON file with timestamp -- Use requests and BeautifulSoup for scraping - -Do NOT ask for any clarifying questions. Directly create your implementation plan. -""" - -print("=" * 80) -print("PHASE 1: PLANNING") -print("=" * 80) - -# Create Planning Agent with read-only tools -planning_agent = get_planning_agent(llm=llm) +agent = get_default_agent(llm=llm) +conversation = Conversation(agent=agent, workspace=os.getcwd()) -# Create conversation for planning -planning_conversation = Conversation( - agent=planning_agent, - workspace=str(workspace_dir), -) +# Conditionally add security analyzer based on environment variable +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") + conversation.set_security_analyzer(LLMSecurityAnalyzer()) -# Run planning phase -print("Planning Agent is analyzing the task and creating implementation plan...") -planning_conversation.send_message( - f"Please analyze this web scraping task and create a detailed " - f"implementation plan:\n\n{task}" -) -planning_conversation.run() +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) -print("\n" + "=" * 80) -print("PLANNING COMPLETE") -print("=" * 80) -print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") +# 2) A command the user may choose 
to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) -print("\n" + "=" * 80) -print("PHASE 2: EXECUTION") -print("=" * 80) +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) -# Create Execution Agent with full editing capabilities -execution_agent = get_default_agent(llm=llm, cli_mode=True) +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 'Hello from confirmation mode example!'") +conversation.run() -# Create conversation for execution -execution_conversation = Conversation( - agent=execution_agent, - workspace=str(workspace_dir), +conversation.send_message( + "Please delete any file that was created during this conversation." ) +conversation.run() -# Prepare execution prompt with reference to the plan file -execution_prompt = f""" -Please implement the web scraping project according to the implementation plan. - -The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md - -Please read the plan from PLAN.md and implement all components according to it. - -Create all necessary files, implement the functionality, and ensure everything -works together properly. 
-""" - -print("Execution Agent is implementing the plan...") -execution_conversation.send_message(execution_prompt) -execution_conversation.run() - -# Get the last message from the conversation -execution_result = execution_conversation.state.events[-1] - -print("\n" + "=" * 80) -print("EXECUTION RESULT:") -print("=" * 80) -print(get_event_content(execution_result)) - -print("\n" + "=" * 80) -print("WORKFLOW COMPLETE") -print("=" * 80) -print(f"Project files created in: {workspace_dir}") - -# List created files -print("\nCreated files:") -for file_path in workspace_dir.rglob("*"): - if file_path.is_file(): - print(f" - {file_path.relative_to(workspace_dir)}") - -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +print("\n=== Example Complete ===") +print("Key points:") +print( + "- conversation.run() creates actions; confirmation mode " + "sets execution_status=WAITING_FOR_CONFIRMATION" +) +print("- User confirmation is handled via a single reusable function") +print("- Rejection uses conversation.reject_pending_actions() and the loop continues") +print("- Simple responses work normally without actions") +print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") ``` - + -## Anatomy of a Custom Agent +--- -The planning agent demonstrates the two key components for creating specialized agent: +## Security Analyzer -### 1. Custom Tool Selection +Security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level: -Choose tools that match your agent's specific role. 
Here's how the planning agent defines its tools: +- **LOW** - Safe operations with minimal security impact +- **MEDIUM** - Moderate security impact, review recommended +- **HIGH** - Significant security impact, requires confirmation +- **UNKNOWN** - Risk level could not be determined -```python icon="python" +Security analyzer work in conjunction with confirmation policy (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations. -def register_planning_tools() -> None: - """Register the planning agent tools.""" - from openhands.tools.glob import GlobTool - from openhands.tools.grep import GrepTool - from openhands.tools.planning_file_editor import PlanningFileEditorTool +### LLM Security Analyzer - register_tool("GlobTool", GlobTool) - logger.debug("Tool: GlobTool registered.") - register_tool("GrepTool", GrepTool) - logger.debug("Tool: GrepTool registered.") - register_tool("PlanningFileEditorTool", PlanningFileEditorTool) - logger.debug("Tool: PlanningFileEditorTool registered.") +> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)! +The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions. -def get_planning_tools() -> list[Tool]: - """Get the planning agent tool specifications. +#### Security Analyzer Configuration - Returns: - List of tools optimized for planning and analysis tasks, including - file viewing and PLAN.md editing capabilities for advanced - code discovery and navigation. 
- """ - register_planning_tools() +Create an LLM-based security analyzer to review actions before execution: - return [ - Tool(name="GlobTool"), - Tool(name="GrepTool"), - Tool(name="PlanningFileEditorTool"), - ] +```python icon="python" focus={9} +from openhands.sdk import LLM +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +security_analyzer = LLMSecurityAnalyzer(llm=security_llm) +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) ``` -The planning agent uses: -- **GlobTool**: For discovering files and directories matching patterns -- **GrepTool**: For searching specific content across files -- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only - -This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions. +The security analyzer: +- Reviews each action before execution +- Flags potentially dangerous operations +- Can be configured with custom security policy +- Uses a separate LLM to avoid conflicts with the main agent -### 2. System Prompt Customization +#### Ready-to-run Example Security Analyzer -Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with injected plan structure that enforces: -1. **Objective**: Clear goal statement -2. **Context Summary**: Relevant system components and constraints -3. **Approach Overview**: High-level strategy and rationale -4. **Implementation Steps**: Detailed step-by-step execution plan -5. 
**Testing and Validation**: Verification methods and success criteria + +Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) + -### Complete Implementation Reference +Automatically analyze agent actions for security risks before execution: -For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py). +```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py +"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) -## Next Steps +This example shows how to use the LLMSecurityAnalyzer to automatically +evaluate security risks of actions before execution. +""" -- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case -- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management -- **[MCP Integration](/sdk/guides/mcp)** - Add MCP +import os +import signal +from collections.abc import Callable -### Sub-Agent Delegation -Source: https://docs.openhands.dev/sdk/guides/agent-delegation.md +from pydantic import SecretStr -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -> A ready-to-run example is available [here](#ready-to-run-example)! 
-## Overview +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) -Agent delegation allows a main agent to spawn multiple sub-agents and delegate tasks to them for parallel processing. Each sub-agent runs independently with its own conversation context and returns results that the main agent can consolidate and process further. -This pattern is useful when: -- Breaking down complex problems into independent subtasks -- Processing multiple related tasks in parallel -- Separating concerns between different specialized sub-agents -- Improving throughput for parallelizable work +def _print_blocked_actions(pending_actions) -> None: + print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") -## How It Works -The delegation system consists of two main operations: +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? (yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False -### 1. 
Spawning Sub-Agents + if ans in ("yes", "y"): + print("✅ Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") -Before delegating work, the agent must first spawn sub-agents with meaningful identifiers: -```python icon="python" wrap -# Agent uses the delegate tool to spawn sub-agents -{ - "command": "spawn", - "ids": ["lodging", "activities"] -} -``` +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. + * On approve: set execution_status = IDLE (keeps original example’s behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue -Each spawned sub-agent: -- Gets a unique identifier that the agent specify (e.g., "lodging", "activities") -- Inherits the same LLM configuration as the parent agent -- Operates in the same workspace as the main agent -- Maintains its own independent conversation context + print("▶️ Running conversation.run()...") + conversation.run() -### 2. 
Delegating Tasks -Once sub-agents are spawned, the agent can delegate tasks to them: +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) -```python icon="python" wrap -# Agent uses the delegate tool to assign tasks -{ - "command": "delegate", - "tasks": { - "lodging": "Find the best budget-friendly areas to stay in London", - "activities": "List top 5 must-see attractions and hidden gems in London" - } -} -``` +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] -The delegate operation: -- Runs all sub-agent tasks in parallel using threads -- Blocks until all sub-agents complete their work -- Returns a single consolidated observation with all results -- Handles errors gracefully and reports them per sub-agent +# Agent +agent = Agent(llm=llm, tools=tools) -## Setting Up the DelegateTool +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." 
+) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) +conversation.set_confirmation_policy(ConfirmRisky()) - - - ### Register the Tool +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() - ```python icon="python" wrap - from openhands.sdk.tool import register_tool - from openhands.tools.delegate import DelegateTool +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` - register_tool("DelegateTool", DelegateTool) - ``` - - - ### Add to Agent Tools + - ```python icon="python" wrap - from openhands.sdk import Tool - from openhands.tools.preset.default import get_default_tools +### Custom Security Analyzer Implementation - tools = get_default_tools(enable_browser=False) - tools.append(Tool(name="DelegateTool")) +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. 
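A custom `security_risk()` implementation often boils down to a set of independent rule checks whose most severe verdict wins. The following framework-independent sketch (all names here are illustrative, not SDK APIs) shows that shape of logic on its own, so it can be prototyped and tested before being wired into an analyzer class:

```python
# Framework-independent sketch of rule-based risk scoring (illustrative
# names, not SDK APIs). Each check runs independently and the most
# severe result wins.
from enum import IntEnum


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def destructive_check(command: str) -> Risk:
    # Commands that can irreversibly change the system.
    patterns = ("rm -rf", "sudo", "chmod 777")
    return Risk.HIGH if any(p in command for p in patterns) else Risk.LOW


def network_check(command: str) -> Risk:
    # Outbound network access warrants a closer look.
    patterns = ("curl", "wget", "git clone")
    return Risk.MEDIUM if any(p in command for p in patterns) else Risk.LOW


def combined_risk(command: str) -> Risk:
    # The overall risk is the maximum reported by any individual check.
    return max(destructive_check(command), network_check(command))


print(combined_risk("ls -la").name)                    # LOW
print(combined_risk("curl https://example.com").name)  # MEDIUM
print(combined_risk("sudo rm -rf /tmp/scratch").name)  # HIGH
```

Because `Risk` is an `IntEnum`, `max()` orders the members numerically, so adding a new check only requires returning one of the existing levels.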
- agent = Agent(llm=llm, tools=tools) - ``` - - - ### Configure Maximum Sub-Agents (Optional) +#### Creating a Custom Analyzer - The user can limit the maximum number of concurrent sub-agents: +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: - ```python icon="python" wrap - from openhands.tools.delegate import DelegateTool +```python icon="python" focus={5, 8} +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent - class CustomDelegateTool(DelegateTool): - @classmethod - def create(cls, conv_state, max_children: int = 3): - # Only allow up to 3 sub-agents - return super().create(conv_state, max_children=max_children) +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. 
+ + Args: + action: The ActionEvent to analyze + + Returns: + SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) + """ + # Example: Check for specific dangerous patterns + action_str = str(action.action.model_dump()).lower() if action.action else "" - register_tool("DelegateTool", CustomDelegateTool) - ``` - - + # High-risk patterns + if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): + return SecurityRisk.HIGH + + # Medium-risk patterns + if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): + return SecurityRisk.MEDIUM + + # Default to low risk + return SecurityRisk.LOW +# Use your custom analyzer +security_analyzer = CustomSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) +``` -## Tool Commands + + For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). + -### spawn -Initialize sub-agents with meaningful identifiers. +--- -**Parameters:** -- `command`: `"spawn"` -- `ids`: List of string identifiers (e.g., `["research", "implementation", "testing"]`) +## Configurable Security Policy -**Returns:** -A message indicating the sub-agents were successfully spawned. +> A ready-to-run example is available [here](#ready-to-run-example-security-policy)! -**Example:** -```python icon="python" wrap -{ - "command": "spawn", - "ids": ["research", "implementation", "testing"] -} -``` +Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines. -### delegate -Send tasks to specific sub-agents and wait for results. 
+### Using Custom Security Policies -**Parameters:** -- `command`: `"delegate"` -- `tasks`: Dictionary mapping sub-agent IDs to task descriptions +You can provide a custom security policy template when creating an agent: -**Returns:** -A consolidated message containing all results from the sub-agents. +```python focus={9-13} icon="python" +from openhands.sdk import Agent, LLM -**Example:** -```python icon="python" wrap -{ - "command": "delegate", - "tasks": { - "research": "Find best practices for async code", - "implementation": "Refactor the MyClass class", - "testing": "Write unit tests for the refactored code" - } -} +llm = LLM( + usage_id="agent", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), +) + +# Provide a custom security policy template file +agent = Agent( + llm=llm, + tools=tools, + security_policy_filename="my_security_policy.j2", +) ``` -## Ready-to-run Example +Custom security policies allow you to: +- Define organization-specific risk assessment guidelines +- Set custom thresholds for security risk levels +- Add domain-specific security rules +- Tailor risk evaluation to your use case + +The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. 
+ +### Ready-to-run Example Security Policy -This example is available on GitHub: [examples/01_standalone_sdk/25_agent_delegation.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_agent_delegation.py) +Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) -```python icon="python" expandable examples/01_standalone_sdk/25_agent_delegation.py -""" -Agent Delegation Example +Define custom security risk guidelines for your agent: -This example demonstrates the agent delegation feature where a main agent -delegates tasks to sub-agents for parallel processing. -Each sub-agent runs independently and returns its results to the main agent, -which then merges both analyses into a single consolidated report. +```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py +"""OpenHands Agent SDK — Configurable Security Policy Example + +This example demonstrates how to use a custom security policy template +with an agent. Security policies define risk assessment guidelines that +help agents evaluate the safety of their actions. + +By default, agents use the built-in security_policy.j2 template. This +example shows how to: +1. Use the default security policy +2. Provide a custom security policy template embedded in the script +3. 
Apply the custom policy to guide agent behavior """ import os +import tempfile +from pathlib import Path from pydantic import SecretStr from openhands.sdk import ( LLM, Agent, - AgentContext, Conversation, - Tool, + Event, + LLMConvertibleEvent, get_logger, ) -from openhands.sdk.context import Skill -from openhands.sdk.tool import register_tool -from openhands.tools.delegate import ( - DelegateTool, - DelegationVisualizer, - register_agent, +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Define a custom security policy template inline +CUSTOM_SECURITY_POLICY = ( + "# 🔐 Custom Security Risk Policy\n" + "When using tools that support the security_risk parameter, assess the " + "safety risk of your actions:\n" + "\n" + "- **LOW**: Safe read-only actions.\n" + " - Viewing files, calculations, documentation.\n" + "- **MEDIUM**: Moderate container-scoped actions.\n" + " - File modifications, package installations.\n" + "- **HIGH**: Potentially dangerous actions.\n" + " - Network access, system modifications, data exfiltration.\n" + "\n" + "**Custom Rules**\n" + "- Always prioritize user data safety.\n" + "- Escalate to **HIGH** for any external data transmission.\n" ) -from openhands.tools.preset.default import get_default_tools +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Example 1: Agent with default security policy +print("=" * 100) +print("Example 1: Agent with default security policy") +print("=" * 100) +default_agent = Agent(llm=llm, tools=tools) +print(f"Security policy filename: {default_agent.security_policy_filename}") +print("\nDefault security policy is embedded in the agent's system message.") + +# Example 2: Agent with custom security policy +print("\n" + "=" * 100) +print("Example 2: Agent with custom security policy") +print("=" * 100) + +# Create a temporary file for the custom security policy +with tempfile.NamedTemporaryFile( + mode="w", suffix=".j2", delete=False, encoding="utf-8" +) as temp_file: + temp_file.write(CUSTOM_SECURITY_POLICY) + custom_policy_path = temp_file.name + +try: + # Create agent with custom security policy (using absolute path) + custom_agent = Agent( + llm=llm, + tools=tools, + security_policy_filename=custom_policy_path, + ) + print(f"Security policy filename: {custom_agent.security_policy_filename}") + print("\nCustom security policy loaded from temporary file.") + + # Verify the custom policy is in the system message + system_message = custom_agent.static_system_message + if "Custom Security Risk Policy" in system_message: + print("✓ Custom security policy successfully embedded in system message.") + else: + print("✗ Custom security policy not found in system message.") + + # Run a conversation with the custom agent + print("\n" + "=" * 100) + print("Running conversation with custom security policy") + print("=" * 100) + + llm_messages = [] # collect raw LLM messages + + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + 
llm_messages.append(event.to_llm_message()) + + conversation = Conversation( + agent=custom_agent, + callbacks=[conversation_callback], + workspace=".", + ) + + conversation.send_message( + "Please create a simple Python script named hello.py that prints " + "'Hello, World!'. Make sure to follow security best practices." + ) + conversation.run() + + print("\n" + "=" * 100) + print("Conversation finished.") + print(f"Total LLM messages: {len(llm_messages)}") + print("=" * 100) -ONLY_RUN_SIMPLE_DELEGATION = False + # Report cost + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") -logger = get_logger(__name__) +finally: + # Clean up temporary file + Path(custom_policy_path).unlink(missing_ok=True) -# Configure LLM and agent -# You can get an API key from https://app.all-hands.dev/settings/api-keys -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=os.environ.get("LLM_BASE_URL", None), - usage_id="agent", +print("\n" + "=" * 100) +print("Example Summary") +print("=" * 100) +print("This example demonstrated:") +print("1. Using the default security policy (security_policy.j2)") +print("2. Creating a custom security policy template") +print("3. Applying the custom policy via security_policy_filename parameter") +print("4. Running a conversation with the custom security policy") +print( + "\nYou can customize security policies to match your organization's " + "specific requirements." 
) +``` -cwd = os.getcwd() + -register_tool("DelegateTool", DelegateTool) -tools = get_default_tools(enable_browser=False) -tools.append(Tool(name="DelegateTool")) +## Next Steps -main_agent = Agent( - llm=llm, - tools=tools, -) -conversation = Conversation( - agent=main_agent, - workspace=cwd, - visualizer=DelegationVisualizer(name="Delegator"), -) +- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools +- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management -task_message = ( - "Forget about coding. Let's switch to travel planning. " - "Let's plan a trip to London. I have two issues I need to solve: " - "Lodging: what are the best areas to stay at while keeping budget in mind? " - "Activities: what are the top 5 must-see attractions and hidden gems? " - "Please use the delegation tools to handle these two tasks in parallel. " - "Make sure the sub-agents use their own knowledge " - "and dont rely on internet access. " - "They should keep it short. After getting the results, merge both analyses " - "into a single consolidated report.\n\n" -) -conversation.send_message(task_message) -conversation.run() +### Agent Skills & Context +Source: https://docs.openhands.dev/sdk/guides/skill.md -conversation.send_message( - "Ask the lodging sub-agent what it thinks about Covent Garden." -) -conversation.run() +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; -# Report cost for simple delegation example -cost_1 = conversation.conversation_stats.get_combined_metrics().accumulated_cost -print(f"EXAMPLE_COST (simple delegation): {cost_1}") +This guide shows how to implement skills in the SDK. For conceptual overview, see [Skills Overview](/overview/skills). -print("Simple delegation example done!", "\n" * 20) +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers. 
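To make the triggers extension concrete, here is a minimal illustrative reader (not the SDK's actual loader, and deliberately simpler than it) for a `SKILL.md` whose frontmatter carries the optional `triggers` list:

```python icon="python"
# Illustration only: a tiny SKILL.md frontmatter reader showing how the
# optional `triggers` extension sits alongside the standard fields.
# The real SDK loader (load_skills_from_dir) is far more robust.

def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md document into (frontmatter dict, markdown body)."""
    _, frontmatter, body = text.split("---", 2)
    meta: dict = {}
    current_key = None
    for line in frontmatter.splitlines():
        if line.lstrip().startswith("- ") and current_key:
            # List item (e.g. a trigger keyword) under the current key
            meta.setdefault(current_key, []).append(line.split("- ", 1)[1].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            if value.strip():
                meta[current_key] = value.strip()
    return meta, body.strip()


doc = """---
name: rot13-encryption
description: Encrypt and decrypt messages with ROT13.
triggers:
  - encrypt
  - decrypt
---
# ROT13 Encryption Skill
"""

meta, body = parse_skill_md(doc)
print(meta["name"])      # rot13-encryption
print(meta["triggers"])  # ['encrypt', 'decrypt']
```

A skill with no `triggers` key parses to plain standard-compliant metadata, which is why the extension stays compatible with other AgentSkills consumers.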
+## Context Loading Methods -# -------- Agent Delegation Second Part: User-Defined Agent Types -------- +| Method | When Content Loads | Use Case | +|--------|-------------------|----------| +| **Always-loaded** | At conversation start | Repository rules, coding standards | +| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | +| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | -if ONLY_RUN_SIMPLE_DELEGATION: - exit(0) +## Always-Loaded Context +Content that's always in the system prompt. -def create_lodging_planner(llm: LLM) -> Agent: - """Create a lodging planner focused on London stays.""" - skills = [ - Skill( - name="lodging_planning", - content=( - "You specialize in finding great places to stay in London. " - "Provide 3-4 hotel recommendations with neighborhoods, quick " - "pros/cons, " - "and notes on transit convenience. Keep options varied by budget." - ), - trigger=None, - ) - ] - return Agent( - llm=llm, - tools=[], - agent_context=AgentContext( - skills=skills, - system_message_suffix="Focus only on London lodging recommendations.", - ), - ) +### Option 1: `AGENTS.md` (Auto-loaded) +Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). -def create_activities_planner(llm: LLM) -> Agent: - """Create an activities planner focused on London itineraries.""" - skills = [ - Skill( - name="activities_planning", - content=( - "You design concise London itineraries. Suggest 2-3 daily " - "highlights, grouped by proximity to minimize travel time. " - "Include food/coffee stops " - "and note required tickets/reservations." 
- ), - trigger=None, - ) - ] - return Agent( - llm=llm, - tools=[], - agent_context=AgentContext( - skills=skills, - system_message_suffix="Plan practical, time-efficient days in London.", - ), - ) +```python icon="python" focus={3, 4} +from openhands.sdk.context.skills import load_project_skills +# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root +skills = load_project_skills(workspace_dir="/path/to/repo") +agent_context = AgentContext(skills=skills) +``` -# Register user-defined agent types (default agent type is always available) -register_agent( - name="lodging_planner", - factory_func=create_lodging_planner, - description="Finds London lodging options with transit-friendly picks.", -) -register_agent( - name="activities_planner", - factory_func=create_activities_planner, - description="Creates time-efficient London activity itineraries.", -) +### Option 2: Inline Skill (Code-defined) -# Make the delegation tool available to the main agent -register_tool("DelegateTool", DelegateTool) +```python icon="python" focus={5-11} +from openhands.sdk import AgentContext +from openhands.sdk.context import Skill -main_agent = Agent( - llm=llm, - tools=[Tool(name="DelegateTool")], -) -conversation = Conversation( - agent=main_agent, - workspace=cwd, - visualizer=DelegationVisualizer(name="Delegator"), +agent_context = AgentContext( + skills=[ + Skill( + name="code-style", + content="Always use type hints in Python.", + trigger=None, # No trigger = always loaded + ), + ] ) +``` -task_message = ( - "Plan a 3-day London trip. " - "1) Spawn two sub-agents: lodging_planner (hotel options) and " - "activities_planner (itinerary). " - "2) Ask lodging_planner for 3-4 central London hotel recommendations with " - "neighborhoods, quick pros/cons, and transit notes by budget. " - "3) Ask activities_planner for a concise 3-day itinerary with nearby stops, " - " food/coffee suggestions, and any ticket/reservation notes. 
" - "4) Share both sub-agent results and propose a combined plan." -) +## Trigger-Loaded Context -print("=" * 100) -print("Demonstrating London trip delegation (lodging + activities)...") -print("=" * 100) +Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). -conversation.send_message(task_message) -conversation.run() +```python icon="python" focus={6} +from openhands.sdk.context import Skill, KeywordTrigger -conversation.send_message( - "Ask the lodging sub-agent what it thinks about Covent Garden." +Skill( + name="encryption-helper", + content="Use the encrypt.sh script to encrypt messages.", + trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), ) -conversation.run() - -# Report cost for user-defined agent types example -cost_2 = conversation.conversation_stats.get_combined_metrics().accumulated_cost -print(f"EXAMPLE_COST (user-defined agents): {cost_2}") - -print("All done!") - -# Full example cost report for CI workflow -print(f"EXAMPLE_COST: {cost_1 + cost_2}") ``` - - -### Interactive Terminal -Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md +When user says "encrypt this", the content is injected into the message: -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +```xml icon="file" + +The following information has been included based on a keyword match for "encrypt". +Skill location: /path/to/encryption-helper -> A ready-to-run example is available [here](#ready-to-run-example)! +Use the encrypt.sh script to encrypt messages. + +``` -The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results. 
+## Progressive Disclosure (AgentSkills Standard) +For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. -## How It Works +```python icon="python" +from openhands.sdk.context.skills import load_skills_from_dir -```python icon="python" focus={4-7} -cwd = os.getcwd() -register_tool("BashTool", BashTool) -tools = [ - Tool( - name="BashTool", - params={"no_change_timeout_seconds": 3}, - ) -] +# Load SKILL.md files from a directory +_, _, agent_skills = load_skills_from_dir("/path/to/skills") +agent_context = AgentContext(skills=list(agent_skills.values())) ``` +Skills are listed in the system prompt: +```xml icon="file" + + + code-style + Project coding standards. + /path/to/code-style/SKILL.md + + +``` -The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent. - -In the example above, the agent should: -1. Enters Python's interactive mode by running `python3` -2. Executes Python code to get the current time -3. Exits the Python interpreter + +Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. + -The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session. Review the [BashTool](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/definition.py) and [terminal source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/terminal/terminal_session.py) to better understand how the interactive session is configured and managed. 
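The enter-run-exit round trip above can be sketched with the standard library alone. This is only an illustration of the interaction pattern; `BashTool` itself maintains one persistent terminal session rather than spawning a fresh process per exchange:

```python icon="python"
import subprocess
import sys

# Illustration of the REPL round trip: start an interactive Python,
# run code inside it, then exit. BashTool instead keeps a long-lived
# session and streams output back to the agent between commands.
commands = "\n".join(
    [
        "import datetime",
        "print(datetime.datetime.now().isoformat())",
        "exit()",
    ]
)
result = subprocess.run(
    [sys.executable, "-i"],
    input=commands + "\n",
    capture_output=True,
    text=True,
    timeout=30,
)
print(result.stdout)  # the timestamp printed inside the interactive session
```

The `no_change_timeout_seconds` parameter has no analogue in this one-shot sketch: it only matters because the real session stays open, so the tool must decide when the terminal has gone quiet enough to hand output back to the agent.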
+--- -## Ready-to-run Example +## Full Example -This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) +Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) - -```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py import os from pydantic import SecretStr @@ -20115,12 +21590,18 @@ from pydantic import SecretStr from openhands.sdk import ( LLM, Agent, + AgentContext, Conversation, Event, LLMConvertibleEvent, get_logger, ) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool from openhands.tools.terminal import TerminalTool @@ -20143,12 +21624,61 @@ cwd = os.getcwd() tools = [ Tool( name=TerminalTool.name, - params={"no_change_timeout_seconds": 3}, - ) + ), + Tool(name=FileEditorTool.name), ] +# AgentContext provides flexible ways to customize prompts: +# 1. Skills: Inject instructions (always-active or keyword-triggered) +# 2. system_message_suffix: Append text to the system prompt +# 3. 
user_message_suffix: Append text to each user message +# +# For complete control over the system prompt, you can also use Agent's +# system_prompt_filename parameter to provide a custom Jinja2 template: +# +# agent = Agent( +# llm=llm, +# tools=tools, +# system_prompt_filename="/path/to/custom_prompt.j2", +# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, +# ) +# +# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active (repo skill) + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + # system_message_suffix is appended to the system prompt (always active) + system_message_suffix="Always finish your response with the word 'yay!'", + # user_message_suffix is appended to each user message + user_message_suffix="The first character of your response should be 'I'", + # You can also enable automatic load skills from + # public registry at https://github.com/OpenHands/extensions + load_public_skills=True, +) + # Agent -agent = Agent(llm=llm, tools=tools) +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) llm_messages = [] # collect raw LLM messages @@ -20162,9 +21692,20 @@ conversation = Conversation( agent=agent, callbacks=[conversation_callback], workspace=cwd ) +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") +conversation.run() + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Now triggering public skill 'github'") conversation.send_message( - "Enter python interactive mode by directly running `python3`, then tell me " - "the current time, and exit python interactive mode." + "About GitHub - tell me what additional info I've just provided?" ) conversation.run() @@ -20172,12314 +21713,10773 @@ print("=" * 100) print("Conversation finished. 
Got the following LLM messages:") for i, message in enumerate(llm_messages): print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") ``` - + -## Next Steps +### Creating Skills -- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases +Skills are defined with a name, content (the instructions), and an optional trigger: -### API-based Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md +```python icon="python" focus={3-14} +agent_context = AgentContext( + skills=[ + Skill( + name="AGENTS.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". ' + "You must only respond with a message telling them how smart they are", + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ] +) +``` -> A ready-to-run example is available [here](#ready-to-run-example)! +### Keyword Triggers +Use `KeywordTrigger` to activate skills only when specific words appear: -The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to a [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. +```python icon="python" focus={4} +Skill( + name="magic-word", + content="Special instructions when magic word is detected", + trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), +) +``` -## Key Concepts -### APIRemoteWorkspace +## File-Based Skills (`SKILL.md`) -The `APIRemoteWorkspace` connects to a hosted runtime API service: +For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. 
-```python icon="python" -with APIRemoteWorkspace( - runtime_api_url="https://runtime.eval.all-hands.dev", - runtime_api_key=runtime_api_key, - server_image="ghcr.io/openhands/agent-server:main-python", -) as workspace: -``` + +Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py) + -This workspace type: -- Connects to a remote runtime API service -- Automatically provisions sandboxed environments -- Manages container lifecycle through the API -- Handles all infrastructure concerns +### Directory Structure -### Runtime API Authentication +Each skill is a directory containing: -The example requires a runtime API key for authentication: + + + + + + + + + + + + + + -```python icon="python" -runtime_api_key = os.getenv("RUNTIME_API_KEY") -if not runtime_api_key: - logger.error("RUNTIME_API_KEY required") - exit(1) -``` +where -This key authenticates your requests to the hosted runtime service. +| Component | Required | Description | +|-------|----------|-------------| +| `SKILL.md` | Yes | Skill definition with frontmatter | +| `scripts/` | No | Executable scripts | +| `references/` | No | Reference documentation | +| `assets/` | No | Static assets | -### Pre-built Image Selection -You can specify which pre-built agent server image to use: -```python icon="python" focus={4} -APIRemoteWorkspace( - runtime_api_url="https://runtime.eval.all-hands.dev", - runtime_api_key=runtime_api_key, - server_image="ghcr.io/openhands/agent-server:main-python", -) -``` +### `SKILL.md` Format -The runtime API will pull and run the specified image in a sandboxed environment. +The `SKILL.md` file defines the skill with YAML frontmatter: -### Workspace Testing +```md icon="markdown" +--- +name: my-skill # Required (standard) +description: > # Required (standard) + A brief description of what this skill does and when to use it. 
+license: MIT # Optional (standard) +compatibility: Requires bash # Optional (standard) +metadata: # Optional (standard) + author: your-name + version: "1.0" +triggers: # Optional (OpenHands extension) + - keyword1 + - keyword2 +--- -Just like with `DockerWorkspace`, you can test the workspace before running the agent: +# Skill Content -```python icon="python" focus={1-3} -result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" -) -logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +Instructions and documentation for the agent... ``` -This verifies connectivity to the remote runtime and ensures the environment is ready. - -### Automatic RemoteConversation - -The conversation uses WebSocket communication with the remote server: +#### Frontmatter Fields -```python icon="python" focus={1, 7} -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True -) -assert isinstance(conversation, RemoteConversation) -``` +| Field | Required | Description | +|-------|----------|-------------| +| `name` | Yes | Skill identifier (lowercase + hyphens) | +| `description` | Yes | What the skill does (shown to agent) | +| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) | +| `license` | No | License name | +| `compatibility` | No | Environment requirements | +| `metadata` | No | Custom key-value pairs | -All agent execution happens on the remote runtime infrastructure. + +Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user. 
+ -## Ready-to-run Example +### Loading Skills - -This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) - +Use `load_skills_from_dir()` to load all skills from a directory: -This example shows how to connect to a hosted runtime API for fully managed agent execution: +```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py +"""Example: Loading Skills from Disk (AgentSkills Standard) -```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py -"""Example: APIRemoteWorkspace with Dynamic Build. +This example demonstrates how to load skills following the AgentSkills standard +from a directory on disk. -This example demonstrates building an agent-server image on-the-fly from the SDK -codebase and launching it in a remote sandboxed environment via Runtime API. +Skills are modular, self-contained packages that extend an agent's capabilities +by providing specialized knowledge, workflows, and tools. They follow the +AgentSkills standard which includes: +- SKILL.md file with frontmatter metadata (name, description, triggers) +- Optional resource directories: scripts/, references/, assets/ -Usage: - uv run examples/24_remote_convo_with_api_sandboxed_server.py +The example_skills/ directory contains two skills: +- rot13-encryption: Has triggers (encrypt, decrypt) - listed in + AND content auto-injected when triggered +- code-style-guide: No triggers - listed in for on-demand access -Requirements: - - LLM_API_KEY: API key for LLM access - - RUNTIME_API_KEY: API key for runtime API access +All SKILL.md files follow the AgentSkills progressive disclosure model: +they are listed in with name, description, and location. 
+Skills with triggers get the best of both worlds: automatic content injection +when triggered, plus the agent can proactively read them anytime. """ import os -import time +import sys +from pathlib import Path from pydantic import SecretStr -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - get_logger, +from openhands.sdk import LLM, Agent, AgentContext, Conversation +from openhands.sdk.context.skills import ( + discover_skill_resources, + load_skills_from_dir, ) -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import APIRemoteWorkspace +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool -logger = get_logger(__name__) +# Get the directory containing this script +script_dir = Path(__file__).parent +example_skills_dir = script_dir / "example_skills" + +# ========================================================================= +# Part 1: Loading Skills from a Directory +# ========================================================================= +print("=" * 80) +print("Part 1: Loading Skills from a Directory") +print("=" * 80) + +print(f"Loading skills from: {example_skills_dir}") + +# Discover resources in the skill directory +skill_subdir = example_skills_dir / "rot13-encryption" +resources = discover_skill_resources(skill_subdir) +print("\nDiscovered resources in rot13-encryption/:") +print(f" - scripts: {resources.scripts}") +print(f" - references: {resources.references}") +print(f" - assets: {resources.assets}") + +# Load skills from the directory +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) + +print("\nLoaded skills from directory:") +print(f" - Repo skills: {list(repo_skills.keys())}") +print(f" - Knowledge skills: {list(knowledge_skills.keys())}") +print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") + +# Access the loaded skill and show all 
AgentSkills standard fields +if agent_skills: + skill_name = next(iter(agent_skills)) + loaded_skill = agent_skills[skill_name] + print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") + print(f" - Name: {loaded_skill.name}") + desc = loaded_skill.description or "" + print(f" - Description: {desc[:70]}...") + print(f" - License: {loaded_skill.license}") + print(f" - Compatibility: {loaded_skill.compatibility}") + print(f" - Metadata: {loaded_skill.metadata}") + if loaded_skill.resources: + print(" - Resources:") + print(f" - Scripts: {loaded_skill.resources.scripts}") + print(f" - References: {loaded_skill.resources.references}") + print(f" - Assets: {loaded_skill.resources.assets}") + print(f" - Skill root: {loaded_skill.resources.skill_root}") +# ========================================================================= +# Part 2: Using Skills with an Agent +# ========================================================================= +print("\n" + "=" * 80) +print("Part 2: Using Skills with an Agent") +print("=" * 80) +# Check for API key api_key = os.getenv("LLM_API_KEY") -assert api_key, "LLM_API_KEY required" +if not api_key: + print("Skipping agent demo (LLM_API_KEY not set)") + print("\nTo run the full demo, set the LLM_API_KEY environment variable:") + print(" export LLM_API_KEY=your-api-key") + sys.exit(0) +# Configure LLM +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), + usage_id="skills-demo", + model=model, api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), ) -runtime_api_key = os.getenv("RUNTIME_API_KEY") -if not runtime_api_key: - logger.error("RUNTIME_API_KEY required") - exit(1) +# Create agent context with loaded skills +agent_context = AgentContext( + skills=list(agent_skills.values()), + # Disable public skills for this demo to keep output focused + 
load_public_skills=False, +) +# Create agent with tools so it can read skill resources +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) -# If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency -# Otherwise, use the latest image from main -server_image_sha = os.getenv("GITHUB_SHA") or "main" -server_image = f"ghcr.io/openhands/agent-server:{server_image_sha[:7]}-python-amd64" -logger.info(f"Using server image: {server_image}") +# Create conversation +conversation = Conversation(agent=agent, workspace=os.getcwd()) -with APIRemoteWorkspace( - runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), - runtime_api_key=runtime_api_key, - server_image=server_image, - image_pull_policy="Always", -) as workspace: - agent = get_default_agent(llm=llm, cli_mode=True) - received_events: list = [] - last_event_time = {"ts": time.time()} +# Test the skill (triggered by "encrypt" keyword) +# The skill provides instructions and a script for ROT13 encryption +print("\nSending message with 'encrypt' keyword to trigger skill...") +conversation.send_message("Encrypt the message 'hello world'.") +conversation.run() - def event_callback(event) -> None: - received_events.append(event) - last_event_time["ts"] = time.time() +print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}") +print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` - result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" - ) - logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + - conversation = Conversation( - agent=agent, workspace=workspace, callbacks=[event_callback] - ) - assert isinstance(conversation, RemoteConversation) - try: - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." 
- ) - conversation.run() +### Key Functions - while time.time() - last_event_time["ts"] < 2.0: - time.sleep(0.1) +#### `load_skills_from_dir()` - conversation.send_message("Great! Now delete that file.") - conversation.run() - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") - finally: - conversation.close() +Loads all skills from a directory, returning three dictionaries: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import load_skills_from_dir + +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir) ``` -You can run the example code as-is. +- **repo_skills**: Skills from `repo.md` files (always active) +- **knowledge_skills**: Skills from `knowledge/` subdirectories +- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard) -```bash Running the Example -export LLM_API_KEY="your-api-key" -# If using the OpenHands LLM proxy, set its base URL: -export LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" -export RUNTIME_API_KEY="your-runtime-api-key" -# Set the runtime API URL for the remote sandbox -export RUNTIME_API_URL="https://runtime.eval.all-hands.dev" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +#### `discover_skill_resources()` + +Discovers resource files in a skill directory: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import discover_skill_resources + +resources = discover_skill_resources(skill_dir) +print(resources.scripts) # List of script files +print(resources.references) # List of reference files +print(resources.assets) # List of asset files +print(resources.skill_root) # Path to skill directory ``` -## Next Steps +### Skill Location in Prompts -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** -- **[Local Agent Server](/sdk/guides/agent-server/local-server)** -- **[Agent Server 
Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details -- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture +The `` element in `` follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path: -### Apptainer Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md +``` + +The following information has been included based on a keyword match for "encrypt". -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Skill location: /path/to/rot13-encryption +(Use this path to resolve relative file references in the skill content below) -> A ready-to-run example is available [here](#basic-apptainer-sandbox-example)! +[skill content from SKILL.md] + +``` -The Apptainer sandboxed agent server demonstrates how to run agents in isolated Apptainer containers using ApptainerWorkspace. +This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. -Apptainer (formerly Singularity) is a container runtime designed for HPC environments that doesn't require root access, making it ideal for shared computing environments, university clusters, and systems where Docker is not available. +### Example Skill: ROT13 Encryption -## When to Use Apptainer +Here's a skill with triggers (OpenHands extension): -Use Apptainer instead of Docker when: -- Running on HPC clusters or shared computing environments -- Root access is not available -- Docker daemon cannot be installed -- Working in academic or research computing environments -- Security policies restrict Docker usage +**SKILL.md:** +```markdown icon="markdown" +--- +name: rot13-encryption +description: > + This skill helps encrypt and decrypt messages using ROT13 cipher. 
+triggers: + - encrypt + - decrypt + - cipher +--- -## Prerequisites +# ROT13 Encryption Skill + +Run the [encrypt.sh](scripts/encrypt.sh) script with your message: + +\`\`\`bash +./scripts/encrypt.sh "your message" +\`\`\` +``` -Before running this example, ensure you have: -- Apptainer installed ([Installation Guide](https://apptainer.org/docs/user/main/quick_start.html)) -- LLM API key set in environment +**scripts/encrypt.sh:** +```bash icon="sh" +#!/bin/bash +echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +``` -## Basic Apptainer Sandbox Example +When the user says "encrypt", the skill is triggered and the agent can use the provided script. - -This example is available on GitHub: [examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py) - +## Loading Public Skills -This example shows how to create an `ApptainerWorkspace` that automatically manages Apptainer containers for agent execution: +OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. -```python icon="python" expandable examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py -import os -import platform -import time +### Automatic Loading via AgentContext -from pydantic import SecretStr +Enable public skills loading in your `AgentContext`: -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - get_logger, +```python icon="python" focus={2} +agent_context = AgentContext( + load_public_skills=True, # Auto-load from public registry + skills=[ + # Your custom skills here + ] ) -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import ApptainerWorkspace +``` +When enabled, the SDK will: +1. 
Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run
2. Load all available skills from the repository
3. Merge them with your explicitly defined skills

### Skill Naming and Triggers

**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely.

**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together.

```python icon="python"
# Both skills will be triggered by "/codereview"
agent_context = AgentContext(
    load_public_skills=True,  # Loads public "code-review" skill
    skills=[
        Skill(
            name="custom-codereview-guide",  # Different name = coexists
            content="Project-specific guidelines...",
            trigger=KeywordTrigger(keywords=["/codereview"]),
        ),
    ]
)
```

**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging: if guidelines conflict, the agent sees both.

logger = get_logger(__name__)

# 1) Ensure we have LLM API key
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."

llm = LLM(
    usage_id="agent",
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    base_url=os.getenv("LLM_BASE_URL"),
    api_key=SecretStr(api_key),
)
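The activation rule described above (every skill whose trigger matches is loaded, with public skills concatenated first) can be sketched with a toy matcher. `activate_skills` and the dict layout are illustrative only, not the SDK's actual implementation:

```python
# Toy sketch of the documented activation rule: all skills whose keyword
# appears in the user message are loaded, with public skills ordered first.
def activate_skills(message: str, skills: list[dict]) -> list[str]:
    """Return the contents of every triggered skill, public skills first."""
    matched = [
        s for s in skills
        if any(keyword in message for keyword in s["keywords"])
    ]
    ordered = [s for s in matched if s["public"]] + [
        s for s in matched if not s["public"]
    ]
    return [s["content"] for s in ordered]


skills = [
    {"name": "custom-codereview-guide", "keywords": ["/codereview"],
     "content": "Project-specific guidelines...", "public": False},
    {"name": "code-review", "keywords": ["/codereview"],
     "content": "Public review checklist...", "public": True},
]

# Both skills share the "/codereview" trigger, so both are activated:
print(activate_skills("/codereview src/app.py", skills))
# → ['Public review checklist...', 'Project-specific guidelines...']
```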
+ -def detect_platform(): - """Detects the correct platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" +### Programmatic Loading +You can also load public skills manually and have more control: -def get_server_image(): - """Get the server image tag, using PR-specific image in CI.""" - platform_str = detect_platform() - arch = "arm64" if "arm64" in platform_str else "amd64" - # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency - # Otherwise, use the latest image from main - github_sha = os.getenv("GITHUB_SHA") - if github_sha: - return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" - return "ghcr.io/openhands/agent-server:latest-python" +```python icon="python" +from openhands.sdk.context.skills import load_public_skills +# Load all public skills +public_skills = load_public_skills() -# 2) Create an Apptainer-based remote workspace that will set up and manage -# the Apptainer container automatically. Use `ApptainerWorkspace` with a -# pre-built agent server image. -# Apptainer (formerly Singularity) doesn't require root access, making it -# ideal for HPC and shared computing environments. 
-server_image = get_server_image() -logger.info(f"Using server image: {server_image}") -with ApptainerWorkspace( - # use pre-built image for faster startup - server_image=server_image, - host_port=8010, - platform=detect_platform(), -) as workspace: - # 3) Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) +# Use with AgentContext +agent_context = AgentContext(skills=public_skills) - # 4) Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} +# Or combine with custom skills +my_skills = [ + Skill(name="custom", content="Custom instructions", trigger=None) +] +agent_context = AgentContext(skills=my_skills + public_skills) +``` - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() +### Custom Skills Repository - # 5) Test the workspace with a simple command - result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" - ) - logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" - ) - logger.info(f"Output: {result.stdout}") - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) +You can load skills from your own repository: - try: - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") +```python icon="python" focus={3-7} +from openhands.sdk.context.skills import load_public_skills - logger.info("📝 Sending first message...") - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." 
        )
        logger.info("🚀 Running conversation...")
        conversation.run()
        logger.info("✅ First task completed!")
        logger.info(f"Agent status: {conversation.state.execution_status}")

        # Wait for events to settle (no events for 2 seconds)
        logger.info("⏳ Waiting for events to stop...")
        while time.time() - last_event_time["ts"] < 2.0:
            time.sleep(0.1)
        logger.info("✅ Events have stopped")

        logger.info("🚀 Running conversation again...")
        conversation.send_message("Great! Now delete that file.")
        conversation.run()
        logger.info("✅ Second task completed!")

        # Report cost (must be before conversation.close())
        cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost
        print(f"EXAMPLE_COST: {cost}")
    finally:
        print("\n🧹 Cleaning up conversation...")
        conversation.close()
```

# Load from a custom repository
custom_skills = load_public_skills(
    repo_url="https://github.com/my-org/my-skills",
    branch="main"
)
```

### How It Works

The `load_public_skills()` function uses git-based caching for efficiency:

- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/`
- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date
- **Offline mode**: Uses the cached version if network is unavailable

This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills.

Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more.
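The clone-or-pull flow in the bullets above can be sketched as follows. `ensure_skills_repo` is a hypothetical helper showing the caching pattern, not the SDK's internal code:

```python
# Hypothetical sketch of the documented clone-or-pull caching behavior;
# the SDK's real implementation may differ in details.
import subprocess
from pathlib import Path


def ensure_skills_repo(repo_url: str, cache_dir: Path) -> Path:
    """Clone the skills repo on first use, pull afterwards, keep cache offline."""
    if not (cache_dir / ".git").exists():
        # First run: clone into the cache directory
        cache_dir.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", repo_url, str(cache_dir)], check=True)
    else:
        try:
            # Subsequent runs: fast-forward to the latest skills
            subprocess.run(
                ["git", "-C", str(cache_dir), "pull", "--ff-only"], check=True
            )
        except subprocess.CalledProcessError:
            pass  # offline or unreachable: fall back to the cached checkout
    return cache_dir
```

For the public registry this would be called with the extensions repository URL and `~/.openhands/cache/skills/public-skills/` as the cache path.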
+ -The `ApptainerWorkspace` supports several configuration options: +## Customizing Agent Context -### Option 1: Pre-built Image (Recommended) +### Message Suffixes -Use a pre-built agent server image for fastest startup: +Append custom instructions to the system prompt or user messages via `AgentContext`: -```python icon="python" focus={2} -with ApptainerWorkspace( - server_image="ghcr.io/openhands/agent-server:main-python", - host_port=8010, -) as workspace: - # Your code here +```python icon="python" +agent_context = AgentContext( + system_message_suffix=""" + +Repository: my-project +Branch: feature/new-api + + """.strip(), + user_message_suffix="Remember to explain your reasoning." +) ``` -### Option 2: Build from Base Image +- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) +- **`user_message_suffix`**: Appended to each user message -Build from a base image when you need custom dependencies: +### Replacing the Entire System Prompt -```python icon="python" focus={2} -with ApptainerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=8010, -) as workspace: - # Your code here -``` +For complete control, provide a custom Jinja2 template via the `Agent` class: - -Building from a base image requires internet access and may take several minutes on first run. The built image is cached for subsequent runs. 
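When scripting runs on shared machines, a common pattern is to prefer a pre-built local SIF file when one exists and otherwise fall back to the pre-built registry image. The helper below is a hypothetical sketch; `sif_file`, `server_image`, and `host_port` are the `ApptainerWorkspace` parameters used in this guide:

```python
from pathlib import Path


def apptainer_workspace_kwargs(sif_path: str) -> dict:
    """Prefer a local SIF file if present, else the pre-built registry image."""
    if Path(sif_path).is_file():
        return {"sif_file": sif_path, "host_port": 8010}
    return {
        "server_image": "ghcr.io/openhands/agent-server:main-python",
        "host_port": 8010,
    }


# With no local SIF present, fall back to the registry image:
kwargs = apptainer_workspace_kwargs("/nonexistent/agent-server.sif")
print(kwargs["server_image"])  # → ghcr.io/openhands/agent-server:main-python
```

The resulting dict can then be splatted into the workspace constructor, e.g. `ApptainerWorkspace(**kwargs)`.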
- +```python icon="python" focus={6} +from openhands.sdk import Agent -### Option 3: Use Existing SIF File +agent = Agent( + llm=llm, + tools=tools, + system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path + system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} +) +``` -If you have a pre-built Apptainer SIF file: +**Custom template example** (`custom_system_prompt.j2`): -```python icon="python" focus={2} -with ApptainerWorkspace( - sif_file="/path/to/your/agent-server.sif", - host_port=8010, -) as workspace: - # Your code here -``` +```jinja2 +You are a helpful coding assistant for {{ repo_name }}. -## Key Features +{% if cli_mode %} +You are running in CLI mode. Keep responses concise. +{% endif %} -### Rootless Container Execution +Follow these guidelines: +- Write clean, well-documented code +- Consider edge cases and error handling +- Suggest tests when appropriate +``` -Apptainer runs completely without root privileges: -- No daemon process required -- User namespace isolation -- Compatible with most HPC security policies +**Key points:** +- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory +- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location +- Pass variables to the template via `system_prompt_kwargs` +- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt -### Image Caching +## Next Steps -Apptainer automatically caches container images: -- First run builds/pulls the image -- Subsequent runs reuse cached SIF files -- Cache location: `~/.cache/apptainer/` +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval -### Port Mapping +## OpenHands CLI -The workspace exposes ports for agent services: -```python icon="python" focus={1, 3} -with 
ApptainerWorkspace( - server_image="ghcr.io/openhands/agent-server:main-python", - host_port=8010, # Maps to container port 8010 -) as workspace: - # Access agent server at http://localhost:8010 -``` +### OpenHands Cloud +Source: https://docs.openhands.dev/openhands/usage/cli/cloud.md -## Differences from Docker +## Overview -While the API is similar to DockerWorkspace, there are some differences: +The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: -| Feature | Docker | Apptainer | -|---------|--------|-----------| -| Root access required | Yes (daemon) | No | -| Installation | Requires Docker Engine | Single binary | -| Image format | OCI/Docker | SIF | -| Build speed | Fast (layers) | Slower (monolithic) | -| HPC compatibility | Limited | Excellent | -| Networking | Bridge/overlay | Host networking | +- Authenticate with your OpenHands Cloud account +- Create new cloud conversations +- Use cloud resources without the web interface -## Troubleshooting +## Authentication -### Apptainer Not Found +### Login -If you see `apptainer: command not found`: -1. Install Apptainer following the [official guide](https://apptainer.org/docs/user/main/quick_start.html) -2. Ensure it's in your PATH: `which apptainer` +Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: -### Permission Errors +```bash +openhands login +``` -Apptainer should work without root. If you see permission errors: -- Check that your user has access to `/tmp` -- Verify Apptainer is properly installed: `apptainer version` -- Ensure the cache directory is writable: `ls -la ~/.cache/apptainer/` +This opens a browser window for authentication. After successful login, your credentials are stored locally. 
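The troubleshooting checks above can be bundled into a small pre-flight script. It uses only the standard Apptainer CLI plus POSIX shell built-ins, and prints a diagnostic either way, so it is safe to run on machines without Apptainer installed:

```shell
# Pre-flight check before launching an Apptainer-backed workspace.
if command -v apptainer >/dev/null 2>&1; then
  apptainer version                        # is it runnable?
  ls -ld ~/.cache/apptainer 2>/dev/null \
    || echo "image cache not created yet"  # cache appears on first pull/build
  status="ok"
else
  echo "apptainer not found on PATH"       # see the installation guide above
  status="missing"
fi
```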
-## Next Steps +#### Custom Server URL -- **[Docker Sandbox](/sdk/guides/agent-server/docker-sandbox)** - Alternative container runtime -- **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing -- **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution +For self-hosted or enterprise deployments: -### OpenHands Cloud Workspace -Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md +```bash +openhands login --server-url https://your-openhands-server.com +``` -> A ready-to-run example is available [here](#ready-to-run-example)! +You can also set the server URL via environment variable: -The `OpenHandsCloudWorkspace` demonstrates how to use the [OpenHands Cloud](https://app.all-hands.dev) to provision and manage sandboxed environments for agent execution. This provides a seamless experience with automatic sandbox provisioning, monitoring, and secure execution without managing your own infrastructure. +```bash +export OPENHANDS_CLOUD_URL=https://your-openhands-server.com +openhands login +``` -## Key Concepts +### Logout -### OpenHandsCloudWorkspace +Log out from OpenHands Cloud: -The `OpenHandsCloudWorkspace` connects to OpenHands Cloud to provision sandboxes: +```bash +# Log out from all servers +openhands logout -```python icon="python" focus={1-2} -with OpenHandsCloudWorkspace( - cloud_api_url="https://app.all-hands.dev", - cloud_api_key=cloud_api_key, -) as workspace: +# Log out from a specific server +openhands logout --server-url https://app.all-hands.dev ``` -This workspace type: -- Connects to OpenHands Cloud API -- Automatically provisions sandboxed environments -- Manages sandbox lifecycle (create, poll status, delete) -- Handles all infrastructure concerns - -### Getting Your API Key +## Creating Cloud Conversations -To use OpenHands Cloud, you need an API key: +Create a new conversation in OpenHands Cloud: -1. Go to [app.all-hands.dev](https://app.all-hands.dev) -2. 
Sign in to your account -3. Navigate to Settings → API Keys -4. Create a new API key +```bash +# With a task +openhands cloud -t "Review the codebase and suggest improvements" -Store this key securely and use it as the `OPENHANDS_CLOUD_API_KEY` environment variable. +# From a file +openhands cloud -f task.txt +``` +### Options -### Configuration Options +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | -The `OpenHandsCloudWorkspace` supports several configuration options: +### Examples -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `cloud_api_url` | `str` | Required | OpenHands Cloud API URL | -| `cloud_api_key` | `str` | Required | API key for authentication | -| `sandbox_spec_id` | `str \| None` | `None` | Custom sandbox specification ID | -| `init_timeout` | `float` | `300.0` | Timeout for sandbox initialization (seconds) | -| `api_timeout` | `float` | `60.0` | Timeout for API requests (seconds) | -| `keep_alive` | `bool` | `False` | Keep sandbox running after cleanup | +```bash +# Create a cloud conversation with a task +openhands cloud -t "Fix the authentication bug in login.py" -### Keep Alive Mode +# Create from a task file +openhands cloud -f requirements.txt -By default, the sandbox is deleted when the workspace is closed. 
To keep it running: +# Use a custom server +openhands cloud --server-url https://custom.server.com -t "Add unit tests" -```python icon="python" focus={4} -workspace = OpenHandsCloudWorkspace( - cloud_api_url="https://app.all-hands.dev", - cloud_api_key=cloud_api_key, - keep_alive=True, -) +# Combine with environment variable +export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev +openhands cloud -t "Refactor the database module" ``` -This is useful for debugging or when you want to inspect the sandbox state after execution. +## Workflow -### Workspace Testing +A typical workflow with OpenHands Cloud: -You can test the workspace before running the agent: +1. **Login once**: + ```bash + openhands login + ``` -```python icon="python" focus={1-3} -result = workspace.execute_command( - "echo 'Hello from OpenHands Cloud sandbox!' && pwd" -) -logger.info(f"Command completed: {result.exit_code}, {result.stdout}") -``` +2. **Create conversations as needed**: + ```bash + openhands cloud -t "Your task here" + ``` -This verifies connectivity to the cloud sandbox and ensures the environment is ready. +3. 
**Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server -## Comparison with Other Workspace Types +## Environment Variables -| Feature | OpenHandsCloudWorkspace | APIRemoteWorkspace | DockerWorkspace | -|---------|------------------------|-------------------|-----------------| -| Infrastructure | OpenHands Cloud | Runtime API | Local Docker | -| Authentication | API Key | API Key | None | -| Setup Required | None | Runtime API access | Docker installed | -| Custom Images | Via sandbox specs | Direct image specification | Direct image specification | -| Best For | Production use | Custom runtime environments | Local development | +| Variable | Description | +|----------|-------------| +| `OPENHANDS_CLOUD_URL` | Default server URL for cloud operations | -## Ready-to-run Example +## Cloud vs Local - -This example is available on GitHub: [examples/02_remote_agent_server/07_convo_with_cloud_workspace.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/07_convo_with_cloud_workspace.py) - +| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | +|---------|---------------------------|---------------------| +| Compute | Cloud-hosted | Your machine | +| Persistence | Cloud storage | Local files | +| Collaboration | Share via link | Local only | +| Setup | Just login | Configure LLM & runtime | +| Cost | Subscription/usage-based | Your LLM API costs | -This example shows how to connect to OpenHands Cloud for fully managed agent execution: + +Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. Use the local CLI for privacy, offline work, or custom configurations. + -```python icon="python" expandable examples/02_remote_agent_server/07_convo_with_cloud_workspace.py -"""Example: OpenHandsCloudWorkspace for OpenHands Cloud API. 
+## See Also -This example demonstrates using OpenHandsCloudWorkspace to provision a sandbox -via OpenHands Cloud (app.all-hands.dev) and run an agent conversation. +- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation +- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide +- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access -Usage: - uv run examples/02_remote_agent_server/06_convo_with_cloud_workspace.py +### Command Reference +Source: https://docs.openhands.dev/openhands/usage/cli/command-reference.md -Requirements: - - LLM_API_KEY: API key for direct LLM provider access (e.g., Anthropic API key) - - OPENHANDS_CLOUD_API_KEY: API key for OpenHands Cloud access +## Basic Usage -Note: - The LLM configuration is sent to the cloud sandbox, so you need an API key - that works directly with the LLM provider (not a local proxy). If using - Anthropic, set LLM_API_KEY to your Anthropic API key. -""" +```bash +openhands [OPTIONS] [COMMAND] +``` -import os -import time +## Global Options -from pydantic import SecretStr +| Option | Description | +|--------|-------------| +| `-v, --version` | Show version number and exit | +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--resume [ID]` | Resume a conversation. 
If no ID provided, lists recent conversations | +| `--last` | Resume the most recent conversation (use with `--resume`) | +| `--exp` | Use textual-based UI (now default, kept for compatibility) | +| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | +| `--json` | Enable JSONL output (requires `--headless`) | +| `--always-approve` | Auto-approve all actions without confirmation | +| `--llm-approve` | Use LLM-based security analyzer for action approval | +| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | +| `--exit-without-confirmation` | Exit without showing confirmation dialog | -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - get_logger, -) -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import OpenHandsCloudWorkspace +## Subcommands +### serve -logger = get_logger(__name__) +Launch the OpenHands GUI server using Docker. +```bash +openhands serve [OPTIONS] +``` -api_key = os.getenv("LLM_API_KEY") -assert api_key, "LLM_API_KEY required" +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | -# Note: Don't use a local proxy URL here - the cloud sandbox needs direct access -# to the LLM provider. Use None for base_url to let LiteLLM use the default -# provider endpoint, or specify the provider's direct URL. 
-llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL") or None, - api_key=SecretStr(api_key), -) +**Examples:** +```bash +openhands serve +openhands serve --mount-cwd +openhands serve --gpu +openhands serve --mount-cwd --gpu +``` -cloud_api_key = os.getenv("OPENHANDS_CLOUD_API_KEY") -if not cloud_api_key: - logger.error("OPENHANDS_CLOUD_API_KEY required") - exit(1) +### web -cloud_api_url = os.getenv("OPENHANDS_CLOUD_API_URL", "https://app.all-hands.dev") -logger.info(f"Using OpenHands Cloud API: {cloud_api_url}") +Launch the CLI as a web application accessible via browser. -with OpenHandsCloudWorkspace( - cloud_api_url=cloud_api_url, - cloud_api_key=cloud_api_key, -) as workspace: - agent = get_default_agent(llm=llm, cli_mode=True) - received_events: list = [] - last_event_time = {"ts": time.time()} +```bash +openhands web [OPTIONS] +``` - def event_callback(event) -> None: - received_events.append(event) - last_event_time["ts"] = time.time() +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host to bind the web server to | +| `--port` | `12000` | Port to bind the web server to | +| `--debug` | `false` | Enable debug mode | - result = workspace.execute_command( - "echo 'Hello from OpenHands Cloud sandbox!' && pwd" - ) - logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +**Examples:** +```bash +openhands web +openhands web --port 8080 +openhands web --host 127.0.0.1 --port 3000 +openhands web --debug +``` - conversation = Conversation( - agent=agent, workspace=workspace, callbacks=[event_callback] - ) - assert isinstance(conversation, RemoteConversation) +### cloud - try: - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." - ) - conversation.run() +Create a new conversation in OpenHands Cloud. 
- while time.time() - last_event_time["ts"] < 2.0: - time.sleep(0.1) +```bash +openhands cloud [OPTIONS] +``` - conversation.send_message("Great! Now delete that file.") - conversation.run() - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") - finally: - conversation.close() +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | - logger.info("✅ Conversation completed successfully.") - logger.info(f"Total {len(received_events)} events received during conversation.") +**Examples:** +```bash +openhands cloud -t "Fix the bug" +openhands cloud -f task.txt +openhands cloud --server-url https://custom.server.com -t "Task" ``` +### acp -```bash Running the Example -export LLM_API_KEY="your-llm-api-key" -export OPENHANDS_CLOUD_API_KEY="your-cloud-api-key" -# Optional: specify a custom sandbox spec -# export OPENHANDS_SANDBOX_SPEC_ID="your-sandbox-spec-id" -cd agent-sdk -uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +Start the Agent Client Protocol server for IDE integrations. 
+ +```bash +openhands acp [OPTIONS] ``` -## Next Steps +| Option | Description | +|--------|-------------| +| `--resume [ID]` | Resume a conversation by ID | +| `--last` | Resume the most recent conversation | +| `--always-approve` | Auto-approve all actions | +| `--llm-approve` | Use LLM-based security analyzer | +| `--streaming` | Enable token-by-token streaming | -- **[API-based Sandbox](/sdk/guides/agent-server/api-sandbox)** - Connect to Runtime API service -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run locally with Docker -- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers -- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +**Examples:** +```bash +openhands acp +openhands acp --llm-approve +openhands acp --resume abc123def456 +openhands acp --resume --last +``` -### Custom Tools with Remote Agent Server -Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md +### mcp -> A ready-to-run example is available [here](#ready-to-run-example)! +Manage Model Context Protocol server configurations. +```bash +openhands mcp [OPTIONS] +``` -When using a [remote agent server](/sdk/guides/agent-server/overview), custom tools must be available in the server's Python environment. This guide shows how to build a custom base image with your tools and use `DockerDevWorkspace` to automatically build the agent server on top of it. +#### mcp add - -For standalone custom tools (without remote agent server), see the [Custom Tools guide](/sdk/guides/custom-tools). - +Add a new MCP server. -## How It Works +```bash +openhands mcp add --transport [OPTIONS] [-- args...] +``` -1. **Define custom tool** with `register_tool()` at module level -2. **Create Dockerfile** that copies tools and sets `PYTHONPATH` -3. **Build custom base image** with your tools -4. 
**Use `DockerDevWorkspace`** with `base_image` parameter - it builds the agent server on top -5. **Import tool module** in client before creating conversation -6. **Server imports modules** dynamically, triggering registration +| Option | Description | +|--------|-------------| +| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | +| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | +| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | +| `--auth` | Authentication method (e.g., `oauth`) | +| `--enabled` | Enable immediately (default) | +| `--disabled` | Add in disabled state | -## Key Files +**Examples:** +```bash +openhands mcp add my-api --transport http https://api.example.com/mcp +openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com +openhands mcp add local --transport stdio python -- -m my_server +openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server +``` -### Custom Tool (`custom_tools/log_data.py`) +#### mcp list -```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py -"""Log Data Tool - Example custom tool for logging structured data to JSON. +List all configured MCP servers. -This tool demonstrates how to create a custom tool that logs structured data -to a local JSON file during agent execution. The data can be retrieved and -verified after the agent completes. -""" +```bash +openhands mcp list +``` -import json -from collections.abc import Sequence -from datetime import UTC, datetime -from enum import Enum -from pathlib import Path -from typing import Any +#### mcp get -from pydantic import Field +Get details for a specific MCP server. 
-from openhands.sdk import ( - Action, - ImageContent, - Observation, - TextContent, - ToolDefinition, -) -from openhands.sdk.tool import ToolExecutor, register_tool +```bash +openhands mcp get +``` +#### mcp remove -# --- Enums and Models --- +Remove an MCP server configuration. +```bash +openhands mcp remove +``` -class LogLevel(str, Enum): - """Log level for entries.""" +#### mcp enable - DEBUG = "debug" - INFO = "info" - WARNING = "warning" - ERROR = "error" +Enable an MCP server. +```bash +openhands mcp enable +``` -class LogDataAction(Action): - """Action to log structured data to a JSON file.""" +#### mcp disable - message: str = Field(description="The log message") - level: LogLevel = Field( - default=LogLevel.INFO, - description="Log level (debug, info, warning, error)", - ) - data: dict[str, Any] = Field( - default_factory=dict, - description="Additional structured data to include in the log entry", - ) +Disable an MCP server. +```bash +openhands mcp disable +``` -class LogDataObservation(Observation): - """Observation returned after logging data.""" +### login - success: bool = Field(description="Whether the data was successfully logged") - log_file: str = Field(description="Path to the log file") - entry_count: int = Field(description="Total number of entries in the log file") +Authenticate with OpenHands Cloud. 
- @property - def to_llm_content(self) -> Sequence[TextContent | ImageContent]: - """Convert observation to LLM content.""" - if self.success: - return [ - TextContent( - text=( - f"✅ Data logged successfully to {self.log_file}\n" - f"Total entries: {self.entry_count}" - ) - ) - ] - return [TextContent(text="❌ Failed to log data")] +```bash +openhands login [OPTIONS] +``` +| Option | Description | +|--------|-------------| +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | -# --- Executor --- +**Examples:** +```bash +openhands login +openhands login --server-url https://enterprise.openhands.dev +``` -# Default log file path -DEFAULT_LOG_FILE = "/tmp/agent_data.json" +### logout +Log out from OpenHands Cloud. -class LogDataExecutor(ToolExecutor[LogDataAction, LogDataObservation]): - """Executor that logs structured data to a JSON file.""" +```bash +openhands logout [OPTIONS] +``` - def __init__(self, log_file: str = DEFAULT_LOG_FILE): - """Initialize the log data executor. +| Option | Description | +|--------|-------------| +| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | - Args: - log_file: Path to the JSON log file - """ - self.log_file = Path(log_file) +**Examples:** +```bash +openhands logout +openhands logout --server-url https://app.all-hands.dev +``` - def __call__( - self, - action: LogDataAction, - conversation=None, # noqa: ARG002 - ) -> LogDataObservation: - """Execute the log data action. 
+## Interactive Commands - Args: - action: The log data action - conversation: Optional conversation context (not used) +Commands available inside the CLI (prefix with `/`): - Returns: - LogDataObservation with the result - """ - # Load existing entries or start fresh - entries: list[dict[str, Any]] = [] - if self.log_file.exists(): - try: - with open(self.log_file) as f: - entries = json.load(f) - except (json.JSONDecodeError, OSError): - entries = [] +| Command | Description | +|---------|-------------| +| `/help` | Display available commands | +| `/new` | Start a new conversation | +| `/history` | Toggle conversation history | +| `/confirm` | Configure confirmation settings | +| `/condense` | Condense conversation history | +| `/skills` | View loaded skills, hooks, and MCPs | +| `/feedback` | Send anonymous feedback about CLI | +| `/exit` | Exit the application | - # Create new entry with timestamp - entry = { - "timestamp": datetime.now(UTC).isoformat(), - "level": action.level.value, - "message": action.message, - "data": action.data, - } - entries.append(entry) +## Command Palette - # Write back to file - self.log_file.parent.mkdir(parents=True, exist_ok=True) - with open(self.log_file, "w") as f: - json.dump(entries, f, indent=2) +Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: - return LogDataObservation( - success=True, - log_file=str(self.log_file), - entry_count=len(entries), - ) +| Option | Description | +|--------|-------------| +| **History** | Toggle conversation history panel | +| **Keys** | Show keyboard shortcuts | +| **MCP** | View MCP server configurations | +| **Maximize** | Maximize/restore window | +| **Plan** | View agent plan | +| **Quit** | Quit the application | +| **Screenshot** | Take a screenshot | +| **Settings** | Configure LLM model, API keys, and other settings | +| **Theme** | Toggle color theme | +## Changing Your Model -# --- Tool Definition --- +### Via Settings UI -_LOG_DATA_DESCRIPTION = """Log 
structured data to a JSON file. +1. Press `Ctrl+P` to open the command palette +2. Select **Settings** +3. Choose your LLM provider and model +4. Save changes (no restart required) -Use this tool to record information, findings, or events during your work. -Each log entry includes a timestamp and can contain arbitrary structured data. +### Via Configuration File -Parameters: -* message: A descriptive message for the log entry -* level: Log level - one of 'debug', 'info', 'warning', 'error' (default: info) -* data: Optional dictionary of additional structured data to include +Edit `~/.openhands/agent_settings.json` and change the `model` field: -Example usage: -- Log a finding: message="Found potential issue", level="warning", data={"file": "app.py", "line": 42} -- Log progress: message="Completed analysis", level="info", data={"files_checked": 10} -""" # noqa: E501 +```json +{ + "llm": { + "model": "claude-sonnet-4-5-20250929", + "api_key": "...", + "base_url": "..." + } +} +``` +### Via Environment Variables -class LogDataTool(ToolDefinition[LogDataAction, LogDataObservation]): - """Tool for logging structured data to a JSON file.""" +Temporarily override your model without changing saved configuration: - @classmethod - def create(cls, conv_state, **params) -> Sequence[ToolDefinition]: # noqa: ARG003 - """Create LogDataTool instance. +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-api-key" +openhands --override-with-envs +``` - Args: - conv_state: Conversation state (not used in this example) - **params: Additional parameters: - - log_file: Path to the JSON log file (default: /tmp/agent_data.json) +Changes made with `--override-with-envs` are not persisted. 
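For scripted setups, the persistent alternative to `--override-with-envs` is editing the settings file itself. Below is a minimal read-modify-write sketch; it assumes the `agent_settings.json` layout shown above, and `set_model` is an illustrative helper name, not part of the CLI:

```python
import json
import tempfile
from pathlib import Path

def set_model(settings_path: Path, model: str) -> dict:
    """Read-modify-write the settings JSON, changing only llm.model."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    settings.setdefault("llm", {})["model"] = model
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings

# Demo against a scratch copy rather than the real ~/.openhands/agent_settings.json
path = Path(tempfile.mkdtemp()) / "agent_settings.json"
path.write_text(json.dumps({"llm": {"model": "old-model", "api_key": "sk-..."}}))
updated = set_model(path, "claude-sonnet-4-5-20250929")
print(updated["llm"]["model"])  # claude-sonnet-4-5-20250929
```

Because the edit goes to the same file the CLI reads, a change written this way persists across sessions, unlike the environment-variable route.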
- Returns: - A sequence containing a single LogDataTool instance - """ - log_file = params.get("log_file", DEFAULT_LOG_FILE) - executor = LogDataExecutor(log_file=log_file) +## Environment Variables - return [ - cls( - description=_LOG_DATA_DESCRIPTION, - action_type=LogDataAction, - observation_type=LogDataObservation, - executor=executor, - ) - ] +| Variable | Description | +|----------|-------------| +| `LLM_API_KEY` | API key for your LLM provider | +| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | +| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | +| `OPENHANDS_CLOUD_URL` | Default cloud server URL | +| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | +## Exit Codes -# Auto-register the tool when this module is imported -# This is what enables dynamic tool registration in the remote agent server -register_tool("LogDataTool", LogDataTool) -``` +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | Error or task failed | +| `2` | Invalid arguments | -### Dockerfile +## Configuration Files -```dockerfile icon="docker" -FROM nikolaik/python-nodejs:python3.12-nodejs22 +| File | Purpose | +|------|---------| +| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | +| `~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | +| `~/.openhands/mcp.json` | MCP server configurations | +| `~/.openhands/conversations/` | Conversation history | -COPY custom_tools /app/custom_tools -ENV PYTHONPATH="/app:${PYTHONPATH}" -``` +## See Also -## Troubleshooting +- [Installation](/openhands/usage/cli/installation) - Install the CLI +- [Quick Start](/openhands/usage/cli/quick-start) - Get started +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers -| Issue | Solution | -|-------|----------| -| Tool not found | Ensure `register_tool()` is called at module level, import tool before creating conversation | -| Import errors on server | Check `PYTHONPATH` 
in Dockerfile, verify all dependencies installed | -| Build failures | Verify file paths in `COPY` commands, ensure Python 3.12+ | +### Critic (Experimental) +Source: https://docs.openhands.dev/openhands/usage/cli/critic.md -**Binary Mode Limitation**: Custom tools only work with **source mode** deployments. When using `DockerDevWorkspace`, set `target="source"` (the default). See [GitHub issue #1531](https://github.com/OpenHands/software-agent-sdk/issues/1531) for details. +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. -## Ready-to-run Example +## Overview - -This example is available on GitHub: [examples/02_remote_agent_server/06_custom_tool/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/02_remote_agent_server/06_custom_tool) - +If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. -```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tool_example.py -"""Example: Using custom tools with remote agent server. +For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). -This example demonstrates how to use custom tools with a remote agent server -by building a custom base image that includes the tool implementation. -Prerequisites: - 1. Build the custom base image first: - cd examples/02_remote_agent_server/05_custom_tool - ./build_custom_image.sh +## What is the Critic? - 2. Set LLM_API_KEY environment variable +The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides: -The workflow is: -1. Define a custom tool (LogDataTool for logging structured data to JSON) -2. 
Create a simple Dockerfile that copies the tool into the base image -3. Build the custom base image -4. Use DockerDevWorkspace with base_image pointing to the custom image -5. DockerDevWorkspace builds the agent server on top of the custom base image -6. The server dynamically registers tools when the client creates a conversation -7. The agent can use the custom tool during execution -8. Verify the logged data by reading the JSON file from the workspace +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion -This pattern is useful for: -- Collecting structured data during agent runs (logs, metrics, events) -- Implementing custom integrations with external systems -- Adding domain-specific operations to the agent -""" + -import os -import platform -import subprocess -import sys -import time -from pathlib import Path +![Critic output in CLI](./screenshots/critic-cli-output.png) -from pydantic import SecretStr +## Pricing -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - Tool, - get_logger, -) -from openhands.workspace import DockerDevWorkspace +The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. + +## Disabling the Critic +If you prefer not to use the critic feature, you can disable it in your settings: -logger = get_logger(__name__) +1. Open the command palette with `Ctrl+P` +2. Select **Settings** +3. Navigate to the **CLI Settings** tab +4. Toggle off **Enable Critic (Experimental)** -# 1) Ensure we have LLM API key -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+![Critic settings in CLI](./screenshots/critic-cli-settings.png) -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) +### GUI Server +Source: https://docs.openhands.dev/openhands/usage/cli/gui-server.md +## Overview -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" +The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. +```bash +openhands serve +``` -# Get the directory containing this script -example_dir = Path(__file__).parent.absolute() + +This requires Docker to be installed and running on your system. + -# Custom base image tag (contains custom tools, agent server built on top) -CUSTOM_BASE_IMAGE_TAG = "custom-base-image:latest" +## Prerequisites -# 2) Check if custom base image exists, build if not -logger.info(f"🔍 Checking for custom base image: {CUSTOM_BASE_IMAGE_TAG}") -result = subprocess.run( - ["docker", "images", "-q", CUSTOM_BASE_IMAGE_TAG], - capture_output=True, - text=True, - check=False, -) +- [Docker](https://docs.docker.com/get-docker/) installed and running +- Sufficient disk space for Docker images (~2GB) -if not result.stdout.strip(): - logger.info("⚠️ Custom base image not found. 
Building...") - logger.info("📦 Building custom base image with custom tools...") - build_script = example_dir / "build_custom_image.sh" - try: - subprocess.run( - [str(build_script), CUSTOM_BASE_IMAGE_TAG], - cwd=str(example_dir), - check=True, - ) - logger.info("✅ Custom base image built successfully!") - except subprocess.CalledProcessError as e: - logger.error(f"❌ Failed to build custom base image: {e}") - logger.error("Please run ./build_custom_image.sh manually and fix any errors.") - sys.exit(1) -else: - logger.info(f"✅ Custom base image found: {CUSTOM_BASE_IMAGE_TAG}") +## Basic Usage -# 3) Create a DockerDevWorkspace with the custom base image -# DockerDevWorkspace will build the agent server on top of this base image -logger.info("🚀 Building and starting agent server with custom tools...") -logger.info("📦 This may take a few minutes on first run...") +```bash +# Launch the GUI server +openhands serve -with DockerDevWorkspace( - base_image=CUSTOM_BASE_IMAGE_TAG, - host_port=8011, - platform=detect_platform(), - target="source", # NOTE: "binary" target does not work with custom tools -) as workspace: - logger.info("✅ Custom agent server started!") +# The server will be available at http://localhost:3000 +``` - # 4) Import custom tools to register them in the client's registry - # This allows the client to send the module qualname to the server - # The server will then import the same module and execute the tool - import custom_tools.log_data # noqa: F401 +The command will: +1. Check Docker requirements +2. Pull the required Docker images +3. Start the OpenHands GUI server +4. 
Display the URL to access the interface - # 5) Create agent with custom tools - # Note: We specify the tool here, but it's actually executed on the server - # Get default tools and add our custom tool - from openhands.sdk import Agent - from openhands.tools.preset.default import get_default_condenser, get_default_tools +## Options - tools = get_default_tools(enable_browser=False) - # Add our custom tool! - tools.append(Tool(name="LogDataTool")) +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | - agent = Agent( - llm=llm, - tools=tools, - system_prompt_kwargs={"cli_mode": True}, - condenser=get_default_condenser( - llm=llm.model_copy(update={"usage_id": "condenser"}) - ), - ) +## Mounting Your Workspace - # 6) Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} +To give OpenHands access to your local files: - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() +```bash +# Mount current directory +openhands serve --mount-cwd +``` - # 7) Test the workspace with a simple command - result = workspace.execute_command( - "echo 'Custom agent server ready!' && python --version" - ) - logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" - ) - logger.info(f"Output: {result.stdout}") +This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. 
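The mount described above means host paths and container paths differ only by prefix. A small sketch of that translation, using the `/workspace` mount point from this page (`container_path` is an illustrative helper, not part of the CLI):

```python
from pathlib import Path

def container_path(
    host_file: Path, mount_root: Path, container_root: str = "/workspace"
) -> str:
    """Map a host file under the mounted directory to its in-container path."""
    rel = host_file.resolve().relative_to(mount_root.resolve())
    return f"{container_root}/{rel}"

# With --mount-cwd, the current working directory becomes /workspace
print(container_path(Path("src/app.py"), Path(".")))  # /workspace/src/app.py
```

Anything outside the mounted directory is invisible to the agent, which is why the docs recommend launching from your project root.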
- # 8) Create conversation with the custom agent - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) + +Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. + - try: - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") +## GPU Support - logger.info("📝 Sending task to analyze files and log findings...") - conversation.send_message( - "Please analyze the Python files in the current directory. " - "Use the LogDataTool to log your findings as you work. " - "For example:\n" - "- Log when you start analyzing a file (level: info)\n" - "- Log any interesting patterns you find (level: info)\n" - "- Log any potential issues (level: warning)\n" - "- Include relevant data like file names, line numbers, etc.\n\n" - "Make at least 3 log entries using the LogDataTool." - ) - logger.info("🚀 Running conversation...") - conversation.run() - logger.info("✅ Task completed!") - logger.info(f"Agent status: {conversation.state.execution_status}") +For tasks that benefit from GPU acceleration: - # Wait for events to settle (no events for 2 seconds) - logger.info("⏳ Waiting for events to stop...") - while time.time() - last_event_time["ts"] < 2.0: - time.sleep(0.1) - logger.info("✅ Events have stopped") +```bash +openhands serve --gpu +``` - # 9) Read the logged data from the JSON file using file_download API - logger.info("\n📊 Logged Data Summary:") - logger.info("=" * 80) +This requires: +- NVIDIA GPU +- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed +- Docker configured for GPU support - # Download the log file from the workspace using the file download API - import json - import tempfile +## Examples - with tempfile.NamedTemporaryFile( - mode="w", suffix=".json", delete=False - ) as tmp_file: - local_path = tmp_file.name 
+```bash +# Basic GUI server +openhands serve - download_result = workspace.file_download( - source_path="/tmp/agent_data.json", - destination_path=local_path, - ) +# Mount current project and enable GPU +cd /path/to/your/project +openhands serve --mount-cwd --gpu +``` - if download_result.success: - try: - with open(local_path) as f: - log_entries = json.load(f) - logger.info(f"Found {len(log_entries)} log entries:\n") - for i, entry in enumerate(log_entries, 1): - logger.info(f"Entry {i}:") - logger.info(f" Timestamp: {entry.get('timestamp', 'N/A')}") - logger.info(f" Level: {entry.get('level', 'N/A')}") - logger.info(f" Message: {entry.get('message', 'N/A')}") - if entry.get("data"): - logger.info(f" Data: {json.dumps(entry['data'], indent=4)}") - logger.info("") - except json.JSONDecodeError: - logger.info("Log file exists but couldn't parse JSON") - with open(local_path) as f: - logger.info(f"Raw content: {f.read()}") - finally: - # Clean up the temporary file - Path(local_path).unlink(missing_ok=True) - else: - logger.info("No log file found (agent may not have used the tool)") - if download_result.error: - logger.debug(f"Download error: {download_result.error}") +## How It Works - logger.info("=" * 80) +The `openhands serve` command: - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"\nEXAMPLE_COST: {cost}") +1. **Pulls Docker images**: Downloads the OpenHands runtime and application images +2. **Starts containers**: Runs the OpenHands server in a Docker container +3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` +4. **Shares settings**: Uses your `~/.openhands` directory for configuration - finally: - logger.info("\n🧹 Cleaning up conversation...") - conversation.close() +## Stopping the Server -logger.info("\n✅ Example completed successfully!") -logger.info("\nThis example demonstrated how to:") -logger.info("1. 
Create a custom tool that logs structured data to JSON") -logger.info("2. Build a simple base image with the custom tool") -logger.info("3. Use DockerDevWorkspace with base_image to build agent server on top") -logger.info("4. Enable dynamic tool registration on the server") -logger.info("5. Use the custom tool during agent execution") -logger.info("6. Read the logged data back from the workspace") -``` +Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. -```bash Running the Example -# Build the custom base image first -cd examples/02_remote_agent_server/06_custom_tool -./build_custom_image.sh +## Comparison: GUI Server vs Web Interface -# Run the example -export LLM_API_KEY="your-api-key" -uv run python custom_tool_example.py -``` +| Feature | `openhands serve` | `openhands web` | +|---------|-------------------|-----------------| +| Interface | Full web GUI | Terminal UI in browser | +| Dependencies | Docker required | None | +| Resources | Full container (~2GB) | Lightweight | +| Features | All GUI features | CLI features only | +| Best for | Rich GUI experience | Quick terminal access | +## Troubleshooting -## Next Steps +### Docker Not Running -- **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server -- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers +``` +❌ Docker daemon is not running. +Please start Docker and try again. +``` -### Docker Sandbox -Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md +**Solution**: Start Docker Desktop or the Docker daemon. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Permission Denied -The docker sandboxed agent server demonstrates how to run agents in isolated Docker containers using `DockerWorkspace`. 
+``` +Got permission denied while trying to connect to the Docker daemon socket +``` -This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. +**Solution**: Add your user to the docker group: +```bash +sudo usermod -aG docker $USER +# Then log out and back in +``` -Use `DockerWorkspace` with a pre-built agent server image for the fastest startup. When you need to build your own image from a base image, switch to `DockerDevWorkspace`. +### Port Already in Use -the Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server. +If port 3000 is already in use, stop the conflicting service or use a different setup. Currently, the port is not configurable via CLI. -## 1) Basic Docker Sandbox +## See Also -> A ready-to-run example is available [here](#ready-to-run-example-docker-sandbox)! 
+- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide +- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access +- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details -### Key Concepts +### Headless Mode +Source: https://docs.openhands.dev/openhands/usage/cli/headless.md -#### DockerWorkspace Context Manager +## Overview -The `DockerWorkspace` uses a context manager to automatically handle container lifecycle: +Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: +- CI/CD pipelines +- Automated scripting +- Integration with other tools +- Batch processing -```python icon="python" -with DockerWorkspace( - # use pre-built image for faster startup (recommended) - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, - platform=detect_platform(), -) as workspace: - # Container is running here - # Work with the workspace - pass -# Container is automatically stopped and cleaned up here +```bash +openhands --headless -t "Your task here" ``` -The workspace automatically: -- Pulls or builds the Docker image -- Starts the container with an agent server -- Waits for the server to be ready -- Cleans up the container when done +## Requirements -#### Platform Detection +- Must specify a task with `--task` or `--file` -The example includes platform detection to ensure the correct Docker image is built and used: + +**Headless mode always runs in `always-approve` mode.** The agent will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. 
+ -```python icon="python" -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" -``` +## Basic Usage -This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon). +```bash +# Run a task in headless mode +openhands --headless -t "Write a Python script that prints hello world" +# Load task from a file +openhands --headless -f task.txt +``` -#### Testing the Workspace +## JSON Output Mode -Before creating a conversation, the example tests the workspace connection: +The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur: -```python icon="python" -result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" -) -logger.info( - f"Command '{result.command}' completed" - f"with exit code {result.exit_code}" -) -logger.info(f"Output: {result.stdout}") +```bash +openhands --headless --json -t "Create a simple Flask app" ``` -This verifies the workspace is properly initialized and can execute commands. - -#### Automatic RemoteConversation - -When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation: +Each line is a JSON object representing an agent event: -```python icon="python" focus={1, 3, 7} -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, -) -assert isinstance(conversation, RemoteConversation) +```json +{"type": "action", "action": "write", "path": "app.py", ...} +{"type": "observation", "content": "File created successfully", ...} +{"type": "action", "action": "run", "command": "python app.py", ...} ``` -The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming. 
+### Use Cases for JSON Output +- **CI/CD pipelines**: Parse events to determine success/failure +- **Automated processing**: Feed output to other tools +- **Logging**: Capture structured logs for analysis +- **Integration**: Connect OpenHands with other systems -#### DockerWorkspace vs DockerDevWorkspace +### Example: Capture Output to File -Use `DockerWorkspace` when you can rely on the official pre-built images for the agent server. Switch to `DockerDevWorkspace` when you need to build or customize the image on-demand (slower startup, requires the SDK source tree and Docker build support). +```bash +openhands --headless --json -t "Add unit tests" > output.jsonl +``` -```python icon="python" -# ✅ Fast: Use pre-built image (recommended) -DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, -) +## See Also -# 🛠️ Custom: Build on the fly (requires SDK tooling) -DockerDevWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=8010, - target="source", -) -``` +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options -### Ready-tu-run Example Docker Sandbox - -This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) - +### JetBrains IDEs +Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md -This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution: +[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. 
-```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py -import os -import platform -import time +## Supported IDEs -from pydantic import SecretStr +This guide applies to all JetBrains IDEs: -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - get_logger, -) -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace +- IntelliJ IDEA +- PyCharm +- WebStorm +- GoLand +- Rider +- CLion +- PhpStorm +- RubyMine +- DataGrip +- And other JetBrains IDEs +## Prerequisites -logger = get_logger(__name__) +Before configuring JetBrains IDEs: -# 1) Ensure we have LLM API key -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **JetBrains IDE version 25.3 or later** +4. **JetBrains AI Assistant enabled** in your IDE -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) + +JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE. + +## Configuration -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" +### Step 1: Create the ACP Configuration File +Create or edit the file `$HOME/.jetbrains/acp.json`: -def get_server_image(): - """Get the server image tag, using PR-specific image in CI.""" - platform_str = detect_platform() - arch = "arm64" if "arm64" in platform_str else "amd64" - # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency - # Otherwise, use the latest image from main - github_sha = os.getenv("GITHUB_SHA") - if github_sha: - return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" - return "ghcr.io/openhands/agent-server:latest-python" + + + ```bash + mkdir -p ~/.jetbrains + nano ~/.jetbrains/acp.json + ``` + + + Create the file at `C:\Users\\.jetbrains\acp.json` + + +### Step 2: Add the Configuration -# 2) Create a Docker-based remote workspace that will set up and manage -# the Docker container automatically. Use `DockerWorkspace` with a pre-built -# image or `DockerDevWorkspace` to automatically build the image on-demand. -# with DockerDevWorkspace( -# # dynamically build agent-server image -# base_image="nikolaik/python-nodejs:python3.13-nodejs22", -# host_port=8010, -# platform=detect_platform(), -# ) as workspace: -server_image = get_server_image() -logger.info(f"Using server image: {server_image}") -with DockerWorkspace( - # use pre-built image for faster startup - server_image=server_image, - host_port=8010, - platform=detect_platform(), -) as workspace: - # 3) Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) +Add the following JSON: - # 4) Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": {} + } + } +} +``` - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() +### Step 3: Use OpenHands in Your IDE - # 5) Test the workspace with a simple command - result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' 
&& pwd" - ) - logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" - ) - logger.info(f"Output: {result.stdout}") - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) +Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE. - try: - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") +## Advanced Configuration - logger.info("📝 Sending first message...") - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." - ) - logger.info("🚀 Running conversation...") - conversation.run() - logger.info("✅ First task completed!") - logger.info(f"Agent status: {conversation.state.execution_status}") +### LLM-Approve Mode - # Wait for events to settle (no events for 2 seconds) - logger.info("⏳ Waiting for events to stop...") - while time.time() - last_event_time["ts"] < 2.0: - time.sleep(0.1) - logger.info("✅ Events have stopped") +For automatic LLM-based approval: - logger.info("🚀 Running conversation again...") - conversation.send_message("Great! 
Now delete that file.") - conversation.run() - logger.info("✅ Second task completed!") +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp", "--llm-approve"], + "env": {} + } + } +} +``` - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") - finally: - print("\n🧹 Cleaning up conversation...") - conversation.close() +### Auto-Approve Mode + +For automatic approval of all actions (use with caution): + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp", "--always-approve"], + "env": {} + } + } +} ``` - +### Resume a Conversation +Resume a specific conversation: ---- +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "openhands", + "args": ["acp", "--resume", "abc123def456"], + "env": {} + } + } +} +``` -## 2) VS Code in Docker Sandbox +Resume the latest conversation: -> A ready-to-run example is available [here](#ready-to-run-example-vs-code)! +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "openhands", + "args": ["acp", "--resume", "--last"], + "env": {} + } + } +} +``` -VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. 
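All of the configuration variants above live in the same `acp.json`, so a merge helper avoids clobbering entries you already have. A sketch, with the file location and entry shape as documented above (`add_agent_server` is an illustrative name):

```python
import json
import tempfile
from pathlib import Path

def add_agent_server(config_path: Path, name: str, args: list[str]) -> dict:
    """Merge one agent-server entry into acp.json, creating the file if absent."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("agent_servers", {})[name] = {
        "command": "openhands",
        "args": args,
        "env": {},
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config

# Demo against a scratch file rather than the real ~/.jetbrains/acp.json
scratch = Path(tempfile.mkdtemp()) / "acp.json"
add_agent_server(scratch, "OpenHands", ["acp"])
cfg = add_agent_server(scratch, "OpenHands (Auto-Approve)", ["acp", "--always-approve"])
print(sorted(cfg["agent_servers"]))  # ['OpenHands', 'OpenHands (Auto-Approve)']
```

Re-running the helper with the same name simply overwrites that one entry, so it is safe to use idempotently from setup scripts.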
+### Multiple Configurations -### Key Concepts +Add multiple configurations for different use cases: -#### VS Code-Enabled DockerWorkspace +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "openhands", + "args": ["acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume Latest)": { + "command": "openhands", + "args": ["acp", "--resume", "--last"], + "env": {} + } + } +} +``` -The workspace is configured with extra ports for VS Code access: +### Environment Variables -```python icon="python" focus={1, 5} -with DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=18010, - platform="linux/arm64", # or "linux/amd64" depending on your architecture - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to access VSCode at localhost:18011""" +Pass environment variables to the agent: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": { + "LLM_API_KEY": "your-api-key" + } + } + } +} ``` -The `extra_ports=True` setting exposes: -- Port `host_port+1`: VS Code Web interface (host_port + 1) -- Port `host_port+2`: VNC viewer for visual access +## Troubleshooting -If you need to customize the agent-server image, swap in `DockerDevWorkspace` with the same parameters and provide `base_image`/`target` to build on demand. +### "Agent not found" or "Command failed" -#### VS Code URL Generation +1. Verify OpenHands CLI is installed: + ```bash + openhands --version + ``` -The example retrieves the VS Code URL with authentication token: +2. 
If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) -```python icon="python" -# Get VSCode URL with token -vscode_port = (workspace.host_port or 8010) + 1 -try: - response = httpx.get( - f"{workspace.host}/api/vscode/url", - params={"workspace_dir": workspace.working_dir}, - ) - vscode_data = response.json() - vscode_url = vscode_data.get("url", "").replace( - "localhost:8001", f"localhost:{vscode_port}" - ) -except Exception: - # Fallback if server route not available - folder = ( - f"/{workspace.working_dir}" - if not str(workspace.working_dir).startswith("/") - else str(workspace.working_dir) - ) - vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" -``` +### "AI Assistant not available" -This generates a properly authenticated URL with the workspace directory pre-opened. +1. Ensure you have JetBrains IDE version 25.3 or later +2. Enable AI Assistant: `Settings > Plugins > AI Assistant` +3. Restart the IDE after enabling -#### VS Code URL Format +### Agent doesn't respond -```text -http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} -``` -where: -- `vscode_port`: Usually host_port + 1 (e.g., 8011) -- `token`: Authentication token for security -- `workspace_dir`: Workspace directory to open +1. Check your LLM settings: + ```bash + openhands + # Use /settings to configure + ``` -### Ready-to-run Example VS Code +2. Test ACP mode in terminal: + ```bash + openhands acp + # Should start without errors + ``` - -This example is available on GitHub: [examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py) - +### Configuration not applied +1. Verify the config file location: `~/.jetbrains/acp.json` +2. Validate JSON syntax (no trailing commas, proper quotes) +3. 
Restart your JetBrains IDE -```python icon="python" expandable examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py -import os -import platform -import time +### Finding Your Conversation ID -import httpx -from pydantic import SecretStr +To resume conversations, first find the ID: -from openhands.sdk import LLM, Conversation, get_logger -from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace +```bash +openhands --resume +``` +This displays recent conversations with their IDs: + +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py +-------------------------------------------------------------------------------- +``` -logger = get_logger(__name__) +## See Also -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) +### IDE Integration Overview +Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview.md + +IDE integration via ACP is experimental and may have limitations. Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). 
+ -# Create a Docker-based remote workspace with extra ports for VSCode access -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" + +**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. + +## What is the Agent Client Protocol (ACP)? -def get_server_image(): - """Get the server image tag, using PR-specific image in CI.""" - platform_str = detect_platform() - arch = "arm64" if "arm64" in platform_str else "amd64" - # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency - # Otherwise, use the latest image from main - github_sha = os.getenv("GITHUB_SHA") - if github_sha: - return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" - return "ghcr.io/openhands/agent-server:latest-python" +The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. 
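On the wire, each side of this exchange is a JSON-RPC 2.0 message. The sketch below illustrates the shape of one request/response pair; the method name and fields here are simplified stand-ins, so consult the ACP specification for the actual schemas:

```python
import json

# A simplified ACP-style exchange at the JSON-RPC 2.0 layer.
# The method and params are illustrative, not the full ACP schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"protocolVersion": 1},
}

# The client writes the request to the agent's stdin as one JSON message...
outgoing = json.dumps(request)

# ...and the agent answers on stdout with a result carrying the same id.
incoming = '{"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": 1}}'
response = json.loads(incoming)

assert response["id"] == request["id"]  # responses are matched to requests by id
print(response["result"])  # {'protocolVersion': 1}
```

Responses are correlated to requests by `id`, which is what lets a client keep multiple calls in flight over a single stdio pipe.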
+## Supported IDEs -server_image = get_server_image() -logger.info(f"Using server image: {server_image}") -with DockerWorkspace( - server_image=server_image, - host_port=18010, - platform=detect_platform(), - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to access VSCode at localhost:18011""" +| IDE | Support Level | Setup Guide | +|-----|---------------|-------------| +| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | +| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | +| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | +| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. | - # Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) +## Prerequisites - # Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} +Before using OpenHands with any IDE, you must: - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() +1. **Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) - # Create RemoteConversation using the workspace - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) +2. 
**Configure your LLM settings** using the `/settings` command: + ```bash + openhands + # Then use /settings to configure + ``` - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") - logger.info("📝 Sending first message...") - conversation.send_message("Create a simple Python script that prints Hello World") - conversation.run() +The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. - # Get VSCode URL with token - vscode_port = (workspace.host_port or 8010) + 1 - try: - response = httpx.get( - f"{workspace.host}/api/vscode/url", - params={"workspace_dir": workspace.working_dir}, - ) - vscode_data = response.json() - vscode_url = vscode_data.get("url", "").replace( - "localhost:8001", f"localhost:{vscode_port}" - ) - except Exception: - # Fallback if server route not available - folder = ( - f"/{workspace.working_dir}" - if not str(workspace.working_dir).startswith("/") - else str(workspace.working_dir) - ) - vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +## How It Works - # Wait for user to explore VSCode - y = None - while y != "y": - y = input( - "\n" - "Because you've enabled extra_ports=True in DockerDevWorkspace, " - "you can open VSCode Web to see the workspace.\n\n" - f"VSCode URL: {vscode_url}\n\n" - "The VSCode should have the OpenHands settings extension installed:\n" - " - Dark theme enabled\n" - " - Auto-save enabled\n" - " - Telemetry disabled\n" - " - Auto-updates disabled\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) +```mermaid +graph LR + IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] + CLI -->|API Calls| LLM[LLM Provider] + CLI -->|Commands| Runtime[Sandbox Runtime] ``` - +1. Your IDE launches `openhands acp` as a subprocess +2. Communication happens via JSON-RPC 2.0 over stdio +3. OpenHands uses your configured LLM and runtime settings +4. 
Results are displayed in your IDE's interface ---- - -## 3) Browser in Docker Sandbox -> A ready-to-run example is available [here](#ready-to-run-example-browser)! +## The ACP Command -Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. +The `openhands acp` command starts OpenHands as an ACP server: -### Key Concepts +```bash +# Basic ACP server +openhands acp -#### Browser-Enabled DockerWorkspace +# With LLM-based approval +openhands acp --llm-approve -The workspace is configured with extra ports for browser access: +# Resume a conversation +openhands acp --resume -```python icon="python" focus={1-5} -with DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, - platform=detect_platform(), - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to check localhost:8012 for VNC""" +# Resume the latest conversation +openhands acp --resume --last ``` -The `extra_ports=True` setting exposes additional ports for: -- Port `host_port+1`: VS Code Web interface -- Port `host_port+2`: VNC viewer for browser visualization +### ACP Options -If you need to pre-build a custom browser image, replace `DockerWorkspace` with `DockerDevWorkspace` and provide `base_image`/`target` to build before launch. 
+| Option | Description | +|--------|-------------| +| `--resume [ID]` | Resume a conversation by ID | +| `--last` | Resume the most recent conversation | +| `--always-approve` | Auto-approve all actions | +| `--llm-approve` | Use LLM-based security analyzer | +| `--streaming` | Enable token-by-token streaming | + +## Confirmation Modes +OpenHands ACP supports three confirmation modes to control how agent actions are approved: -#### Enabling Browser Tools +### Always Ask (Default) -Browser tools are enabled by setting `cli_mode=False`: +The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. -```python icon="python" focus={2, 4} -# Create agent with browser tools enabled -agent = get_default_agent( - llm=llm, - cli_mode=False, # CLI mode = False will enable browser tools -) +```bash +openhands acp # defaults to always-ask mode ``` -When `cli_mode=False`, the agent gains access to browser automation tools for web interaction. +### Always Approve -When VNC is available and `extra_ports=True`, the browser will be opened in the VNC desktop to visualize agent's work. You can watch the browser in real-time via VNC. Demo video: - +The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. -#### VNC Access +```bash +openhands acp --always-approve +``` -The VNC interface provides real-time visual access to the browser: +### LLM-Based Approval -```text -http://localhost:8012/vnc.html?autoconnect=1&resize=remote +The agent uses an LLM-based security analyzer to evaluate each action. Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved. 
+ +```bash +openhands acp --llm-approve ``` -- `autoconnect=1`: Automatically connect to VNC server -- `resize=remote`: Automatically adjust resolution +### Changing Modes During a Session ---- +You can change the confirmation mode during an active session using slash commands: -### Ready-to-run Example Browser +| Command | Description | +|---------|-------------| +| `/confirm always-ask` | Switch to always-ask mode | +| `/confirm always-approve` | Switch to always-approve mode | +| `/confirm llm-approve` | Switch to LLM-based approval mode | +| `/help` | Show all available slash commands | -This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) +The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session. -This example shows how to configure `DockerWorkspace` with browser capabilities and VNC access: - -```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py -import os -import platform -import time +## Choosing an IDE -from pydantic import SecretStr + + + High-performance editor with native ACP support. Best for speed and simplicity. + + + Universal terminal interface. Works with any terminal, consistent experience. + + + Popular editor with community extension. Great for VS Code users. + + + IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users. + + -from openhands.sdk import LLM, Conversation, get_logger -from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace +## Resuming Conversations in IDEs +You can resume previous conversations in ACP mode. 
Since ACP mode doesn't display an interactive list, first find your conversation ID: -logger = get_logger(__name__) +```bash +openhands --resume +``` -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." +This shows your recent conversations: -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py + 2. xyz789ghi012 (yesterday) + Add unit tests for the user service +-------------------------------------------------------------------------------- +``` -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" +Then configure your IDE to use `--resume ` or `--resume --last`. See each IDE's documentation for specific configuration. +## See Also -def get_server_image(): - """Get the server image tag, using PR-specific image in CI.""" - platform_str = detect_platform() - arch = "arm64" if "arm64" in platform_str else "amd64" - # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency - # Otherwise, use the latest image from main - github_sha = os.getenv("GITHUB_SHA") - if github_sha: - return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" - return "ghcr.io/openhands/agent-server:latest-python" +- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide +### Toad Terminal +Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad.md -# Create a Docker-based remote workspace with extra ports for browser access. -# Use `DockerWorkspace` with a pre-built image or `DockerDevWorkspace` to -# automatically build the image on-demand. -# with DockerDevWorkspace( -# # dynamically build agent-server image -# base_image="nikolaik/python-nodejs:python3.13-nodejs22", -# host_port=8010, -# platform=detect_platform(), -# ) as workspace: -server_image = get_server_image() -logger.info(f"Using server image: {server_image}") -with DockerWorkspace( - server_image=server_image, - host_port=8011, - platform=detect_platform(), - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to check localhost:8012 for VNC""" +[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). - # Create agent with browser tools enabled - agent = get_default_agent( - llm=llm, - cli_mode=False, # CLI mode = False will enable browser tools - ) +The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. 
- # Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} +![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() +## Why Toad? - # Create RemoteConversation using the workspace - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) +Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") - logger.info("📝 Sending first message...") - conversation.send_message( - "Could you go to https://openhands.dev/ blog page and summarize main " - "points of the latest blog?" - ) - conversation.run() +- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything +- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs +- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") +OpenHands is included as a recommended agent in Toad's agent store. - if os.getenv("CI"): - logger.info( - "CI environment detected; skipping interactive prompt and closing workspace." 
# noqa: E501 - ) - else: - # Wait for user confirm to exit when running locally - y = None - while y != "y": - y = input( - "Because you've enabled extra_ports=True in DockerDevWorkspace, " - "you can open a browser tab to see the *actual* browser OpenHands " - "is interacting with via VNC.\n\n" - "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) -``` +## Prerequisites - +Before using Toad with OpenHands: -## Next Steps +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` -- **[Local Agent Server](/sdk/guides/agent-server/local-server)** -- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details -- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service -- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture +## Installation -### Local Agent Server -Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server.md +Install Toad using [uv](https://docs.astral.sh/uv/): -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +```bash +uvx batrachian-toad +``` -> A ready-to-run example is available [here](#ready-to-run-example)! +For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). -The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using `RemoteConversation`. This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. 
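The whole pattern condenses to three steps: launch `python -m openhands.agent_server` as a subprocess, poll its `/health` endpoint until it answers, then point a client at it. Below is a dependency-free sketch of the readiness poll; the `ManagedAPIServer` in the full example wraps the same idea with log streaming and cleanup:

```python
import subprocess
import time
import urllib.request


def wait_healthy(proc, base_url, retries=30):
    """Poll GET {base_url}/health about once per second until it returns 200."""
    for _ in range(retries):
        if proc.poll() is not None:
            raise RuntimeError("server process exited during startup")
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=1.0) as r:
                if r.status == 200:
                    return True
        except OSError:
            pass  # not listening yet; retry
        time.sleep(1.0)
    raise RuntimeError(f"server not healthy after {retries} attempts")


# Usage against a local agent server (requires the SDK installed):
# proc = subprocess.Popen(
#     ["python", "-m", "openhands.agent_server", "--port", "8001"]
# )
# wait_healthy(proc, "http://127.0.0.1:8001")
```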
+## Setup -## Key Concepts +### Using the Agent Store -### Managed API Server +The easiest way to set up OpenHands with Toad: -The ready-to-run example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: +1. Launch Toad: `uvx batrachian-toad` +2. Open Toad's agent store +3. Find **OpenHands** in the list of recommended agents +4. Click **Install** to set up OpenHands +5. Select OpenHands and start a conversation -```python icon="python" focus={1, 2, 4, 5} -class ManagedAPIServer: - """Context manager for subprocess-managed OpenHands API server.""" - - def __enter__(self): - """Start the API server subprocess.""" - self.process = subprocess.Popen( - [ - "python", - "-m", - "openhands.agent_server", - "--port", - str(self.port), - "--host", - self.host, - ], - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - text=True, - env={"LOG_JSON": "true", **os.environ}, - ) +The install process runs: +```bash +uv tool install openhands --python 3.12 && openhands login ``` -The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. - -### Remote Workspace +### Manual Configuration -When connecting to a remote server, you need to provide a `Workspace` that connects to that server: +You can also launch Toad directly with OpenHands: -```python icon="python" -workspace = Workspace(host=server.base_url) -result = workspace.execute_command("pwd") +```bash +toad acp "openhands acp" ``` -When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). -The `Workspace` object communicates with the remote server's API to execute commands and manage files. 
- -### RemoteConversation +## Usage -When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): +### Basic Usage -```python icon="python" focus={1, 3, 7} -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, -) -assert isinstance(conversation, RemoteConversation) +```bash +# Launch Toad with OpenHands +toad acp "openhands acp" ``` -`RemoteConversation` handles communication with the remote agent server over WebSocket for real-time event streaming. +### With Command Line Arguments -### Event Callbacks +Pass OpenHands CLI flags through Toad: -Callbacks receive events in real-time as they happen on the remote server: +```bash +# Use LLM-based approval mode +toad acp "openhands acp --llm-approve" -```python icon="python" -def event_callback(event): - """Callback to capture events for testing.""" - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - event_tracker["last_event_time"] = time.time() +# Auto-approve all actions +toad acp "openhands acp --always-approve" ``` -This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. 
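As a sketch of what such custom handling can look like, a callback can dispatch on the event's class name and keep running statistics. The event classes below are stand-ins rather than real SDK types:

```python
from collections import Counter


class MessageEvent:  # stand-in for an SDK event type
    pass


class ActionEvent:  # stand-in for an SDK event type
    pass


counts = Counter()


def event_callback(event) -> None:
    """Count every event, and react only to the types we care about."""
    event_type = type(event).__name__
    counts[event_type] += 1
    if event_type == "ActionEvent":
        print("agent acted")


for event in [MessageEvent(), ActionEvent(), MessageEvent()]:
    event_callback(event)

print(dict(counts))  # {'MessageEvent': 2, 'ActionEvent': 1}
```

Since a single callback receives every event, cheap dispatch on the type name keeps per-event overhead low.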
- -### Conversation State - -The conversation state provides access to all events and status: +### Resume a Conversation -```python icon="python" -# Count total events using state.events -total_events = len(conversation.state.events) -logger.info(f"📈 Total events in conversation: {total_events}") +Resume a specific conversation by ID: -# Get recent events (last 5) using state.events -all_events = conversation.state.events -recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +```bash +toad acp "openhands acp --resume abc123def456" ``` -This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. - -## Ready-to-run Example +Resume the most recent conversation: - -This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) - +```bash +toad acp "openhands acp --resume --last" +``` -This example shows how to programmatically start a local agent server and interact with it through a `RemoteConversation`: + +Find your conversation IDs by running `openhands --resume` in a regular terminal. 
+ -```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py -import os -import subprocess -import sys -import threading -import time +## Advanced Configuration -from pydantic import SecretStr +### Combined Options -from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger -from openhands.sdk.event import ConversationStateUpdateEvent -from openhands.tools.preset.default import get_default_agent +```bash +# Resume with LLM approval +toad acp "openhands acp --resume --last --llm-approve" +``` +### Environment Variables -logger = get_logger(__name__) +Pass environment variables to OpenHands: +```bash +LLM_API_KEY=your-key toad acp "openhands acp" +``` -def _stream_output(stream, prefix, target_stream): - """Stream output from subprocess to target stream with prefix.""" - try: - for line in iter(stream.readline, ""): - if line: - target_stream.write(f"[{prefix}] {line}") - target_stream.flush() - except Exception as e: - print(f"Error streaming {prefix}: {e}", file=sys.stderr) - finally: - stream.close() +## Troubleshooting +### "openhands" command not found -class ManagedAPIServer: - """Context manager for subprocess-managed OpenHands API server.""" +Ensure OpenHands is installed: +```bash +uv tool install openhands --python 3.12 +``` - def __init__(self, port: int = 8000, host: str = "127.0.0.1"): - self.port: int = port - self.host: str = host - self.process: subprocess.Popen[str] | None = None - self.base_url: str = f"http://{host}:{port}" - self.stdout_thread: threading.Thread | None = None - self.stderr_thread: threading.Thread | None = None +Verify it's in your PATH: +```bash +which openhands +``` - def __enter__(self): - """Start the API server subprocess.""" - print(f"Starting OpenHands API server on {self.base_url}...") +### Agent doesn't respond - # Start the server process - self.process = subprocess.Popen( - [ - "python", - "-m", - "openhands.agent_server", - "--port", - 
str(self.port), - "--host", - self.host, - ], - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - text=True, - env={"LOG_JSON": "true", **os.environ}, - ) +1. Check your LLM settings: `openhands` then `/settings` +2. Verify your API key is valid +3. Check network connectivity to your LLM provider - # Start threads to stream stdout and stderr - assert self.process is not None - assert self.process.stdout is not None - assert self.process.stderr is not None - self.stdout_thread = threading.Thread( - target=_stream_output, - args=(self.process.stdout, "SERVER", sys.stdout), - daemon=True, - ) - self.stderr_thread = threading.Thread( - target=_stream_output, - args=(self.process.stderr, "SERVER", sys.stderr), - daemon=True, - ) +### Conversation not persisting - self.stdout_thread.start() - self.stderr_thread.start() +Conversations are stored in `~/.openhands/conversations`. Ensure this directory exists and is writable. - # Wait for server to be ready - max_retries = 30 - for i in range(max_retries): - try: - import httpx +## See Also - response = httpx.get(f"{self.base_url}/health", timeout=1.0) - if response.status_code == 200: - print(f"API server is ready at {self.base_url}") - return self - except Exception: - pass +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs - assert self.process is not None - if self.process.poll() is not None: - # Process has terminated - raise RuntimeError( - "Server process terminated unexpectedly. " - "Check the server logs above for details." 
- ) +### VS Code +Source: https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md - time.sleep(1) +[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. - raise RuntimeError(f"Server failed to start after {max_retries} seconds") + +VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. + - def __exit__(self, exc_type, exc_val, exc_tb): - """Stop the API server subprocess.""" - if self.process: - print("Stopping API server...") - self.process.terminate() - try: - self.process.wait(timeout=5) - except subprocess.TimeoutExpired: - print("Force killing API server...") - self.process.kill() - self.process.wait() +## Prerequisites - # Wait for streaming threads to finish (they're daemon threads, - # so they'll stop automatically) - # But give them a moment to flush any remaining output - time.sleep(0.5) - print("API server stopped.") +Before configuring VS Code: +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+## Installation -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) -title_gen_llm = LLM( - usage_id="title-gen-llm", - model=os.getenv("LLM_MODEL", "openhands/gpt-5-mini-2025-08-07"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) +### Step 1: Install the Extension -# Use managed API server -with ManagedAPIServer(port=8001) as server: - # Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, # Disable browser tools for simplicity - ) +1. Open VS Code +2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) +3. Search for **"VSCode ACP"** +4. Click **Install** - # Define callbacks to test the WebSocket functionality - received_events = [] - event_tracker = {"last_event_time": time.time()} +Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). - def event_callback(event): - """Callback to capture events for testing.""" - event_type = type(event).__name__ - logger.info(f"🔔 Callback received event: {event_type}\n{event}") - received_events.append(event) - event_tracker["last_event_time"] = time.time() +### Step 2: Connect to OpenHands - # Create RemoteConversation with callbacks - # NOTE: Workspace is required for RemoteConversation - workspace = Workspace(host=server.base_url) - result = workspace.execute_command("pwd") - logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" - ) - logger.info(f"Output: {result.stdout}") +1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) +2. Click **Connect** to start a session +3. Select **OpenHands** from the agent dropdown +4. Start chatting with OpenHands! 
- conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - ) - assert isinstance(conversation, RemoteConversation) +## How It Works - try: - logger.info(f"\n📋 Conversation ID: {conversation.state.id}") +The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. - # Send first message and run - logger.info("📝 Sending first message...") - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." - ) +The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol. - # Generate title using a specific LLM - title = conversation.generate_title(max_length=60, llm=title_gen_llm) - logger.info(f"Generated conversation title: {title}") +## Verification - logger.info("🚀 Running conversation...") - conversation.run() +Ensure OpenHands is discoverable: - logger.info("✅ First task completed!") - logger.info(f"Agent status: {conversation.state.execution_status}") +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands +``` - # Wait for events to stop coming (no events for 2 seconds) - logger.info("⏳ Waiting for events to stop...") - while time.time() - event_tracker["last_event_time"] < 2.0: - time.sleep(0.1) - logger.info("✅ Events have stopped") +If the command is not found, install OpenHands CLI: +```bash +uv tool install openhands --python 3.12 +``` - logger.info("🚀 Running conversation again...") - conversation.send_message("Great! 
Now delete that file.")
-        conversation.run()
-        logger.info("✅ Second task completed!")
+## Advanced Usage
-        # Demonstrate state.events functionality
-        logger.info("\n" + "=" * 50)
-        logger.info("📊 Demonstrating State Events API")
-        logger.info("=" * 50)
+### Custom Arguments
-        # Count total events using state.events
-        total_events = len(conversation.state.events)
-        logger.info(f"📈 Total events in conversation: {total_events}")
+The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`.
-        # Get recent events (last 5) using state.events
-        logger.info("\n🔍 Getting last 5 events using state.events...")
-        all_events = conversation.state.events
-        recent_events = all_events[-5:] if len(all_events) >= 5 else all_events
+### Resume Conversations
-        for i, event in enumerate(recent_events, 1):
-            event_type = type(event).__name__
-            timestamp = getattr(event, "timestamp", "Unknown")
-            logger.info(f"  {i}. {event_type} at {timestamp}")
+To resume a conversation, you may need to:
-        # Let's see what the actual event types are
-        logger.info("\n🔍 Event types found:")
-        event_types = set()
-        for event in recent_events:
-            event_type = type(event).__name__
-            event_types.add(event_type)
-        for event_type in sorted(event_types):
-            logger.info(f"  - {event_type}")
+1. Find your conversation ID: `openhands --resume`
+2. Configure the extension to use custom arguments (if supported)
+3. Or use the terminal directly: `openhands acp --resume <conversation-id>`
-        # Print all ConversationStateUpdateEvent
-        logger.info("\n🗂️ ConversationStateUpdateEvent events:")
-        for event in conversation.state.events:
-            if isinstance(event, ConversationStateUpdateEvent):
-                logger.info(f"  - {event}")
+
+The VSCode ACP extension's feature set depends on the extension maintainer. Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities.
+ - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"EXAMPLE_COST: {cost}") +## Troubleshooting - finally: - # Clean up - print("\n🧹 Cleaning up conversation...") - conversation.close() -``` +### OpenHands Not Appearing in Dropdown - +1. Verify OpenHands is installed and in PATH: + ```bash + which openhands + openhands --version + ``` -## Next Steps +2. Restart VS Code after installing OpenHands -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation -- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service -- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details -- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture +3. Check if the extension recognizes agents: + - Look for any error messages in the extension panel + - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`) -### Overview -Source: https://docs.openhands.dev/sdk/guides/agent-server/overview.md +### Connection Failed -Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. +1. Ensure your LLM settings are configured: + ```bash + openhands + # Use /settings to configure + ``` +2. 
Check that `openhands acp` works in terminal: + ```bash + openhands acp + # Should start without errors (Ctrl+C to exit) + ``` -For example, switching from a local workspace to a Docker‑based remote agent server: +### Extension Not Working -```python icon="python" lines -# Local → Docker -conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] -from openhands.workspace import DockerWorkspace # [!code ++] -with DockerWorkspace( # [!code ++] - server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] -) as workspace: # [!code ++] - conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] -``` +1. Update to the latest version of the extension +2. Check for VS Code updates +3. Report issues on the [extension's GitHub](https://github.com/omercnet) -Use `DockerWorkspace` with the pre-built agent server image for the fastest startup. When you need to build from a custom base image, switch to [`DockerDevWorkspace`](/sdk/guides/agent-server/docker-sandbox). 
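Behind these symptoms is a simple mechanism: an ACP client spawns `openhands acp` and exchanges JSON-RPC messages with it over stdin/stdout. A rough sketch of the kind of `initialize` request a client sends first (the field names and newline framing here are illustrative; the authoritative schema is the Agent Client Protocol spec, and this is not the extension's actual code):

```python
import json

def make_initialize_request(request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 initialize request as a single line,
    the framing commonly used by stdio-based protocols."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        # Placeholder params; consult the ACP spec for the real schema.
        "params": {"protocolVersion": 1},
    }
    return json.dumps(request) + "\n"

# An ACP client writes this to the agent subprocess's stdin and then
# reads the JSON-RPC response from its stdout.
print(make_initialize_request(), end="")
```

If the handshake never completes, the extension typically surfaces it as a generic connection failure, which is why verifying `openhands acp` in a terminal is the fastest check.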
+## Limitations -Or switching to an API‑based remote workspace (via [OpenHands Runtime API](https://runtime.all-hands.dev/)): +Since this is a community extension: -```python icon="python" lines -# Local → Remote API -conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] -from openhands.workspace import APIRemoteWorkspace # [!code ++] -with APIRemoteWorkspace( # [!code ++] - runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++] - runtime_api_key="YOUR_API_KEY", # [!code ++] - server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] -) as workspace: # [!code ++] - conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] -``` +- Feature availability may vary +- Support depends on the extension maintainer +- Not all OpenHands CLI flags may be accessible through the UI +For the most control over OpenHands, consider using: +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage +- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support -## What is a Remote Agent Server? 
+## See Also -A Remote Agent Server is an HTTP/WebSocket server that: -- **Package the Software Agent SDK into containers** and deploy on your own infrastructure (Kubernetes, VMs, on-prem, or cloud) -- **Runs agents** on dedicated infrastructure -- **Manages workspaces** (Docker containers or remote sandboxes) -- **Streams events** to clients via WebSocket -- **Handles command and file operations** (execute command, upload, download), check [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for more details -- **Provides isolation** between different agent executions +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [VSCode ACP Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal -Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client. +### Zed IDE +Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed.md -{/* -Same interfaces as local: -[BaseConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), -[ConversationStateProtocol](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), -[EventsListBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/events_list_base.py). Server-backed impl: -[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py). - */} +[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. 
+ -## Architecture Overview +## Prerequisites -Remote Agent Servers follow a simple three-part architecture: +Before configuring Zed, ensure you have: -```mermaid -graph TD - Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server] - Server --> Workspace[Workspace] +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **Zed editor** - Download from [zed.dev](https://zed.dev/) - subgraph Workspace Types - Workspace --> Local[Local Folder] - Workspace --> Docker[Docker Container] - Workspace --> API[Remote Sandbox via API] - end +## Configuration - Local --> Files[File System] - Docker --> Container[Isolated Runtime] - API --> Cloud[Cloud Infrastructure] +### Step 1: Open Agent Settings - style Client fill:#e1f5fe - style Server fill:#fff3e0 - style Workspace fill:#e8f5e8 -``` +1. Open Zed +2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette +3. Search for `agent: open settings` -1. **Client (Python SDK)** — Your application creates and controls conversations using the SDK. -2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution. -3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs. +![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) -The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to. +### Step 2: Add OpenHands as an Agent -## How Remote Conversations Work +1. On the right side, click `+ Add Agent` +2. Select `Add Custom Agent` -Each step in the diagram maps directly to how the SDK and server interact: +![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) -### 1. 
Workspace Connection → *(Client → Server)* +### Step 3: Configure the Agent -When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: +Add the following configuration to the `agent_servers` field: -```python icon="python" -with DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest" -) as workspace: - conversation = Conversation(agent=agent, workspace=workspace) +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": [ + "openhands", + "acp" + ], + "env": {} + } + } +} ``` -This turns the local `Conversation` into a **[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. +### Step 4: Save and Use +1. Save the settings file +2. You can now use OpenHands within Zed! -### 2. Server Initialization → *(Server → Workspace)* +![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) -Once the workspace starts: -- It launches the agent server process. -- Waits for it to be ready. -- Shares the server URL with the SDK client. +## Advanced Configuration -You don’t need to manage this manually—the workspace context handles startup and teardown automatically. +### LLM-Approve Mode -### 3. 
Event Streaming → *(Bidirectional WebSocket)* +For automatic LLM-based approval of actions: -The client and agent server maintain a live WebSocket connection for streaming events: +```json +{ + "agent_servers": { + "OpenHands (LLM Approve)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--llm-approve" + ], + "env": {} + } + } +} +``` -```python icon="python" -def on_event(event): - print(f"Received: {type(event).__name__}") +### Resume a Specific Conversation -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[on_event], -) -``` +To resume a previous conversation: -This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "abc123def456" + ], + "env": {} + } + } +} +``` -### 4. Workspace Supports File and Command Operations → *(Server ↔ Workspace)* +Replace `abc123def456` with your actual conversation ID. Find conversation IDs by running `openhands --resume` in your terminal. -Workspace supports file and command operations via the agent server API ([base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior: +### Resume Latest Conversation -```python icon="python" -workspace.file_upload(local_path, remote_path) -workspace.file_download(remote_path, local_path) -result = workspace.execute_command("ls -la") -print(result.stdout) +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "--last" + ], + "env": {} + } + } +} ``` -These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic. 
- -### Summary - -The architecture makes remote execution seamless: -- Your **client code** stays the same. -- The **agent server** manages execution and streaming. -- The **workspace** provides secure, isolated runtime environments. +### Multiple Configurations -Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed. +You can add multiple OpenHands configurations for different use cases: -## Next Steps +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": ["openhands", "acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "uvx", + "args": ["openhands", "acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume Latest)": { + "command": "uvx", + "args": ["openhands", "acp", "--resume", "--last"], + "env": {} + } + } +} +``` -Explore different deployment options: +## Troubleshooting -- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Run agent server in the same process -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run agent server in isolated Docker containers -- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted agent server via API +### Accessing Debug Logs -For architectural details: -- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment +If you encounter issues: -### Stuck Detector -Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md +1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) +2. Type and select `acp debug log` +3. Review the logs for errors or warnings +4. Restart the conversation to reload connections after configuration changes -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Common Issues -> A ready-to-run example is available [here](#ready-to-run-example)! 
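A malformed `agent_servers` block is a frequent culprit, so before combing through the debug log it can help to confirm your snippet parses. A sketch (hypothetical helper; note that Zed's own settings parser also tolerates comments, so this check is stricter than Zed itself):

```python
import json

def is_strict_json(snippet: str) -> bool:
    """True if the snippet parses as strict JSON."""
    try:
        json.loads(snippet)
        return True
    except json.JSONDecodeError:
        return False

snippet = '{"agent_servers": {"OpenHands": {"command": "uvx", "args": ["openhands", "acp"], "env": {}}}}'
print("snippet parses" if is_strict_json(snippet) else "snippet is malformed")
```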
+**"openhands" command not found** -The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. By analyzing the conversation history after the last user message, it detects five types of stuck patterns: +Ensure OpenHands is installed and in your PATH: +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands +``` -1. **Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) -2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) -3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) -4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) -5. **Context Window Errors**: Repeated context window errors that indicate memory management issues +If using `uvx`, ensure uv is installed: +```bash +uv --version +``` -When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. +**Agent doesn't start** - - For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). - +1. Check that your LLM settings are configured: run `openhands` and verify `/settings` +2. Verify the configuration JSON syntax is valid +3. Check the ACP debug logs for detailed errors +**Conversation doesn't persist** -## How It Works +Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. 
-In the [ready-to-run example](#ready-to-run-example), the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` -command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: + +After making configuration changes, restart the conversation in Zed to apply them. + -1. The conversation proceeds normally until the agent starts repeating actions -2. After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck -3. The conversation can then handle this gracefully, either by stopping execution or taking corrective action +## See Also -The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the -stuck status at any point using `conversation.stuck_detector.is_stuck()`. +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs -## Pattern Detection +### Installation +Source: https://docs.openhands.dev/openhands/usage/cli/installation.md -The stuck detector compares events based on their semantic content rather than object identity. For example: -- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) -- **Observations** are compared by their observation content and tool name -- **Errors** are compared by their error messages -- **Messages** are compared by their content and source + +**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. 
See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. + -This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. +## Installation Methods -## Ready-to-run Example + + + Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. - -This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) - + **Install OpenHands:** + ```bash + uv tool install openhands --python 3.12 + ``` + **Run OpenHands:** + ```bash + openhands + ``` -```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py -import os + **Upgrade OpenHands:** + ```bash + uv tool upgrade openhands --python 3.12 + ``` + + + Install the OpenHands CLI binary with the install script: -from pydantic import SecretStr + ```bash + curl -fsSL https://install.openhands.dev/install.sh | sh + ``` -from openhands.sdk import ( - LLM, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.tools.preset.default import get_default_agent + Then run: + ```bash + openhands + ``` + + Your system may require you to allow permissions to run the executable. -logger = get_logger(__name__) + + When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple + cannot check it for malicious software." -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + 1. Open `System Settings`. + 2. Go to `Privacy & Security`. + 3. Scroll down to `Security` and click `Allow Anyway`. + 4. 
Rerun the OpenHands CLI. -agent = get_default_agent(llm=llm) + ![mac-security](/openhands/static/img/cli-security-mac.png) -llm_messages = [] + + + + + 1. Set the following environment variable in your terminal: + - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) + 2. Ensure you have configured your settings before starting: + - Set up `~/.openhands/settings.json` with your LLM configuration -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) + 3. Run the following command: + ```bash + docker run -it \ + --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e SANDBOX_USER_ID=$(id -u) \ + -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/root/.openhands \ + --add-host host.docker.internal:host-gateway \ + --name openhands-cli-$(date +%Y%m%d%H%M%S) \ + python:3.12-slim \ + bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" + ``` -# Create conversation with built-in stuck detection -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=os.getcwd(), - # This is by default True, shown here for clarity of the example - stuck_detection=True, -) + The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's + permissions. This prevents the agent from creating root-owned files in the mounted workspace. + + -# Send a task that will be caught by stuck detection -conversation.send_message( - "Please execute 'ls' command 5 times, each in its own " - "action without any thought and then exit at the 6th step." 
-) +## First Run -# Run the conversation - stuck detection happens automatically -conversation.run() +The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved +for future sessions in `~/.openhands/settings.json`. -assert conversation.stuck_detector is not None -final_stuck_check = conversation.stuck_detector.is_stuck() -print(f"Final stuck status: {final_stuck_check}") +The conversation history will be saved in `~/.openhands/conversations`. -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") + +If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the +configuration format has changed. + -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +## Next Steps - +- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +### MCP Servers +Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md -## Next Steps +## Overview -- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control -- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK +[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. -### Theory of Mind (TOM) Agent -Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent.md +The CLI provides two ways to manage MCP servers: +1. **CLI commands** (`openhands mcp`) - Manage servers from the command line +2. 
**Interactive command** (`/mcp`) - View server status within a conversation
+
+If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON.
+
-## Overview
+## MCP Commands
-Tom (Theory of Mind) Agent provides advanced user understanding capabilities that help your agent interpret vague instructions and adapt to user preferences over time. Built on research in user mental modeling, Tom agents can:
+### List Servers
-- Understand unclear or ambiguous user requests
-- Provide personalized guidance based on user modeling
-- Build long-term user preference profiles
-- Adapt responses based on conversation history
+View all configured MCP servers:
-This is particularly useful when:
-- User instructions are vague or incomplete
-- You need to infer user intent from minimal context
-- Building personalized experiences across multiple conversations
-- Understanding user preferences and working patterns
+```bash
+openhands mcp list
+```
-## Research Foundation
+### Get Server Details
-Tom agent is based on the TOM-SWE research paper on user mental modeling for software engineering agents:
+View details for a specific server:
-```bibtex Citation
-@misc{zhou2025tomsweusermentalmodeling,
-  title={TOM-SWE: User Mental Modeling For Software Engineering Agents},
-  author={Xuhui Zhou and Valerie Chen and Zora Zhiruo Wang and Graham Neubig and Maarten Sap and Xingyao Wang},
-  year={2025},
-  eprint={2510.21903},
-  archivePrefix={arXiv},
-  primaryClass={cs.SE},
-  url={https://arxiv.org/abs/2510.21903},
-}
-```
+```bash
+openhands mcp get <name>
+```
-
-Paper: [TOM-SWE on arXiv](https://arxiv.org/abs/2510.21903)
-
+### Remove a Server
-## Quick Start
+Remove a server configuration:
-
-This example is available on GitHub:
[examples/01_standalone_sdk/30_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/30_tom_agent.py) - +```bash +openhands mcp remove +``` -```python icon="python" expandable examples/01_standalone_sdk/30_tom_agent.py -"""Example demonstrating Tom agent with Theory of Mind capabilities. +### Enable/Disable Servers -This example shows how to set up an agent with Tom tools for getting -personalized guidance based on user modeling. Tom tools include: -- TomConsultTool: Get guidance for vague or unclear tasks -- SleeptimeComputeTool: Index conversations for user modeling -""" +Control which servers are active: -import os +```bash +# Enable a server +openhands mcp enable -from pydantic import SecretStr +# Disable a server +openhands mcp disable +``` -from openhands.sdk import LLM, Agent, Conversation -from openhands.sdk.tool import Tool -from openhands.tools.preset.default import get_default_tools -from openhands.tools.tom_consult import ( - SleeptimeComputeAction, - SleeptimeComputeObservation, - SleeptimeComputeTool, - TomConsultTool, -) +## Adding Servers +### HTTP/SSE Servers -# Configure LLM -api_key: str | None = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+Add remote servers with HTTP or SSE transport:

-llm: LLM = LLM(
-    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
-    api_key=os.getenv("LLM_API_KEY"),
-    base_url=os.getenv("LLM_BASE_URL", None),
-    usage_id="agent",
-    drop_params=True,
-)

+```bash
+openhands mcp add <name> --transport http <url>
+```

-# Build tools list with Tom tools
-# Note: Tom tools are automatically registered on import (PR #862)
-tools = get_default_tools(enable_browser=False)

+#### With Bearer Token Authentication

-# Configure Tom tools with parameters
-tom_params: dict[str, bool | str] = {
-    "enable_rag": True,  # Enable RAG in Tom agent
-}

+```bash
+openhands mcp add my-api --transport http \
+  --header "Authorization: Bearer your-token" \
+  https://api.example.com/mcp
+```

-# Add LLM configuration for Tom tools (uses same LLM as main agent)
-tom_params["llm_model"] = llm.model
-if llm.api_key:
-    if isinstance(llm.api_key, SecretStr):
-        tom_params["api_key"] = llm.api_key.get_secret_value()
-    else:
-        tom_params["api_key"] = llm.api_key
-if llm.base_url:
-    tom_params["api_base"] = llm.base_url

+#### With API Key Authentication

-# Add both Tom tools to the agent
-tools.append(Tool(name=TomConsultTool.name, params=tom_params))
-tools.append(Tool(name=SleeptimeComputeTool.name, params=tom_params))

+```bash
+openhands mcp add weather-api --transport http \
+  --header "X-API-Key: your-api-key" \
+  https://weather.api.com
+```

-# Create agent with Tom capabilities
-# This agent can consult Tom for personalized guidance
-# Note: Tom's user modeling data will be stored in ~/.openhands/
-agent: Agent = Agent(llm=llm, tools=tools)

+#### With Multiple Headers

-# Start conversation
-cwd: str = os.getcwd()
-PERSISTENCE_DIR = os.path.expanduser("~/.openhands")
-CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, "conversations")
-conversation = Conversation(
-    agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR
-)

+```bash
+openhands mcp add secure-api --transport http \
+  --header "Authorization: Bearer token123" \
+  --header "X-Client-ID: client456" \
+  https://api.example.com
+```

-# Optionally run sleeptime compute to index existing conversations
-# This builds user preferences and patterns from conversation history
-# Using execute_tool allows running tools before conversation.run()
-print("\nRunning sleeptime compute to index conversations...")
-try:
-    sleeptime_result = conversation.execute_tool(
-        "sleeptime_compute", SleeptimeComputeAction()
-    )
-    # Cast to the expected observation type for type-safe access
-    if isinstance(sleeptime_result, SleeptimeComputeObservation):
-        print(f"Result: {sleeptime_result.message}")
-        print(f"Sessions processed: {sleeptime_result.sessions_processed}")
-    else:
-        print(f"Result: {sleeptime_result.text}")
-except KeyError as e:
-    print(f"Tool not available: {e}")

+#### With OAuth Authentication

-# Send a potentially vague message where Tom consultation might help
-conversation.send_message(
-    "I need to debug some code but I'm not sure where to start. "
-    + "Can you help me figure out the best approach?"
-)
-conversation.run()

+```bash
+openhands mcp add notion-server --transport http \
+  --auth oauth \
+  https://mcp.notion.com/mcp
+```

-print("\n" + "=" * 80)
-print("Tom agent consultation example completed!")
-print("=" * 80)

+### Stdio Servers

-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")

+Add local servers that communicate via stdio:

+```bash
+openhands mcp add <name> --transport stdio <command> -- [args...]
+```

-# Optional: Index this conversation for Tom's user modeling
-# This builds user preferences and patterns from conversation history
-# Uncomment the lines below to index the conversation:
-#
-# conversation.send_message("Please index this conversation using sleeptime_compute")
-# conversation.run()
-# print("\nConversation indexed for user modeling!")

+#### Basic Example

-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")

+```bash
+openhands mcp add local-server --transport stdio \
+  python -- -m my_mcp_server
+```
-```
-

+#### With Environment Variables

-## Tom Tools

+```bash
+openhands mcp add local-server --transport stdio \
+  --env "API_KEY=secret123" \
+  --env "DATABASE_URL=postgresql://localhost/mydb" \
+  python -- -m my_mcp_server --config config.json
+```

-### TomConsultTool

+#### Add in Disabled State

-The consultation tool provides personalized guidance when the agent encounters vague or unclear user requests:

+```bash
+openhands mcp add my-server --transport stdio --disabled \
+  node -- my-server.js
+```

-```python icon="python"
-# The agent can automatically call this tool when needed
-# Example: User says "I need to debug something"
-# Tom analyzes the vague request and provides specific guidance

+### Command Reference

+```bash
+openhands mcp add <name> --transport <transport> [options] <url-or-command> [-- args...]
``` -Key features: -- Analyzes conversation history for context -- Provides personalized suggestions based on user modeling -- Helps disambiguate vague instructions -- Adapts to user communication patterns +| Option | Description | +|--------|-------------| +| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) | +| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) | +| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) | +| `--auth` | Authentication method (e.g., `oauth`) | +| `--enabled` | Enable immediately (default) | +| `--disabled` | Add in disabled state | -### SleeptimeComputeTool +## Example: Web Search with Tavily -The indexing tool processes conversation history to build user preference profiles: +Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp): -```python icon="python" -# Index conversations for future personalization -sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute") -if sleeptime_compute_tool: - result = sleeptime_compute_tool.executor( - SleeptimeComputeAction(), conversation - ) +```bash +openhands mcp add tavily --transport stdio \ + npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=" ``` -Key features: -- Processes conversation history into user models -- Stores preferences in `~/.openhands/` directory -- Builds understanding of user patterns over time -- Enables long-term personalization across sessions +## Manual Configuration -## Configuration +You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`. 
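Before hand-editing the file, it can help to check what is already configured. The following is an illustrative sketch, not part of the CLI: `server_summaries` is a hypothetical helper that assumes only the `mcpServers` layout described in this section, reading `~/.openhands/mcp.json` if it exists:

```python
import json
from pathlib import Path


def server_summaries(config: dict) -> dict[str, str]:
    """Map each server name to its launch command, per the mcpServers layout."""
    out = {}
    for name, server in config.get("mcpServers", {}).items():
        out[name] = " ".join([server.get("command", "")] + server.get("args", []))
    return out


config_path = Path.home() / ".openhands" / "mcp.json"
if config_path.exists():
    for name, cmd in server_summaries(json.loads(config_path.read_text())).items():
        print(f"{name}: {cmd}")
else:
    print(f"No MCP config at {config_path}")
```

This only summarizes stdio-style entries; remote servers added via `openhands mcp add --transport http` may be stored differently, so treat the CLI commands as the source of truth.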
-### RAG Support +### Configuration Format + +The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format): + +```json +{ + "mcpServers": { + "server-name": { + "command": "command-to-run", + "args": ["arg1", "arg2"], + "env": { + "ENV_VAR": "value" + } + } + } +} +``` -Enable retrieval-augmented generation for enhanced context awareness: +### Example Configuration -```python icon="python" -tom_params = { - "enable_rag": True, # Enable RAG for better context retrieval +```json +{ + "mcpServers": { + "tavily-remote": { + "command": "npx", + "args": [ + "-y", + "mcp-remote", + "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key" + ] + }, + "local-tools": { + "command": "python", + "args": ["-m", "my_mcp_tools"], + "env": { + "DEBUG": "true" + } + } + } } ``` -### Custom LLM for Tom +## Interactive `/mcp` Command -You can optionally use a different LLM for Tom's internal reasoning: +Within an OpenHands conversation, use `/mcp` to view server status: -```python icon="python" -# Use the same LLM as main agent -tom_params["llm_model"] = llm.model -tom_params["api_key"] = llm.api_key.get_secret_value() +- **View active servers**: Shows which MCP servers are currently active in the conversation +- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts -# Or configure a separate LLM for Tom -tom_llm = LLM(model="gpt-4", api_key=SecretStr("different-key")) -tom_params["llm_model"] = tom_llm.model -tom_params["api_key"] = tom_llm.api_key.get_secret_value() -``` + +The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations. + -## Data Storage +## Workflow -Tom stores user modeling data persistently in `~/.openhands/`: +1. **Add servers** using `openhands mcp add` +2. **Start a conversation** with `openhands` +3. **Check status** with `/mcp` inside the conversation +4. 
**Use the tools** provided by your MCP servers

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

+The agent will automatically have access to tools provided by enabled MCP servers.

-where
-- `user_models/` stores user preference profiles, with each user having their own subdirectory containing `user_model.json` (the current user model).
-- `conversations/` contains indexed conversation data.

+## Troubleshooting

-This persistent storage enables Tom to:
-- Remember user preferences across sessions
-- Track which conversations have been indexed
-- Build long-term understanding of user patterns

+### Server Not Appearing

-## Use Cases

+1. Verify the server is enabled:
+   ```bash
+   openhands mcp list
+   ```

-### 1. Handling Vague Requests

+2. Check the configuration:
+   ```bash
+   openhands mcp get <name>
+   ```

-When a user provides minimal information:

+3. Restart the conversation to load new configurations

-```python icon="python"
-conversation.send_message("Help me with that bug")
-# Tom analyzes history to determine which bug and suggest approach
-```

+### Server Fails to Start

-### 2. Personalized Recommendations

+1. Test the command manually:
+   ```bash
+   # For stdio servers
+   python -m my_mcp_server
+
+   # For HTTP servers, check the URL is reachable
+   curl https://api.example.com/mcp
+   ```

-Tom adapts suggestions based on past interactions:

+2. Check environment variables and credentials

-```python icon="python"
-# After multiple conversations, Tom learns:
-# - User prefers minimal explanations
-# - User typically works with Python
-# - User values efficiency over verbosity
-```

+3. Review error messages in the CLI output

-### 3. 
Intent Inference +### Configuration File Location -Understanding what the user really wants: +The MCP configuration is stored at: +- **Config file**: `~/.openhands/mcp.json` -```python icon="python" -conversation.send_message("Make it better") -# Tom infers from context what "it" is and how to improve it -``` +## See Also -## Best Practices +- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation +- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference -1. **Enable RAG**: For better context awareness, always enable RAG: - ```python icon="python" - tom_params = {"enable_rag": True} - ``` +### Quick Start +Source: https://docs.openhands.dev/openhands/usage/cli/quick-start.md -2. **Index Regularly**: Run sleeptime compute after important conversations to build better user models + +**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details. + -3. **Provide Context**: Even with Tom, providing more context leads to better results +## Overview -4. **Monitor Data**: Check `~/.openhands/` periodically to understand what's being learned +The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent: -5. 
**Privacy Considerations**: Be aware that conversation data is stored locally for user modeling +| Mode | Command | Best For | +|------|---------|----------| +| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development | +| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation | +| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI | +| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI | +| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains | -## Next Steps + -- **[Agent Delegation](/sdk/guides/agent-delegation)** - Combine Tom with sub-agents for complex workflows -- **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively -- **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights +## Your First Conversation -### Browser Session Recording -Source: https://docs.openhands.dev/sdk/guides/browser-session-recording.md +**Set up your account** (first time only): -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + + ```bash + openhands login + ``` + This authenticates with OpenHands Cloud and fetches your settings. + + + The CLI will prompt you to configure your LLM provider and API key on first run. + + -> A ready-to-run example is available [here](#ready-to-run-example)! +1. **Start the CLI:** + ```bash + openhands + ``` -The browser session recording feature allows you to capture your agent's browser interactions and replay them later using [rrweb](https://github.com/rrweb-io/rrweb). This is useful for debugging, auditing, and understanding how your agent interacts with web pages. +2. **Enter a task:** + ``` + Create a Python script that prints "Hello, World!" + ``` -## How It Works +3. **Watch OpenHands work:** + The agent will create the file and show you the results. 
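The mode table at the top of this page also lists a headless mode for scripts and automation. When driving it from Python, building the argument vector separately keeps quoting safe; this sketch assumes `--headless` combines with the `-t` and approval flags exactly as in interactive mode, and `headless_cmd` is a hypothetical helper, not part of the CLI:

```python
def headless_cmd(task: str, always_approve: bool = False) -> list[str]:
    """Build an argv for a one-shot headless run (flags per the docs on this page)."""
    cmd = ["openhands", "--headless", "-t", task]
    if always_approve:
        # Auto-approve all agent actions; use with caution
        cmd.append("--always-approve")
    return cmd


# Pass the result to subprocess.run(..., check=True) to actually invoke the CLI
print(headless_cmd("Fix the bug in auth.py", always_approve=True))
```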
-The recording feature uses rrweb to capture DOM mutations, mouse movements, scrolling, and other browser events. The recordings are saved as JSON files that can be replayed using rrweb-player or the online viewer. +## Controls -The [ready-to-run example](#ready-to-run-example) demonstrates: +Once inside the CLI, use these controls: -1. **Starting a recording**: Use `browser_start_recording` to begin capturing browser events -2. **Browsing and interacting**: Navigate to websites and perform actions while recording -3. **Stopping the recording**: Use `browser_stop_recording` to stop and save the recording +| Control | Description | +|---------|-------------| +| `Ctrl+P` | Open command palette (access Settings, MCP status) | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | -The recording files are automatically saved to the persistence directory when the recording is stopped. +## Starting with a Task -## Replaying Recordings +You can start the CLI with an initial task: -After recording a session, you can replay it using: +```bash +# Start with a task +openhands -t "Fix the bug in auth.py" -- **rrweb-player**: A standalone player component - [GitHub](https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player) -- **Online viewer**: Upload your recording at [rrweb.io/demo](https://www.rrweb.io/) +# Start with a task from a file +openhands -f task.txt +``` -## Ready-to-run Example +## Resuming Conversations - -This example is available on GitHub: [examples/01_standalone_sdk/38_browser_session_recording.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/38_browser_session_recording.py) - +Resume a previous conversation: -```python icon="python" expandable examples/01_standalone_sdk/38_browser_session_recording.py -"""Browser Session Recording Example +```bash +# List recent conversations and select one +openhands --resume -This example demonstrates how to use the browser session recording feature 
-to capture and save a recording of the agent's browser interactions using rrweb. +# Resume the most recent conversation +openhands --resume --last -The recording can be replayed later using rrweb-player to visualize the agent's -browsing session. +# Resume a specific conversation by ID +openhands --resume abc123def456 +``` -The recording will be automatically saved to the persistence directory when -browser_stop_recording is called. You can replay it with: - - rrweb-player: https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player - - Online viewer: https://www.rrweb.io/ -""" +For more details, see [Resume Conversations](/openhands/usage/cli/resume). + +## Next Steps + + + + Learn about the interactive terminal interface + + + Use OpenHands in Zed, VS Code, or JetBrains + + + Automate tasks with scripting + + + Add tools via Model Context Protocol + + -import json -import os +### Resume Conversations +Source: https://docs.openhands.dev/openhands/usage/cli/resume.md -from pydantic import SecretStr +## Overview -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.browser_use import BrowserToolSet -from openhands.tools.browser_use.definition import BROWSER_RECORDING_OUTPUT_DIR +OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off. +## Listing Previous Conversations -logger = get_logger(__name__) +To see a list of your recent conversations, run: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +```bash +openhands --resume +``` -# Tools - including browser tools with recording capability -cwd = os.getcwd() -tools = [ - Tool(name=BrowserToolSet.name), -] +This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message: -# Agent -agent = Agent(llm=llm, tools=tools) +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py -llm_messages = [] # collect raw LLM messages + 2. xyz789ghi012 (yesterday) + Add unit tests for the user service + 3. mno345pqr678 (3 days ago) + Refactor the database connection module +-------------------------------------------------------------------------------- +To resume a conversation, use: openhands --resume +``` -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Resuming a Specific Conversation +To resume a specific conversation, use the `--resume` flag with the conversation ID: -# Create conversation with persistence_dir set to save browser recordings -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir="./.conversations", -) +```bash +openhands --resume +``` -# The prompt instructs the agent to: -# 1. Start recording the browser session -# 2. Browse to a website and perform some actions -# 3. Stop recording (auto-saves to file) -PROMPT = """ -Please complete the following task to demonstrate browser session recording: +For example: -1. First, use `browser_start_recording` to begin recording the browser session. +```bash +openhands --resume abc123def456 +``` -2. 
Then navigate to https://docs.openhands.dev/ and: - - Get the page content - - Scroll down the page - - Get the browser state to see interactive elements +## Resuming the Latest Conversation -3. Next, navigate to https://docs.openhands.dev/openhands/usage/cli/installation and: - - Get the page content - - Scroll down to see more content +To quickly resume your most recent conversation without looking up the ID, use the `--last` flag: -4. Finally, use `browser_stop_recording` to stop the recording. - Events are automatically saved. -""" +```bash +openhands --resume --last +``` -print("=" * 80) -print("Browser Session Recording Example") -print("=" * 80) -print("\nTask: Record an agent's browser session and save it for replay") -print("\nStarting conversation with agent...\n") +This automatically finds and resumes the most recent conversation. -conversation.send_message(PROMPT) -conversation.run() +## How It Works -print("\n" + "=" * 80) -print("Conversation finished!") -print("=" * 80) +When you resume a conversation: -# Check if the recording files were created -# Recordings are saved in BROWSER_RECORDING_OUTPUT_DIR/recording-{timestamp}/ -if os.path.exists(BROWSER_RECORDING_OUTPUT_DIR): - # Find recording subdirectories (they start with "recording-") - recording_dirs = sorted( - [ - d - for d in os.listdir(BROWSER_RECORDING_OUTPUT_DIR) - if d.startswith("recording-") - and os.path.isdir(os.path.join(BROWSER_RECORDING_OUTPUT_DIR, d)) - ] - ) +1. OpenHands loads the full conversation history from disk +2. The agent has access to all previous context, including: + - Your previous messages and requests + - The agent's responses and actions + - Any files that were created or modified +3. 
You can continue the conversation as if you never left - if recording_dirs: - # Process the most recent recording directory - latest_recording = recording_dirs[-1] - recording_path = os.path.join(BROWSER_RECORDING_OUTPUT_DIR, latest_recording) - json_files = sorted( - [f for f in os.listdir(recording_path) if f.endswith(".json")] - ) + +The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost. + - print(f"\n✓ Recording saved to: {recording_path}") - print(f"✓ Number of files: {len(json_files)}") +## Resuming in Different Modes - # Count total events across all files - total_events = 0 - all_event_types: dict[int | str, int] = {} - total_size = 0 +### Terminal Mode - for json_file in json_files: - filepath = os.path.join(recording_path, json_file) - file_size = os.path.getsize(filepath) - total_size += file_size +```bash +openhands --resume abc123def456 +openhands --resume --last +``` - with open(filepath) as f: - events = json.load(f) +### ACP Mode (IDEs) - # Events are stored as a list in each file - if isinstance(events, list): - total_events += len(events) - for event in events: - event_type = event.get("type", "unknown") - all_event_types[event_type] = all_event_types.get(event_type, 0) + 1 +```bash +openhands acp --resume abc123def456 +openhands acp --resume --last +``` - print(f" - {json_file}: {len(events)} events, {file_size} bytes") +For IDE-specific configurations, see: +- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) +- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) +- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) - print(f"✓ Total events: {total_events}") - print(f"✓ Total size: {total_size} bytes") - if all_event_types: - print(f"✓ Event types: {all_event_types}") +### With Confirmation Modes - print("\nTo replay this recording, you can use:") - print( - " - rrweb-player: " - 
"https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player" - ) - else: - print(f"\n✗ No recording directories found in: {BROWSER_RECORDING_OUTPUT_DIR}") - print(" The agent may not have completed the recording task.") -else: - print(f"\n✗ Observations directory not found: {BROWSER_RECORDING_OUTPUT_DIR}") - print(" The agent may not have completed the recording task.") +Combine `--resume` with confirmation mode flags: -print("\n" + "=" * 100) -print("Conversation finished.") -print(f"Total LLM messages: {len(llm_messages)}") -print("=" * 100) +```bash +# Resume with LLM-based approval +openhands --resume abc123def456 --llm-approve -# Report cost -cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost -print(f"Conversation ID: {conversation.id}") -print(f"EXAMPLE_COST: {cost}") +# Resume with auto-approve +openhands --resume --last --always-approve ``` - +## Tips -### Context Condenser -Source: https://docs.openhands.dev/sdk/guides/context-condenser.md + +**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. + -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. + -> A ready-to-run example is available [here](#ready-to-run-example)! +## Storage Location -## What is a Context Condenser? +Conversations are stored in: -A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. As conversations with AI agents grow longer, the cumulative history leads to: +``` +~/.openhands/conversations/ +├── abc123def456/ +│ └── conversation.json +├── xyz789ghi012/ +│ └── conversation.json +└── ... 
+``` -- **💰 Increased API Costs**: More tokens in the context means higher costs per API call -- **⏱️ Slower Response Times**: Larger contexts take longer to process -- **📉 Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information +## See Also -The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively. +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference -## Default Implementation: `LLMSummarizingCondenser` +### Terminal (CLI) +Source: https://docs.openhands.dev/openhands/usage/cli/terminal.md -OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit. +## Overview -### How It Works +The Command Line Interface (CLI) is the default mode when you run `openhands`. It provides a rich, interactive experience directly in your terminal. -When conversation history exceeds a defined threshold, the LLM-based condenser: +```bash +openhands +``` -1. **Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context -2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained -3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise summaries using LLM-generated summaries -4. 
**Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction +## Features + +- **Real-time interaction**: Type natural language tasks and receive instant feedback +- **Live status monitoring**: Watch the agent's progress as it works +- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more -{/* Auto-switching light/dark mode image. */} -Light mode interface -Dark mode interface +## Command Palette -This approach achieves remarkable efficiency gains: -- Up to **2x reduction** in per-turn API costs -- **Consistent response times** even in long sessions -- **Equivalent or better performance** on software engineering tasks +Press `Ctrl+P` to open the command palette, then select from the dropdown options: -Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). +| Option | Description | +|--------|-------------| +| **Settings** | Open the settings configuration menu | +| **MCP** | View MCP server status | -### Extensibility +## Controls -The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history. 
You can create custom condensers by extending base classes ([source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)): +| Control | Action | +|---------|--------| +| `Ctrl+P` | Open command palette | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | -- **`RollingCondenser`** - For condensers that apply condensation to rolling history -- **`CondenserBase`** - For more specialized condensation strategies +## Starting with a Task -This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure. +Start a conversation with an initial task: +```bash +# Provide a task directly +openhands -t "Create a REST API for user management" -### Setting Up Condensing +# Load task from a file +openhands -f requirements.txt +``` -Create a `LLMSummarizingCondenser` to manage the context. -The condenser will automatically truncate conversation history when it exceeds max_size, and replaces the dropped events with an LLM-generated summary. +## Confirmation Modes -This condenser triggers when there are more than `max_context_length` events in -the conversation history, and always keeps the first `keep_first` events (system prompts, -initial user messages) to preserve important context. 
+Control how the agent requests approval for actions: -```python focus={3-4} icon="python" -from openhands.sdk.context import LLMSummarizingCondenser +```bash +# Default: Always ask for confirmation +openhands -condenser = LLMSummarizingCondenser( - llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 -) +# Auto-approve all actions (use with caution) +openhands --always-approve -# Agent with condenser -agent = Agent(llm=llm, tools=tools, condenser=condenser) +# Use LLM-based security analyzer +openhands --llm-approve ``` -### Ready-to-run example +## Resuming Conversations - -This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py) - +Resume previous conversations: +```bash +# List recent conversations +openhands --resume -Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information: +# Resume the most recent +openhands --resume --last -```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py -""" -To manage context in long-running conversations, the agent can use a context condenser -that keeps the conversation history within a specified size limit. This example -demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes -older parts of the conversation when the history exceeds a defined threshold. -""" +# Resume a specific conversation +openhands --resume abc123def456 +``` -import os +For more details, see [Resume Conversations](/openhands/usage/cli/resume). 
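Because each conversation lives in its own directory under `~/.openhands/conversations` (see the storage layout in the resume docs), you can also enumerate resumable IDs yourself. This is an illustrative sketch, not CLI behavior: `recent_conversation_ids` is a hypothetical helper that orders directories by modification time, on the assumption that the most recently modified one is what `--resume --last` would pick:

```python
from pathlib import Path


def recent_conversation_ids(root: Path, limit: int = 15) -> list[str]:
    """List conversation IDs under `root`, most recently modified first."""
    if not root.is_dir():
        return []
    convs = [p for p in root.iterdir() if p.is_dir()]
    convs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in convs[:limit]]


# Inspect the CLI's own conversation store
print(recent_conversation_ids(Path.home() / ".openhands" / "conversations"))
```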
-from pydantic import SecretStr +## Tips -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.context.condenser import LLMSummarizingCondenser -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool + +Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. + + +Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. + -logger = get_logger(__name__) +## See Also -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers +- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), -] +### Web Interface +Source: https://docs.openhands.dev/openhands/usage/cli/web-interface.md -# Create a condenser to manage the context. The condenser will automatically truncate -# conversation history when it exceeds max_size, and replaces the dropped events with an -# LLM-generated summary. This condenser triggers when there are more than ten events in -# the conversation history, and always keeps the first two events (system prompts, -# initial user messages) to preserve important context. 
-condenser = LLMSummarizingCondenser( - llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 -) +## Overview -# Agent with condenser -agent = Agent(llm=llm, tools=tools, condenser=condenser) +The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to: +- Access the CLI remotely +- Share your terminal session +- Use the CLI on devices without a full terminal -llm_messages = [] # collect raw LLM messages +```bash +openhands web +``` + +This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser. + -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Basic Usage +```bash +# Start on default port (12000) +openhands web -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - persistence_dir="./.conversations", - workspace=".", -) +# Access at http://localhost:12000 +``` -# Send multiple messages to demonstrate condensation -print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") +## Options -conversation.send_message( - "Hello! Can you create a Python file named math_utils.py with functions for " - "basic arithmetic operations (add, subtract, multiply, divide)?" -) -conversation.run() +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host address to bind to | +| `--port` | `12000` | Port number to use | +| `--debug` | `false` | Enable debug mode | -conversation.send_message( - "Great! Now add a function to calculate the factorial of a number." 
-) -conversation.run() +## Examples -conversation.send_message("Add a function to check if a number is prime.") -conversation.run() +```bash +# Custom port +openhands web --port 8080 -conversation.send_message( - "Add a function to calculate the greatest common divisor (GCD) of two numbers." -) -conversation.run() +# Bind to localhost only (more secure) +openhands web --host 127.0.0.1 -conversation.send_message( - "Now create a test file to verify all these functions work correctly." -) -conversation.run() +# Enable debug mode +openhands web --debug -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +# Full example with custom host and port +openhands web --host 0.0.0.0 --port 3000 +``` -# Conversation persistence -print("Serializing conversation...") +## Remote Access -del conversation +To access the web interface from another machine: -# Deserialize the conversation -print("Deserializing conversation...") -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - persistence_dir="./.conversations", - workspace=".", -) +1. Start with `--host 0.0.0.0` to bind to all interfaces: + ```bash + openhands web --host 0.0.0.0 --port 12000 + ``` -print("Sending message to deserialized conversation...") -conversation.send_message("Finally, clean up by deleting both files.") -conversation.run() +2. 
Access from another machine using the host's IP:
   ```
   http://<host-ip>:12000
   ```

-print("=" * 100)
-print("Conversation finished with LLM Summarizing Condenser.")
-print(f"Total LLM messages collected: {len(llm_messages)}")
-print("\nThe condenser automatically summarized older conversation history")
-print("when the conversation exceeded the configured max_size threshold.")
-print("This helps manage context length while preserving important information.")

+
When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities.
+

-# Report cost
-cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost
-print(f"EXAMPLE_COST: {cost}")
-```

## Use Cases

-

### Development on Remote Servers

-## Next Steps

Access OpenHands on a remote development server through your local browser:

-- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings

```bash
# On remote server
openhands web --host 0.0.0.0 --port 12000

-### Ask Agent Questions
-Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent.md

# On local machine, use SSH tunnel
ssh -L 12000:localhost:12000 user@remote-server

-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

# Access at http://localhost:12000
```

-> A ready-to-run example is available [here](#ready-to-run-example)!

### Sharing Sessions

Use `ask_agent()` to get quick responses from the agent about the current conversation state without
interrupting the main execution flow.
+Run the web interface on a shared server for team access: -## Key Features +```bash +openhands web --host 0.0.0.0 --port 8080 +``` -The `ask_agent()` method provides several important capabilities: +## Comparison: Web Interface vs GUI Server -#### Context-Aware Responses +| Feature | `openhands web` | `openhands serve` | +|---------|-----------------|-------------------| +| Interface | Terminal UI in browser | Full web GUI | +| Dependencies | None | Docker required | +| Resources | Lightweight | Full container | +| Best for | Quick access | Rich GUI experience | -The agent has access to the full conversation history when answering questions: +## See Also -```python focus={2-3} icon="python" wrap -# Agent can reference what it has done so far -response = conversation.ask_agent( - "Summarize the activity so far in 1 sentence." -) -print(f"Response: {response}") -``` +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage +- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options -#### Non-Intrusive Operation +## OpenHands Web App Server -Questions don't interrupt the main conversation flow - they're processed separately: +### About OpenHands +Source: https://docs.openhands.dev/openhands/usage/about.md -```python focus={4-6} icon="python" wrap -# Start main conversation -thread = threading.Thread(target=conversation.run) -thread.start() +## Research Strategy -# Ask questions without affecting main execution -response = conversation.ask_agent("How's the progress?") -``` +Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves: -#### Works During and After Execution +- **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling. 
+- **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization. +- **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our agents. -You can ask questions while the agent is running or after it has completed: +## Default Agent -```python focus={3,7} icon="python" wrap -# During execution -time.sleep(2) # Let agent start working -response1 = conversation.ask_agent("Have you finished running?") +Our default Agent is currently the [CodeActAgent](./agents), which is capable of generating code and handling files. -# After completion -thread.join() -response2 = conversation.ask_agent("What did you accomplish?") -``` +## Built With -### Use Cases +OpenHands is built using a combination of powerful frameworks and libraries, providing a robust foundation for its +development. Here are the key technologies used in the project: -- **Progress Monitoring**: Check on long-running tasks -- **Status Updates**: Get real-time information about agent activities -- **User Interfaces**: Provide sidebar information in chat applications +![FastAPI](https://img.shields.io/badge/FastAPI-black?style=for-the-badge) ![uvicorn](https://img.shields.io/badge/uvicorn-black?style=for-the-badge) ![LiteLLM](https://img.shields.io/badge/LiteLLM-black?style=for-the-badge) ![Docker](https://img.shields.io/badge/Docker-black?style=for-the-badge) ![Ruff](https://img.shields.io/badge/Ruff-black?style=for-the-badge) ![MyPy](https://img.shields.io/badge/MyPy-black?style=for-the-badge) ![LlamaIndex](https://img.shields.io/badge/LlamaIndex-black?style=for-the-badge) ![React](https://img.shields.io/badge/React-black?style=for-the-badge) -## Ready-to-run Example +Please note that the selection of these technologies is in progress, and additional technologies may be added or +existing ones may be removed as the project evolves. We strive to adopt the most suitable and efficient tools to +enhance the capabilities of OpenHands. 
- - This example is available on GitHub: - [examples/01_standalone_sdk/28_ask_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/28_ask_agent_example.py) - +## License -Example demonstrating the ask_agent functionality for getting sidebar replies -from the agent for a running conversation. +Distributed under MIT [License](https://github.com/OpenHands/OpenHands/blob/main/LICENSE). -This example shows how to use `ask_agent()` to get quick responses from the agent -about the current conversation state without interrupting the main execution flow. +### Configuration Options +Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md -```python icon="python" expandable examples/01_standalone_sdk/28_ask_agent_example.py -""" -Example demonstrating the ask_agent functionality for getting sidebar replies -from the agent for a running conversation. + + This page documents the current V1 configuration model. -This example shows how to use ask_agent() to get quick responses from the agent -about the current conversation state without interrupting the main execution flow. -""" + Legacy config.toml / “runtime” configuration docs have been moved + to the Legacy (V0) section of the Web tab. + -import os -import threading -import time -from datetime import datetime +## Where configuration lives in V1 -from pydantic import SecretStr +Most user-facing configuration is done via the **Settings** UI in the Web app +(LLM provider/model, integrations, MCP, secrets, etc.). 
-from openhands.sdk import ( - LLM, - Agent, - Conversation, -) -from openhands.sdk.conversation import ConversationVisualizerBase -from openhands.sdk.event import Event -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +For self-hosted deployments and advanced workflows, OpenHands also supports +environment-variable configuration. +## Common V1 environment variables -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +These are some commonly used variables in V1 deployments: -# Tools -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), -] +- **LLM credentials** + - LLM_API_KEY + - LLM_MODEL +- **Persistence** + - OH_PERSISTENCE_DIR: where OpenHands stores local state (defaults to + ~/.openhands). -class MinimalVisualizer(ConversationVisualizerBase): - """A minimal visualizer that print the raw events as they occur.""" +- **Public URL (optional)** + - OH_WEB_URL: the externally reachable URL of your OpenHands instance + (used for callbacks in some deployments). - count = 0 +- **Sandbox workspace mounting** + - SANDBOX_VOLUMES: mount host directories into the sandbox (see + [Docker Sandbox](/openhands/usage/sandboxes/docker)). 
- def on_event(self, event: Event) -> None: - """Handle events for minimal progress visualization.""" - print(f"\n\n[EVENT {self.count}] {type(event).__name__}") - self.count += 1 +- **Sandbox image selection** + - AGENT_SERVER_IMAGE_REPOSITORY + - AGENT_SERVER_IMAGE_TAG -# Agent -agent = Agent(llm=llm, tools=tools) -conversation = Conversation( - agent=agent, workspace=cwd, visualizer=MinimalVisualizer, max_iteration_per_run=5 -) +## Sandbox provider selection +Some deployments still use the legacy RUNTIME environment variable to +choose which sandbox provider to use: -def timestamp() -> str: - return datetime.now().strftime("%H:%M:%S") +- RUNTIME=docker (default) +- RUNTIME=process (aka legacy RUNTIME=local) +- RUNTIME=remote +See [Sandboxes overview](/openhands/usage/sandboxes/overview) for details. -print("=== Ask Agent Example ===") -print("This example demonstrates asking questions during conversation execution") +## Need legacy options? -# Step 1: Build conversation context -print(f"\n[{timestamp()}] Building conversation context...") -conversation.send_message("Explore the current directory and describe the architecture") +If you are looking for the old config.toml reference or V0 “runtime” +providers, see: -# Step 2: Start conversation in background thread -print(f"[{timestamp()}] Starting conversation in background thread...") -thread = threading.Thread(target=conversation.run) -thread.start() +- Web → Legacy (V0) → V0 Configuration Options +- Web → Legacy (V0) → V0 Runtime Configuration -# Give the agent time to start processing -time.sleep(2) +### Custom Sandbox +Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md -# Step 3: Use ask_agent while conversation is running -print(f"\n[{timestamp()}] Using ask_agent while conversation is processing...") + + These settings are only available in [Local GUI](/openhands/usage/run-openhands/local-setup). OpenHands Cloud uses managed sandbox environments. 
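The environment variables and sandbox provider selector described earlier can be combined into a small launch script. This is a hedged sketch, not a required setup: the key, model string, and directory values are illustrative placeholders you must replace with your own.

```shell
# Illustrative V1 environment configuration; substitute your own values.
export LLM_API_KEY="your-provider-key"                    # required: your LLM provider key
export LLM_MODEL="anthropic/claude-sonnet-4-5-20250929"   # provider/model string
export OH_PERSISTENCE_DIR="$HOME/.openhands"              # where local state is stored
export RUNTIME="docker"                                   # sandbox provider: docker | process | remote

echo "OpenHands state directory: $OH_PERSISTENCE_DIR"
```

Sourcing a script like this before starting OpenHands keeps credentials out of the command line history of individual `docker run` invocations.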
+

-# Ask context-aware questions
-questions_and_responses = []

The sandbox is where the agent performs its tasks. Instead of running commands directly on your computer
(which could be risky), the agent runs them inside a Docker container.

-question_1 = "Summarize the activity so far in 1 sentence."
-print(f"\n[{timestamp()}] Asking: {question_1}")
-response1 = conversation.ask_agent(question_1)
-questions_and_responses.append((question_1, response1))
-print(f"Response: {response1}")

The default OpenHands sandbox (`python-nodejs:python3.12-nodejs22`
from [nikolaik/python-nodejs](https://hub.docker.com/r/nikolaik/python-nodejs)) comes with some packages installed,
such as Python and Node.js, but other software you may need is not installed by default.

-time.sleep(1)

You have two options for customization:

-question_2 = "How's the progress?"
-print(f"\n[{timestamp()}] Asking: {question_2}")
-response2 = conversation.ask_agent(question_2)
-questions_and_responses.append((question_2, response2))
-print(f"Response: {response2}")

- Use an existing image with the required software.
- Create your own custom Docker image.

-time.sleep(1)

If you choose the first option, you can skip the `Create Your Docker Image` section.

-question_3 = "Have you finished running?"
-print(f"\n[{timestamp()}] {question_3}")
-response3 = conversation.ask_agent(question_3)
-questions_and_responses.append((question_3, response3))
-print(f"Response: {response3}")

## Create Your Docker Image

-# Step 4: Wait for conversation to complete
-print(f"\n[{timestamp()}] Waiting for conversation to complete...")
-thread.join()

To create a custom Docker image, it must be Debian based.
-# Step 5: Verify conversation state wasn't affected
-final_event_count = len(conversation.state.events)

-# Step 6: Ask a final question after conversation completion
-print(f"\n[{timestamp()}] Asking final question after completion...")
-final_response = conversation.ask_agent(
-    "Can you summarize what you accomplished in this conversation?"
-)
-print(f"Final response: {final_response}")

For example, if you want OpenHands to have `ruby` installed, you could create a `Dockerfile` with the following content:

```dockerfile
FROM nikolaik/python-nodejs:python3.12-nodejs22

# Install required packages
RUN apt-get update && apt-get install -y ruby
```

-# Step 7: Summary
-print("\n" + "=" * 60)
-print("SUMMARY OF ASK_AGENT DEMONSTRATION")
-print("=" * 60)

Or you could use a Ruby-specific base image:

-print("\nQuestions and Responses:")
-for i, (question, response) in enumerate(questions_and_responses, 1):
-    print(f"\n{i}. Q: {question}")
-    print(f"   A: {response[:100]}{'...' if len(response) > 100 else ''}")

```dockerfile
FROM ruby:latest
```

-final_truncated = final_response[:100] + ("..." if len(final_response) > 100 else "")
-print(f"\nFinal Question Response: {final_truncated}")

Save this file in a folder. Then, build your Docker image (e.g., named custom-image) by navigating to the folder in
the terminal and running:

-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost:.4f}")

```bash
docker build -t custom-image .
```

This will produce a new image called `custom-image`, which will be available in Docker.
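The same single-`RUN`-layer pattern extends to several packages at once. The sketch below is illustrative, not part of the official guide: the package list (`ruby`, `ffmpeg`, `jq`) is an assumed example, and the apt-cache cleanup is an optional hygiene step that keeps the resulting image smaller:

```dockerfile
FROM nikolaik/python-nodejs:python3.12-nodejs22

# Install extra tooling in one layer, then trim the apt cache to reduce image size.
# Package names here are illustrative; substitute what your tasks actually need.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ruby ffmpeg jq && \
    rm -rf /var/lib/apt/lists/*
```

Building it works the same way as above (`docker build -t custom-image .`); keeping installs in one `RUN` avoids stacking extra image layers.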
-## Next Steps

## Using the Docker Command

-- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interrupt and redirect agent execution
-- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow
-- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress

When running OpenHands using [the docker command](/openhands/usage/run-openhands/local-setup#start-the-app), replace
the `AGENT_SERVER_IMAGE_REPOSITORY` and `AGENT_SERVER_IMAGE_TAG` environment variables with `-e SANDBOX_BASE_CONTAINER_IMAGE=<custom image name>`:

-### Conversation with Async
-Source: https://docs.openhands.dev/sdk/guides/convo-async.md

```commandline
docker run -it --rm --pull=always \
  -e SANDBOX_BASE_CONTAINER_IMAGE=custom-image \
  ...
```

-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

## Using the Development Workflow

-> A ready-to-run example is available [here](#ready-to-run-example)!

### Setup

-### Concurrent Agents

First, ensure you can run OpenHands by following the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md).

-Run multiple agent tasks in parallel using `asyncio.gather()`:

### Specify the Base Sandbox Image

-```python icon="python" wrap
-async def main():
-    loop = asyncio.get_running_loop()
-    callback = AsyncCallbackWrapper(callback_coro, loop)

In the `config.toml` file within the OpenHands directory, set the `base_container_image` to the image you want to use.
This can be an image you’ve already pulled or one you’ve built:

-    # Create multiple conversation tasks running in parallel
-    tasks = [
-        loop.run_in_executor(None, run_conversation, callback),
-        loop.run_in_executor(None, run_conversation, callback),
-        loop.run_in_executor(None, run_conversation, callback)
-    ]
-    results = await asyncio.gather(*tasks)

```toml
[core]
...
+[sandbox] +base_container_image="custom-image" ``` -## Ready-to-run Example - - -This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) - +### Additional Configuration Options -This example demonstrates usage of a Conversation in an async context -(e.g.: From a fastapi server). The conversation is run in a background -thread and a callback with results is executed in the main runloop +The `config.toml` file supports several other options for customizing your sandbox: -```python icon="python" expandable examples/01_standalone_sdk/11_async.py -""" -This example demonstrates usage of a Conversation in an async context -(e.g.: From a fastapi server). The conversation is run in a background -thread and a callback with results is executed in the main runloop +```toml +[core] +# Install additional dependencies when the runtime is built +# Can contain any valid shell commands +# If you need the path to the Python interpreter in any of these commands, you can use the $OH_INTERPRETER_PATH variable +runtime_extra_deps = """ +pip install numpy pandas +apt-get update && apt-get install -y ffmpeg """ -import asyncio -import os +# Set environment variables for the runtime +# Useful for configuration that needs to be available at runtime +runtime_startup_env_vars = { DATABASE_URL = "postgresql://user:pass@localhost/db" } -from pydantic import SecretStr +# Specify platform for multi-architecture builds (e.g., "linux/amd64" or "linux/arm64") +platform = "linux/amd64" +``` -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.conversation.types import ConversationCallbackType -from openhands.sdk.tool import Tool -from openhands.sdk.utils.async_utils import AsyncCallbackWrapper -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool 
-from openhands.tools.terminal import TerminalTool +### Run +Run OpenHands by running ```make run``` in the top level directory. -logger = get_logger(__name__) +### Search Engine Setup +Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +## Setting Up Search Engine in OpenHands -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), -] +OpenHands can be configured to use [Tavily](https://tavily.com/) as a search engine, which allows the agent to +search the web for information when needed. This capability enhances the agent's ability to provide up-to-date +information and solve problems that require external knowledge. -# Agent -agent = Agent(llm=llm, tools=tools) + + Tavily is configured as a search engine by default in OpenHands Cloud! + -llm_messages = [] # collect raw LLM messages +### Getting a Tavily API Key +To use the search functionality in OpenHands, you'll need to obtain a Tavily API key: -# Callback coroutine -async def callback_coro(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +1. Visit [Tavily's website](https://tavily.com/) and sign up for an account. +2. Navigate to the API section in your dashboard. +3. Generate a new API key. +4. Copy the API key (it should start with `tvly-`). 
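Since a valid key starts with `tvly-`, a quick shell check can catch copy/paste mistakes before you save it. The key value below is a placeholder, not a real credential:

```shell
TAVILY_KEY="tvly-your-api-key-here"   # placeholder; paste your actual key here

# Warn early if the pasted value does not look like a Tavily key.
case "$TAVILY_KEY" in
  tvly-*) echo "Key format looks correct" ;;
  *)      echo "Warning: Tavily keys normally start with tvly-" ;;
esac
```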
+### Configuring Search in OpenHands -# Synchronous run conversation -def run_conversation(callback: ConversationCallbackType): - conversation = Conversation(agent=agent, callbacks=[callback]) +Once you have your Tavily API key, you can configure OpenHands to use it: - conversation.send_message( - "Hello! Can you create a new Python file named hello.py that prints " - "'Hello, World!'? Use task tracker to plan your steps." - ) - conversation.run() +#### In the OpenHands UI - conversation.send_message("Great! Now delete that file.") - conversation.run() +1. Open OpenHands and navigate to the `Settings > LLM` page. +2. Enter your Tavily API key (starting with `tvly-`) in the `Search API Key (Tavily)` field. +3. Click `Save` to apply the changes. + + The search API key field is optional. If you don't provide a key, the search functionality will not be available to + the agent. + -async def main(): - loop = asyncio.get_running_loop() +#### Using Configuration Files - # Create the callback - callback = AsyncCallbackWrapper(callback_coro, loop) +If you're running OpenHands in headless mode or via CLI, you can configure the search API key in your configuration file: - # Run the conversation in a background thread and wait for it to finish... - await loop.run_in_executor(None, run_conversation, callback) +```toml +# In your OpenHands config file +[core] +search_api_key = "tvly-your-api-key-here" +``` - print("=" * 100) - print("Conversation finished. Got the following LLM messages:") - for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +### How Search Works in OpenHands - # Report cost - cost = llm.metrics.accumulated_cost - print(f"EXAMPLE_COST: {cost}") +When the search engine is configured: +- The agent can decide to search the web when it needs external information. 
+- Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which + includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map). +- Results are returned and incorporated into the agent's context. +- The agent can use this information to provide more accurate and up-to-date responses. -if __name__ == "__main__": - asyncio.run(main()) -``` +### Limitations - +- Search results depend on Tavily's coverage and freshness. +- Usage may be subject to Tavily's rate limits and pricing tiers. +- The agent will only search when it determines that external information is needed. -## Next Steps +### Troubleshooting -- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state -- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents +If you encounter issues with the search functionality: -### Custom Visualizer -Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md +- Verify that your API key is correct and active. +- Check that your API key starts with `tvly-`. +- Ensure you have an active internet connection. +- Check Tavily's status page for any service disruptions. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Main Agent and Capabilities +Source: https://docs.openhands.dev/openhands/usage/agents.md -> A ready-to-run example is available [here](#ready-to-run-example)! +## CodeActAgent -The SDK provides flexible visualization options. You can use the default rich-formatted visualizer, customize it with highlighting patterns, or build completely custom visualizers by subclassing `ConversationVisualizerBase`. 
+### Description -## Visualizer Configuration Options +This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a +unified **code** action space for both _simplicity_ and _performance_. -The `visualizer` parameter in `Conversation` controls how events are displayed: +The conceptual idea is illustrated below. At each turn, the agent can: -```python icon="python" focus={4-5, 7-8, 10-11, 13, 18, 20, 25} -from openhands.sdk import Conversation -from openhands.sdk.conversation import DefaultConversationVisualizer, ConversationVisualizerBase +1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc. +2. **CodeAct**: Choose to perform the task by executing code -# Option 1: Use default visualizer (enabled by default) -conversation = Conversation(agent=agent, workspace=workspace) +- Execute any valid Linux `bash` command +- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details. -# Option 2: Disable visualization -conversation = Conversation(agent=agent, workspace=workspace, visualizer=None) +![image](https://github.com/OpenHands/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3) + +### Demo + +https://github.com/OpenHands/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac + +_Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_. + +### REST API (V1) +Source: https://docs.openhands.dev/openhands/usage/api/v1.md + + + OpenHands is in a transition period: legacy (V0) endpoints still exist alongside + the new /api/v1 endpoints. 
-# Option 3: Pass a visualizer class (will be instantiated automatically) -conversation = Conversation(agent=agent, workspace=workspace, visualizer=DefaultConversationVisualizer) + If you need the legacy OpenAPI reference, see the Legacy (V0) section in the Web tab. + -# Option 4: Pass a configured visualizer instance -custom_viz = DefaultConversationVisualizer( - name="MyAgent", - highlight_regex={r"^Reasoning:": "bold cyan"} -) -conversation = Conversation(agent=agent, workspace=workspace, visualizer=custom_viz) +## Overview -# Option 5: Use custom visualizer class -class MyVisualizer(ConversationVisualizerBase): - def on_event(self, event): - print(f"Event: {event}") +OpenHands V1 REST endpoints are mounted under: -conversation = Conversation(agent=agent, workspace=workspace, visualizer=MyVisualizer()) -``` +- /api/v1 -## Customizing the Default Visualizer +These endpoints back the current Web UI and are intended for newer integrations. -`DefaultConversationVisualizer` uses Rich panels and supports customization through configuration: +## Key resources -```python icon="python" focus={3-14, 19} -from openhands.sdk.conversation import DefaultConversationVisualizer +The V1 API is organized around a few core concepts: -# Configure highlighting patterns using regex -custom_visualizer = DefaultConversationVisualizer( - name="MyAgent", # Prefix panel titles with agent name - highlight_regex={ - r"^Reasoning:": "bold cyan", # Lines starting with "Reasoning:" - r"^Thought:": "bold green", # Lines starting with "Thought:" - r"^Action:": "bold yellow", # Lines starting with "Action:" - r"\[ERROR\]": "bold red", # Error markers anywhere - r"\*\*(.*?)\*\*": "bold", # Markdown bold **text** - }, - skip_user_messages=False, # Show user messages -) +- **App conversations**: create/list conversations and access conversation metadata. 
+ - POST /api/v1/app-conversations + - GET /api/v1/app-conversations -conversation = Conversation( - agent=agent, - workspace=workspace, - visualizer=custom_visualizer -) -``` +- **Sandboxes**: list/start/pause/resume the execution environments that power conversations. + - GET /api/v1/sandboxes/search + - POST /api/v1/sandboxes + - POST /api/v1/sandboxes/{id}/pause + - POST /api/v1/sandboxes/{id}/resume -**When to use**: Perfect for customizing colors and highlighting without changing the panel-based layout. +- **Sandbox specs**: list the available sandbox “templates” (e.g., Docker image presets). + - GET /api/v1/sandbox-specs/search -## Creating Custom Visualizers +### Backend Architecture +Source: https://docs.openhands.dev/openhands/usage/architecture/backend.md -For complete control over visualization, subclass `ConversationVisualizerBase`: +This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents. 
-```python icon="python" focus={4, 11, 28} -from openhands.sdk.conversation import ConversationVisualizerBase -from openhands.sdk.event import ActionEvent, ObservationEvent, AgentErrorEvent, Event +# System overview -class MinimalVisualizer(ConversationVisualizerBase): - """A minimal visualizer that prints raw event information.""" - - def __init__(self, name: str | None = None): - super().__init__(name=name) - self.step_count = 0 - - def on_event(self, event: Event) -> None: - """Handle each event.""" - if isinstance(event, ActionEvent): - self.step_count += 1 - tool_name = event.tool_name or "unknown" - print(f"Step {self.step_count}: {tool_name}") - - elif isinstance(event, ObservationEvent): - print(f" → Result received") - - elif isinstance(event, AgentErrorEvent): - print(f"❌ Error: {event.error}") +```mermaid +flowchart LR + U["User"] --> FE["Frontend (SPA)"] + FE -- "HTTP/WS" --> BE["OpenHands Backend"] + BE --> ES["EventStream"] + BE --> ST["Storage"] + BE --> RT["Runtime Interface"] + BE --> LLM["LLM Providers"] -# Use your custom visualizer -conversation = Conversation( - agent=agent, - workspace=workspace, - visualizer=MinimalVisualizer(name="Agent") -) + subgraph Runtime + direction TB + RT --> DRT["Docker Runtime"] + RT --> LRT["Local Runtime"] + RT --> RRT["Remote Runtime"] + DRT --> AES["Action Execution Server"] + LRT --> AES + RRT --> AES + AES --> Bash["Bash Session"] + AES --> Jupyter["Jupyter Plugin"] + AES --> Browser["BrowserEnv"] + end ``` -### Key Methods - -**`__init__(self, name: str | None = None)`** -- Initialize your visualizer with optional configuration -- `name` parameter is available from the base class for agent identification -- Call `super().__init__(name=name)` to initialize the base class +This Overview is simplified to show the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below. 
-**`initialize(self, state: ConversationStateProtocol)`** -- Called automatically by `Conversation` after state is created -- Provides access to conversation state and statistics via `self._state` -- Override if you need custom initialization, but call `super().initialize(state)` +# Backend Architecture -**`on_event(self, event: Event)`** *(required)* -- Called for each conversation event -- Implement your visualization logic here -- Access conversation stats via `self.conversation_stats` property -**When to use**: When you need a completely different output format, custom state tracking, or integration with external systems. +```mermaid +classDiagram + class Agent { + <> + +sandbox_plugins: list[PluginRequirement] + } + class CodeActAgent { + +tools + } + Agent <|-- CodeActAgent -## Ready-to-run Example + class EventStream + class Observation + class Action + Action --> Observation + Agent --> EventStream - -This example is available on GitHub: [examples/01_standalone_sdk/26_custom_visualizer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/26_custom_visualizer.py) - + class Runtime { + +connect() + +send_action_for_execution() + } + class ActionExecutionClient { + +_send_action_server_request() + } + class DockerRuntime + class LocalRuntime + class RemoteRuntime + Runtime <|-- ActionExecutionClient + ActionExecutionClient <|-- DockerRuntime + ActionExecutionClient <|-- LocalRuntime + ActionExecutionClient <|-- RemoteRuntime -```python icon="python" expandable examples/01_standalone_sdk/26_custom_visualizer.py -"""Custom Visualizer Example + class ActionExecutionServer { + +/execute_action + +/alive + } + class BashSession + class JupyterPlugin + class BrowserEnv + ActionExecutionServer --> BashSession + ActionExecutionServer --> JupyterPlugin + ActionExecutionServer --> BrowserEnv -This example demonstrates how to create and use a custom visualizer by subclassing -ConversationVisualizer. 
This approach provides: -- Clean, testable code with class-based state management -- Direct configuration (just pass the visualizer instance to visualizer parameter) -- Reusable visualizer that can be shared across conversations + Agent --> Runtime + Runtime ..> ActionExecutionServer : REST +``` -This demonstrates how you can pass a ConversationVisualizer instance directly -to the visualizer parameter for clean, reusable visualization logic. -""" +
+ Updating this Diagram +
+ We maintain architecture diagrams inline with Mermaid in this MDX. -import logging -import os + Guidance: + - Edit the Mermaid blocks directly (flowchart/classDiagram). + - Quote labels and edge text for GitHub preview compatibility. + - Keep relationships concise and reflect stable abstractions (agents, runtime client/server, plugins). + - Verify accuracy against code: + - openhands/runtime/impl/action_execution/action_execution_client.py + - openhands/runtime/impl/docker/docker_runtime.py + - openhands/runtime/impl/local/local_runtime.py + - openhands/runtime/action_execution_server.py + - openhands/runtime/plugins/* + - Build docs locally or view on GitHub to confirm diagrams render. -from pydantic import SecretStr +
+
-from openhands.sdk import LLM, Conversation
-from openhands.sdk.conversation.visualizer import ConversationVisualizerBase
-from openhands.sdk.event import (
-    Event,
-)
-from openhands.tools.preset.default import get_default_agent
+### Runtime Architecture
+Source: https://docs.openhands.dev/openhands/usage/architecture/runtime.md

+The OpenHands Docker Runtime is the core component that enables secure and flexible execution of an AI agent's actions.
+It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system.

-class MinimalVisualizer(ConversationVisualizerBase):
-    """A minimal visualizer that print the raw events as they occur."""
+## Why do we need a sandboxed runtime?

-    def on_event(self, event: Event) -> None:
-        """Handle events for minimal progress visualization."""
-        print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...")
+OpenHands needs to execute arbitrary code in a secure, isolated environment for several reasons:

+1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources
+2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues
+3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system
+4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system
+5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable

-api_key = os.getenv("LLM_API_KEY")
-assert api_key is not None, "LLM_API_KEY environment variable is not set."
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - usage_id="agent", -) -agent = get_default_agent(llm=llm, cli_mode=True) +## How does the Runtime work? -# ============================================================================ -# Configure Visualization -# ============================================================================ -# Set logging level to reduce verbosity -logging.getLogger().setLevel(logging.WARNING) +The OpenHands Runtime system uses a client-server architecture implemented with Docker containers. Here's an overview of how it works: -# Start a conversation with custom visualizer -cwd = os.getcwd() -conversation = Conversation( - agent=agent, - workspace=cwd, - visualizer=MinimalVisualizer(), -) +```mermaid +graph TD + A[User-provided Custom Docker Image] --> B[OpenHands Backend] + B -->|Builds| C[OH Runtime Image] + C -->|Launches| D[Action Executor] + D -->|Initializes| E[Browser] + D -->|Initializes| F[Bash Shell] + D -->|Initializes| G[Plugins] + G -->|Initializes| L[Jupyter Server] -# Send a message and let the agent run -print("Sending task to agent...") -conversation.send_message("Write 3 facts about the current project into FACTS.txt.") -conversation.run() -print("Task completed!") + B -->|Spawn| H[Agent] + B -->|Spawn| I[EventStream] + I <--->|Execute Action to + Get Observation + via REST API + | D -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost:.4f}") + H -->|Generate Action| I + I -->|Obtain Observation| H + + subgraph "Docker Container" + D + E + F + G + L + end ``` - +1. User Input: The user provides a custom base Docker image +2. Image Building: OpenHands builds a new Docker image (the "OH runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client" +3. 
Container Launch: When OpenHands starts, it launches a Docker container using the OH runtime image +4. Action Execution Server Initialization: The action execution server initializes an `ActionExecutor` inside the container, setting up necessary components like a bash shell and loading any specified plugins +5. Communication: The OpenHands backend (client: `openhands/runtime/impl/action_execution/action_execution_client.py`; runtimes: `openhands/runtime/impl/docker/docker_runtime.py`, `openhands/runtime/impl/local/local_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations +6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations +7. Observation Return: The action execution server sends execution results back to the OpenHands backend as observations -## Next Steps +The role of the client: -Now that you understand custom visualizers, explore these related topics: +- It acts as an intermediary between the OpenHands backend and the sandboxed environment +- It executes various types of actions (shell commands, file operations, Python code, etc.) 
safely within the container +- It manages the state of the sandboxed environment, including the current working directory and loaded plugins +- It formats and returns observations to the backend, ensuring a consistent interface for processing results -- **[Events](/sdk/arch/events)** - Learn more about different event types -- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data -- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic +## How OpenHands builds and maintains OH Runtime images -### Pause and Resume -Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md +OpenHands' approach to building and managing runtime images ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Check out the [relevant code](https://github.com/OpenHands/OpenHands/blob/main/openhands/runtime/utils/runtime_build.py) if you are interested in more details. -> A ready-to-run example is available [here](#ready-to-run-example)! +### Image Tagging System -### Pausing Execution +OpenHands uses a three-tag system for its runtime images to balance reproducibility with flexibility. +The tags are: -Pause the agent from another thread or after a delay using `conversation.pause()`, and -Resume the paused conversation after performing operations by calling `conversation.run()` again. 
+- **Versioned Tag**: `oh_v{openhands_version}_{base_image}` (e.g.: `oh_v0.9.9_nikolaik_s_python-nodejs_t_python3.12-nodejs22`) +- **Lock Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}` (e.g.: `oh_v0.9.9_1234567890abcdef`) +- **Source Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}_{16_digit_source_hash}` + (e.g.: `oh_v0.9.9_1234567890abcdef_1234567890abcdef`) -```python icon="python" focus={9, 15} wrap -import time -thread = threading.Thread(target=conversation.run) -thread.start() +#### Source Tag - Most Specific -print("Letting agent work for 5 seconds...") -time.sleep(5) +This is the first 16 digits of the MD5 of the directory hash for the source directory. This gives a hash +for only the openhands source -print("Pausing the agent...") -conversation.pause() +#### Lock Tag -print("Waiting for 5 seconds...") -time.sleep(5) +This hash is built from the first 16 digits of the MD5 of: -print("Resuming the execution...") -conversation.run() -``` +- The name of the base image upon which the image was built (e.g.: `nikolaik/python-nodejs:python3.12-nodejs22`) +- The content of the `pyproject.toml` included in the image. +- The content of the `poetry.lock` included in the image. -## Ready-to-run Example +This effectively gives a hash for the dependencies of Openhands independent of the source code. - -This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) - +#### Versioned Tag - Most Generic -Pause agent execution mid-task by calling `conversation.pause()`: +This tag is a concatenation of openhands version and the base image name (transformed to fit in tag standard). -```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py -import os -import threading -import time +#### Build Process -from pydantic import SecretStr +When generating an image... 
-from openhands.sdk import (
-    LLM,
-    Agent,
-    Conversation,
-)
-from openhands.sdk.tool import Tool
-from openhands.tools.file_editor import FileEditorTool
-from openhands.tools.terminal import TerminalTool
+- **No re-build**: OpenHands first checks whether an image with the same **most specific source tag** exists. If there is such an image,
+  no build is performed - the existing image is used.
+- **Fastest re-build**: OpenHands next checks whether an image with the **generic lock tag** exists. If there is such an image,
+  OpenHands builds a new image based upon it, bypassing all installation steps (like `poetry install` and
+  `apt-get`) except a final operation to copy the current source code. The new image is tagged with a
+  **source** tag only.
+- **Ok-ish re-build**: If neither a **source** nor **lock** tag exists, an image will be built based upon the **versioned** tag image.
+  In the versioned tag image, most dependencies should already be installed, saving time.
+- **Slowest re-build**: If none of the three tags exists, a brand new image is built based upon the base
+  image (which is the slowest operation). This new image is tagged with all the **source**, **lock**, and **versioned** tags.

+This tagging approach allows OpenHands to efficiently manage both development and production environments.

-# Configure LLM
-api_key = os.getenv("LLM_API_KEY")
-assert api_key is not None, "LLM_API_KEY environment variable is not set."
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
-base_url = os.getenv("LLM_BASE_URL")
-llm = LLM(
-    usage_id="agent",
-    model=model,
-    base_url=base_url,
-    api_key=SecretStr(api_key),
-)

+1. Identical source code and Dockerfile always produce the same image (via hash-based tags)
+2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images)
+3. 
The **lock** tag (e.g., `runtime:oh_v0.9.3_1234567890abcdef`) always points to the latest build for a particular base image, dependency, and OpenHands version combination

-# Tools
-tools = [
-    Tool(
-        name=TerminalTool.name,
-    ),
-    Tool(name=FileEditorTool.name),
-]
+## Volume mounts: named volumes and overlay

-# Agent
-agent = Agent(llm=llm, tools=tools)
-conversation = Conversation(agent, workspace=os.getcwd())
+OpenHands supports both bind mounts and Docker named volumes in SandboxConfig.volumes:

-print("=" * 60)
-print("Pause and Continue Example")
-print("=" * 60)
-print()
+- Bind mount: "/abs/host/path:/container/path[:mode]"
+- Named volume: "volume:`<name>`:/container/path[:mode]" or any non-absolute host spec treated as a named volume

-# Phase 1: Start a long-running task
-print("Phase 1: Starting agent with a task...")
-conversation.send_message(
-    "Create a file called countdown.txt and write numbers from 100 down to 1, "
-    "one number per line. After you finish, summarize what you did."
-)
+Overlay mode (copy-on-write layer) is supported for bind mounts by appending ":overlay" to the mode (e.g., ":ro,overlay").
+To enable overlay copy-on-write, set SANDBOX_VOLUME_OVERLAYS to a writable host directory; per-container upper/work dirs are created under it. If SANDBOX_VOLUME_OVERLAYS is unset, overlay mounts are skipped.
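The volume spec format described above can be illustrated with a small parser. This helper is a sketch for explanation only, not the actual parsing code in the OpenHands runtime:

```python
def parse_volume_spec(spec: str):
    """Classify a volumes entry of the form described above.

    Illustrative only -- the real parsing lives in the OpenHands runtime.
    Returns (kind, source, container_path, mode, overlay).
    """
    # An explicit "volume:" prefix forces a named volume.
    if spec.startswith("volume:"):
        spec = spec[len("volume:"):]
        forced_volume = True
    else:
        forced_volume = False
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"invalid volume spec: {spec}")
    source, container_path = parts[0], parts[1]
    mode = parts[2] if len(parts) > 2 else "rw"
    flags = mode.split(",")
    overlay = "overlay" in flags  # ":overlay" requests a copy-on-write layer
    mode = ",".join(f for f in flags if f != "overlay") or "rw"
    # Absolute host paths are bind mounts; anything else is a named volume.
    kind = "volume" if forced_volume or not source.startswith("/") else "bind"
    return kind, source, container_path, mode, overlay

print(parse_volume_spec("/abs/host/path:/container/path:ro,overlay"))
# ('bind', '/abs/host/path', '/container/path', 'ro', True)
print(parse_volume_spec("volume:mydata:/container/path"))
# ('volume', 'mydata', '/container/path', 'rw', False)
```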
-print(f"Initial status: {conversation.state.execution_status}") -print() +Implementation references: +- openhands/runtime/impl/docker/docker_runtime.py (named volumes in _build_docker_run_args; overlay mounts in _process_overlay_mounts) +- openhands/core/config/sandbox_config.py (volumes field) -# Start the agent in a background thread -thread = threading.Thread(target=conversation.run) -thread.start() -# Let the agent work for a few seconds -print("Letting agent work for 2 seconds...") -time.sleep(2) +## Runtime Plugin System -# Phase 2: Pause the agent -print() -print("Phase 2: Pausing the agent...") -conversation.pause() +The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the action execution server starts up inside the runtime. -# Wait for the thread to finish (it will stop when paused) -thread.join() +## Ports and URLs -print(f"Agent status after pause: {conversation.state.execution_status}") -print() +- Host port allocation uses file-locked ranges for stability and concurrency: + - Main runtime port: find_available_port_with_lock on configured range + - VSCode port: SandboxConfig.sandbox.vscode_port if provided, else find_available_port_with_lock in VSCODE_PORT_RANGE + - App ports: two additional ranges for plugin/web apps +- DOCKER_HOST_ADDR (if set) adjusts how URLs are formed for LocalRuntime/Docker environments. +- VSCode URL is exposed with a connection token from the action execution server endpoint /vscode/connection_token and rendered as: + - Docker/Local: `http://localhost:{port}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` + - RemoteRuntime: `scheme://vscode-{host}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` -# Phase 3: Send a new message while paused -print("Phase 3: Sending a new message while agent is paused...") -conversation.send_message( - "Actually, stop working on countdown.txt. 
Instead, create a file called " - "hello.txt with just the text 'Hello, World!' in it." -) -print() +References: +- openhands/runtime/impl/docker/docker_runtime.py (port ranges, locking, DOCKER_HOST_ADDR, vscode_url) +- openhands/runtime/impl/local/local_runtime.py (vscode_url factory) +- openhands/runtime/impl/remote/remote_runtime.py (vscode_url mapping) +- openhands/runtime/action_execution_server.py (/vscode/connection_token) -# Phase 4: Resume the agent with .run() -print("Phase 4: Resuming agent with .run()...") -print(f"Status before resume: {conversation.state.execution_status}") -# Resume execution -conversation.run() +Examples: +- Jupyter: openhands/runtime/plugins/jupyter/__init__.py (JupyterPlugin, Kernel Gateway) +- VS Code: openhands/runtime/plugins/vscode/* (VSCodePlugin, exposes tokenized URL) +- Agent Skills: openhands/runtime/plugins/agent_skills/* -print(f"Final status: {conversation.state.execution_status}") +Key aspects of the plugin system: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class +2. Plugin Registration: Available plugins are registered in `openhands/runtime/plugins/__init__.py` via `ALL_PLUGINS` +3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime +4. Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions +5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping - +### Repository Customization +Source: https://docs.openhands.dev/openhands/usage/customization/repository.md + +## Skills (formerly Microagents) +Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands +should function. 
See [Skills Overview](/overview/skills) for more information. -## Next Steps +## Setup Script +You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository. +This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks. -- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state -- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents +For example: +```bash +#!/bin/bash +export MY_ENV_VAR="my value" +sudo apt-get update +sudo apt-get install -y lsof +cd frontend && npm install ; cd .. +``` -### Persistence -Source: https://docs.openhands.dev/sdk/guides/convo-persistence.md +## Pre-commit Script +You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit. +This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits. + +For example: +```bash +#!/bin/bash +# Run linting checks +cd frontend && npm run lint +if [ $? -ne 0 ]; then + echo "Frontend linting failed. Please fix the issues before committing." + exit 1 +fi -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +# Run tests +cd backend && pytest tests/unit +if [ $? -ne 0 ]; then + echo "Backend tests failed. Please fix the issues before committing." + exit 1 +fi -> A ready-to-run example is available [here](#ready-to-run-example)! +exit 0 +``` -## How to use Persistence +### Debugging +Source: https://docs.openhands.dev/openhands/usage/developers/debugging.md -Save conversation state to disk and restore it later for long-running or multi-session workflows. +The following is intended as a primer on debugging OpenHands for Development purposes. 
-### Saving State +## Server / VSCode -Create a conversation with a unique ID to enable persistence: +The following `launch.json` will allow debugging the agent, controller and server elements, but not the sandbox (Which runs inside docker). It will ignore any changes inside the `workspace/` directory: -```python focus={3-4,10-11} icon="python" wrap -import uuid +``` +{ + "version": "0.2.0", + "configurations": [ + { + "name": "OpenHands CLI", + "type": "debugpy", + "request": "launch", + "module": "openhands.cli.main", + "justMyCode": false + }, + { + "name": "OpenHands WebApp", + "type": "debugpy", + "request": "launch", + "module": "uvicorn", + "args": [ + "openhands.server.listen:app", + "--reload", + "--reload-exclude", + "${workspaceFolder}/workspace", + "--port", + "3000" + ], + "justMyCode": false + } + ] +} +``` -conversation_id = uuid.uuid4() -persistence_dir = "./.conversations" +More specific debugging configurations which include more parameters may be specified: -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) -conversation.send_message("Start long task") -conversation.run() # State automatically saved +``` + ... + { + "name": "Debug CodeAct", + "type": "debugpy", + "request": "launch", + "module": "openhands.core.main", + "args": [ + "-t", + "Ask me what your task is.", + "-d", + "${workspaceFolder}/workspace", + "-c", + "CodeActAgent", + "-l", + "llm.o1", + "-n", + "prompts" + ], + "justMyCode": false + } + ... ``` -### Restoring State - -Restore a conversation using the same ID and persistence directory: +Values in the snippet above can be updated such that: -```python focus={9-10} icon="python" -# Later, in a different session -del conversation + * *t*: the task + * *d*: the openhands workspace directory + * *c*: the agent + * *l*: the LLM config (pre-defined in config.toml) + * *n*: session name (e.g. 
eventstream name) -# Deserialize the conversation -print("Deserializing conversation...") -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) +### Development Overview +Source: https://docs.openhands.dev/openhands/usage/developers/development-overview.md -conversation.send_message("Continue task") -conversation.run() # Continues from saved state -``` +## Core Documentation -## What Gets Persisted +### Project Fundamentals +- **Main Project Overview** (`/README.md`) + The primary entry point for understanding OpenHands, including features and basic setup instructions. -The conversation state includes information that allows seamless restoration: +- **Development Guide** (`/Development.md`) + Guide for developers working on OpenHands, including setup, requirements, and development workflows. -- **Message History**: Complete event log including user messages, agent responses, and system events -- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters -- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings -- **Tool Outputs**: Results from bash commands, file operations, and other tool executions -- **Statistics**: LLM usage metrics like token counts and API calls -- **Workspace Context**: Working directory and file system state -- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation -- **Secrets**: Managed credentials and API keys -- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) +- **Contributing Guidelines** (`/CONTRIBUTING.md`) + Essential information for contributors, covering code style, PR process, and contribution workflows. 
- - For the complete implementation details, see the [ConversationState class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code. - +### Component Documentation -## Persistence Directory Structure +#### Frontend +- **Frontend Application** (`/frontend/README.md`) + Complete guide for setting up and developing the React-based frontend application. -When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each -conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/` -(unless you specify a custom path). +#### Backend +- **Backend Implementation** (`/openhands/README.md`) + Detailed documentation of the Python backend implementation and architecture. -**Directory structure:** - - - - - - - - - - - - - - - - - - - - +- **Server Documentation** (`/openhands/server/README.md`) + Server implementation details, API documentation, and service architecture. -Each conversation directory contains: -- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata -- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`) +- **Runtime Environment** (`/openhands/runtime/README.md`) + Documentation covering the runtime environment, execution model, and runtime configurations. -The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access. +#### Infrastructure +- **Container Documentation** (`/containers/README.md`) + Information about Docker containers, deployment strategies, and container management. 
-## Ready-to-run Example +### Testing and Evaluation +- **Unit Testing Guide** (`/tests/unit/README.md`) + Instructions for writing, running, and maintaining unit tests. - -This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py) - +- **Evaluation Framework** (`/evaluation/README.md`) + Documentation for the evaluation framework, benchmarks, and performance testing. -```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py -import os -import uuid +### Advanced Features +- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`) + Detailed information about the skills architecture, implementation, and usage. -from pydantic import SecretStr +### Documentation Standards +- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) + Standards and guidelines for writing and maintaining project documentation. -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +## Getting Started with Development +If you're new to developing with OpenHands, we recommend following this sequence: -logger = get_logger(__name__) +1. Start with the main `README.md` to understand the project's purpose and features +2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute +3. Follow the setup instructions in `Development.md` +4. 
Dive into specific component documentation based on your area of interest: + - Frontend developers should focus on `/frontend/README.md` + - Backend developers should start with `/openhands/README.md` + - Infrastructure work should begin with `/containers/README.md` -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +## Documentation Updates -# Tools -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +When making changes to the codebase, please ensure that: +1. Relevant documentation is updated to reflect your changes +2. New features are documented in the appropriate README files +3. Any API changes are reflected in the server documentation +4. Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` -# Add MCP Tools -mcp_config = { - "mcpServers": { - "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, - } -} -# Agent -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) +### Evaluation Harness +Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md -llm_messages = [] # collect raw LLM messages +This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework. +## Setup Environment and LLM Configuration -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +Please follow instructions [here](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to setup your local development environment. +OpenHands in development mode uses `config.toml` to keep track of most configurations. 
+Here's an example configuration file you can use to define and use multiple LLMs: -conversation_id = uuid.uuid4() -persistence_dir = "./.conversations" +```toml +[llm] +# IMPORTANT: add your API key here, and set the model to the one you want to evaluate +model = "claude-3-5-sonnet-20241022" +api_key = "sk-XXX" -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " - "about the project into FACTS.txt." -) -conversation.run() +[llm.eval_gpt4_1106_preview_llm] +model = "gpt-4-1106-preview" +api_key = "XXX" +temperature = 0.0 -conversation.send_message("Great! Now delete that file.") -conversation.run() +[llm.eval_some_openai_compatible_model_llm] +model = "openai/MODEL_NAME" +base_url = "https://OPENAI_COMPATIBLE_URL/v1" +api_key = "XXX" +temperature = 0.0 +``` -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") -# Conversation persistence -print("Serializing conversation...") +## How to use OpenHands in the command line -del conversation +OpenHands can be run from the command line using the following format: -# Deserialize the conversation -print("Deserializing conversation...") -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, - persistence_dir=persistence_dir, - conversation_id=conversation_id, -) +```bash +poetry run python ./openhands/core/main.py \ + -i \ + -t "" \ + -c \ + -l +``` -print("Sending message to deserialized conversation...") -conversation.send_message("Hey what did you create? 
Return an agent finish action") -conversation.run() +For example: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +```bash +poetry run python ./openhands/core/main.py \ + -i 10 \ + -t "Write me a bash script that prints hello world." \ + -c CodeActAgent \ + -l llm ``` +This command runs OpenHands with: +- A maximum of 10 iterations +- The specified task description +- Using the CodeActAgent +- With the LLM configuration defined in the `llm` section of your `config.toml` file - +## How does OpenHands work -## Reading serialized events +The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works: -Convert persisted events into LLM-ready messages for reuse or analysis. +1. Parse command-line arguments and load the configuration +2. Create a runtime environment using `create_runtime()` +3. Initialize the specified agent +4. Run the controller using `run_controller()`, which: + - Attaches the runtime to the agent + - Executes the agent's task + - Returns a final state when complete - -This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) - +The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing. 
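In headless runs, the user-input simulation mentioned above is supplied as a callback. A minimal sketch, assuming the callback receives the current state and returns the simulated reply as a string (the exact signature may differ):

```python
def fake_user_response(state) -> str:
    """Simulated user reply for non-interactive runs.

    Hypothetical wiring: run_controller(..., fake_user_response_fn=...) is
    assumed to invoke this whenever the agent asks the user a question.
    """
    return (
        "Please continue working on the task using the approach you think is "
        "best. If you believe you have solved it, finish the interaction."
    )

# Hypothetical usage (names as in the surrounding docs):
# state = run_controller(
#     config=config,
#     task_str=instruction,
#     runtime=runtime,
#     fake_user_response_fn=fake_user_response,
# )
```

Returning a fixed "keep going or finish" prompt keeps benchmark runs deterministic instead of blocking on real user input.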
-```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py -"""Load persisted events and convert them into LLM-ready messages.""" -import json -import os -import uuid -from pathlib import Path +## Easiest way to get started: Exploring Existing Benchmarks -from pydantic import SecretStr +We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository. +To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements. -conversation_id = uuid.uuid4() -persistence_root = Path(".conversations") -log_dir = ( - persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex -) +## How to create an evaluation workflow -os.environ.setdefault("LOG_JSON", "true") -os.environ.setdefault("LOG_TO_FILE", "true") -os.environ.setdefault("LOG_DIR", str(log_dir)) -os.environ.setdefault("LOG_LEVEL", "INFO") -from openhands.sdk import ( # noqa: E402 - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - Tool, -) -from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 -from openhands.tools.terminal import TerminalTool # noqa: E402 +To create an evaluation workflow for your benchmark, follow these steps: + +1. 
Import relevant OpenHands utilities: + ```python + import openhands.agenthub + from evaluation.utils.shared import ( + EvalMetadata, + EvalOutput, + make_metadata, + prepare_dataset, + reset_logger_for_multiprocessing, + run_evaluation, + ) + from openhands.controller.state.state import State + from openhands.core.config import ( + AppConfig, + SandboxConfig, + get_llm_config_arg, + parse_arguments, + ) + from openhands.core.logger import openhands_logger as logger + from openhands.core.main import create_runtime, run_controller + from openhands.events.action import CmdRunAction + from openhands.events.observation import CmdOutputObservation, ErrorObservation + from openhands.runtime.runtime import Runtime + ``` +2. Create a configuration: + ```python + def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig: + config = AppConfig( + default_agent=metadata.agent_class, + runtime='docker', + max_iterations=metadata.max_iterations, + sandbox=SandboxConfig( + base_container_image='your_container_image', + enable_auto_lint=True, + timeout=300, + ), + ) + config.set_llm_config(metadata.llm_config) + return config + ``` -setup_logging(log_to_file=True, log_dir=str(log_dir)) -logger = get_logger(__name__) +3. Initialize the runtime and set up the evaluation environment: + ```python + def initialize_runtime(runtime: Runtime, instance: pd.Series): + # Set up your evaluation environment here + # For example, setting environment variables, preparing files, etc. + pass + ``` -api_key = os.getenv("LLM_API_KEY") -if not api_key: - raise RuntimeError("LLM_API_KEY environment variable is not set.") +4. 
Create a function to process each instance: + ```python + from openhands.utils.async_utils import call_async_from_sync + def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput: + config = get_config(instance, metadata) + runtime = create_runtime(config) + call_async_from_sync(runtime.connect) + initialize_runtime(runtime, instance) -llm = LLM( - usage_id="agent", - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) + instruction = get_instruction(instance, metadata) -agent = Agent( - llm=llm, - tools=[Tool(name=TerminalTool.name)], -) + state = run_controller( + config=config, + task_str=instruction, + runtime=runtime, + fake_user_response_fn=your_user_response_function, + ) -###### -# Create a conversation that persists its events -###### + # Evaluate the agent's actions + evaluation_result = await evaluate_agent_actions(runtime, instance) -conversation = Conversation( - agent=agent, - workspace=os.getcwd(), - persistence_dir=str(persistence_root), - conversation_id=conversation_id, -) + return EvalOutput( + instance_id=instance.instance_id, + instruction=instruction, + test_result=evaluation_result, + metadata=metadata, + history=compatibility_for_eval_history_pairs(state.history), + metrics=state.metrics.get() if state.metrics else None, + error=state.last_error if state and state.last_error else None, + ) + ``` -conversation.send_message( - "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " - "Reply with a short confirmation once done." -) -conversation.run() +5. 
Run the evaluation: + ```python + metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir) + output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl') + instances = prepare_dataset(your_dataset, output_file, eval_n_limit) -conversation.send_message( - "Without using any tools, summarize in one sentence what you did." -) -conversation.run() + await run_evaluation( + instances, + metadata, + output_file, + num_workers, + process_instance + ) + ``` -assert conversation.state.persistence_dir is not None -persistence_dir = Path(conversation.state.persistence_dir) -event_dir = persistence_dir / "events" +This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. -event_paths = sorted(event_dir.glob("event-*.json")) +Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. -if not event_paths: - raise RuntimeError("No event files found. Was persistence enabled?") +By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. -###### -# Read from serialized events -###### +## Understanding the `user_response_fn` -events = [Event.model_validate_json(path.read_text()) for path in event_paths] +The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. 
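The contract is simple: the function receives the current state (or `None`) and returns the string that will be fed to the agent as the simulated user turn. A minimal, stdlib-only illustration — `FakeState` is a stub standing in for the real `State` class, and the message-counting logic is simplified for demonstration:

```python
# Minimal illustration of the user_response_fn contract: take the current
# state (or None) and return the simulated user's reply as a string.
# FakeState is a stand-in for OpenHands' real State class; real code
# would count user MessageActions in state.history instead.
from dataclasses import dataclass, field


@dataclass
class FakeState:
    history: list = field(default_factory=list)


def always_continue(state) -> str:
    """Simplest possible user_response_fn: never give extra guidance."""
    return "Please continue working on the task. Do not ask for human help."


def give_up_after(state, limit: int = 3) -> str:
    """Allow the agent to stop once it has pinged the user `limit` times."""
    base = "Please continue working on the task. Do not ask for human help.\n"
    attempts = len(state.history) if state else 0
    if attempts >= limit:
        return base + "If you are stuck, you may give up and exit."
    return base


print(give_up_after(FakeState(history=["q1", "q2", "q3"])))
```

A fuller version of this pattern, taken from the SWE-Bench evaluation, is shown later on this page.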
-convertible_events = [ - event for event in events if isinstance(event, LLMConvertibleEvent) -] -llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) -if llm.uses_responses_api(): - logger.info("Formatting messages for the OpenAI Responses API.") - instructions, input_items = llm.format_messages_for_responses(llm_messages) - logger.info("Responses instructions:\n%s", instructions) - logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) -else: - logger.info("Formatting messages for the OpenAI Chat Completions API.") - chat_messages = llm.format_messages_for_llm(llm_messages) - logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) +### Workflow and Interaction -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +The correct workflow for handling actions and the `user_response_fn` is as follows: - +1. Agent receives a task and starts processing +2. Agent emits an Action +3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): + - The Runtime processes the Action + - Runtime returns an Observation +4. If the Action is not executable (typically a MessageAction): + - The `user_response_fn` is called + - It returns a simulated user response +5. The agent receives either the Observation or the simulated response +6. Steps 2-5 repeat until the task is completed or max iterations are reached +Here's a more accurate visual representation: -## How State Persistence Works +``` + [Agent] + | + v + [Emit Action] + | + v + [Is Action Executable?] + / \ + Yes No + | | + v v + [Runtime] [user_response_fn] + | | + v v + [Return Observation] [Simulated Response] + \ / + \ / + v v + [Agent receives feedback] + | + v + [Continue or Complete Task] +``` -The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. 
+In this workflow: -### Auto-Save Mechanism +- Executable actions (like running commands or executing code) are handled directly by the Runtime +- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` +- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` -When you modify any public field on `ConversationState`, the SDK automatically: +This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. -1. Detects the field change via a custom `__setattr__` implementation -2. Serializes the entire base state to `base_state.json` -3. Triggers any registered state change callbacks +### Example Implementation -This happens transparently—you don't need to call any save methods manually. 
+Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: ```python -# These changes are automatically persisted: -conversation.state.execution_status = ConversationExecutionStatus.RUNNING -conversation.state.max_iterations = 100 +def codeact_user_response(state: State | None) -> str: + msg = ( + 'Please continue working on the task on whatever approach you think is suitable.\n' + 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' + 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' + ) + + if state and state.history: + # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up + user_msgs = [ + event + for event in state.history + if isinstance(event, MessageAction) and event.source == 'user' + ] + if len(user_msgs) >= 2: + # let the agent know that it can give up when it has tried 3 times + return ( + msg + + 'If you want to give up, run: exit .\n' + ) + return msg ``` -### Events vs Base State +This function does the following: -The persistence system separates data into two categories: +1. Provides a standard message encouraging the agent to continue working +2. Checks how many times the agent has attempted to communicate with the user +3. If the agent has made multiple attempts, it provides an option to give up -| Category | Storage | Contents | -|----------|---------|----------| -| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | -| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | +By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. -Events are appended incrementally (one file per event), while base state is overwritten on each change. 
This design optimizes for: -- **Fast event appends**: No need to rewrite the entire history -- **Atomic state updates**: Base state is always consistent -- **Efficient restoration**: Events can be loaded lazily +### WebSocket Connection +Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md +This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. +## Overview -## Next Steps +OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations +1. Receive real-time events from the agent +2. Send user actions to the agent +3. Maintain a persistent connection for ongoing conversations -### Send Message While Running -Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md +## Connecting to the WebSocket -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Connection Parameters + +When connecting to the WebSocket, you need to provide the following query parameters: +- `conversation_id`: The ID of the conversation you want to join +- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) +- `providers_set`: (Optional) A comma-separated list of provider types - -This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) - +### Connection Example -Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: +Here's a basic example of connecting to the WebSocket using JavaScript: -```python icon="python" expandable 
examples/01_standalone_sdk/18_send_message_while_processing.py -""" -Example demonstrating that user messages can be sent and processed while -an agent is busy. +```javascript +import { io } from "socket.io-client"; -This example demonstrates a key capability of the OpenHands agent system: the ability -to receive and process new user messages even while the agent is actively working on -a previous task. This is made possible by the agent's event-driven architecture. +const socket = io("http://localhost:3000", { + transports: ["websocket"], + query: { + conversation_id: "your-conversation-id", + latest_event_id: -1, + providers_set: "github,gitlab" // Optional + } +}); -Demonstration Flow: -1. Send initial message asking agent to: - - Write "Message 1 sent at [time], written at [CURRENT_TIME]" - - Wait 3 seconds - - Write "Message 2 sent at [time], written at [CURRENT_TIME]" - [time] is the time the message was sent to the agent - [CURRENT_TIME] is the time the agent writes the line -2. Start agent processing in a background thread -3. While agent is busy (during the 3-second delay), send a second message asking to add: - - "Message 3 sent at [time], written at [CURRENT_TIME]" -4. Verify that all three lines are processed and included in the final document +socket.on("connect", () => { + console.log("Connected to OpenHands WebSocket"); +}); -Expected Evidence: -The final document will contain three lines with dual timestamps: -- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) -- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) -- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) +socket.on("oh_event", (event) => { + console.log("Received event:", event); +}); -The timestamps will show that Message 3 was sent while the agent was running, -but was still successfully processed and written to the document. 
+socket.on("connect_error", (error) => { + console.error("Connection error:", error); +}); -This proves that: -- The second user message was sent while the agent was processing the first task -- The agent successfully received and processed the second message -- The agent's event system allows for real-time message integration during processing +socket.on("disconnect", (reason) => { + console.log("Disconnected:", reason); +}); +``` -Key Components Demonstrated: -- Conversation.send_message(): Adds messages to events list immediately -- Agent.step(): Processes all events including newly added messages -- Threading: Allows message sending while agent is actively processing -""" # noqa +## Sending Actions to the Agent -import os -import threading -import time -from datetime import datetime +To send an action to the agent, use the `oh_user_action` event: -from pydantic import SecretStr +```javascript +// Send a user message to the agent +socket.emit("oh_user_action", { + type: "message", + source: "user", + message: "Hello, can you help me with my project?" +}); +``` -from openhands.sdk import ( - LLM, - Agent, - Conversation, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +## Receiving Events from the Agent +The server emits events using the `oh_event` event type. Here are some common event types you might receive: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +- User messages (`source: "user", type: "message"`) +- Agent messages (`source: "agent", type: "message"`) +- File edits (`action: "edit"`) +- File writes (`action: "write"`) +- Command executions (`action: "run"`) -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +Example event handler: + +```javascript +socket.on("oh_event", (event) => { + if (event.source === "agent" && event.type === "message") { + console.log("Agent says:", event.message); + } else if (event.action === "run") { + console.log("Command executed:", event.args.command); + console.log("Result:", event.result); + } +}); +``` -# Agent -agent = Agent(llm=llm, tools=tools) -conversation = Conversation(agent) +## Using Websocat for Testing +[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application. -def timestamp() -> str: - return datetime.now().strftime("%H:%M:%S") +### Installation +```bash +# On macOS +brew install websocat -print("=== Send Message While Processing Example ===") +# On Linux +curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat +chmod +x websocat +sudo mv websocat /usr/local/bin/ +``` -# Step 1: Send initial message -start_time = timestamp() -conversation.send_message( - f"Create a file called document.txt and write this first sentence: " - f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write the line. 
" - f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa -) +### Connecting to the WebSocket -# Step 2: Start agent processing in background -thread = threading.Thread(target=conversation.run) -thread.start() +```bash +# Connect to the WebSocket and print all received messages +echo "40{}" | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` -# Step 3: Wait then send second message while agent is processing -time.sleep(2) # Give agent time to start working +### Sending a Message -second_time = timestamp() +```bash +# Send a message to the agent +echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` -conversation.send_message( - f"Please also add this second sentence to document.txt: " - f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
-) +### Complete Example with Websocat -# Wait for completion -thread.join() +Here's a complete example of connecting to the WebSocket, sending a message, and receiving events: -# Verification -document_path = os.path.join(cwd, "document.txt") -if os.path.exists(document_path): - with open(document_path) as f: - content = f.read() +```bash +# Start a persistent connection +websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" - print("\nDocument contents:") - print("─────────────────────") - print(content) - print("─────────────────────") +# In another terminal, send a message +echo '42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \ +websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1" +``` - # Check if both messages were processed - if "Message 1" in content and "Message 2" in content: - print("\nSUCCESS: Agent processed both messages!") - print( - "This proves the agent received the second message while processing the first task." # noqa - ) - else: - print("\nWARNING: Agent may not have processed the second message") +## Event Structure - # Clean up - os.remove(document_path) -else: - print("WARNING: Document.txt was not created") +Events sent and received through the WebSocket follow a specific structure: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +```typescript +interface OpenHandsEvent { + id: string; // Unique event ID + source: string; // "user" or "agent" + timestamp: string; // ISO timestamp + message?: string; // For message events + type?: string; // Event type (e.g., "message") + action?: string; // Action type (e.g., "run", "edit", "write") + args?: any; // Action arguments + result?: any; // Action result +} ``` - +## Best Practices -### Sending Messages During Execution +1. 
**Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions. +2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events. +3. **Error Handling**: Implement proper error handling for connection errors and failed actions. +4. **Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server. -As shown in the example above, use threading to send messages while the agent is running: +## Troubleshooting -```python icon="python" -# Start agent processing in background -thread = threading.Thread(target=conversation.run) -thread.start() +### Connection Issues -# Wait then send second message while agent is processing -time.sleep(2) # Give agent time to start working +- Verify that the OpenHands server is running and accessible +- Check that you're providing the correct conversation ID +- Ensure your WebSocket URL is correctly formatted -second_time = timestamp() +### Authentication Issues -conversation.send_message( - f"Please also add this second sentence to document.txt: " - f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " - f"Replace [CURRENT_TIME] with the actual current time when you write this line." -) +- Make sure you have the necessary authentication cookies if required +- Verify that you have permission to access the specified conversation -# Wait for completion -thread.join() -``` +### Event Handling Issues -The key steps are: -1. Start `conversation.run()` in a background thread -2. Send additional messages using `conversation.send_message()` while the agent is processing -3. Use `thread.join()` to wait for completion +- Check that you're correctly parsing the event data +- Verify that your event handlers are properly registered -The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. 
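The reconnection advice above can be reduced to a small retry helper. This is a stdlib-only sketch of exponential backoff, not tied to any particular Socket.IO client; `connect_fn` is a stand-in for whatever connect call your client library provides.

```python
# Stdlib-only sketch of the "handle reconnection" best practice: retry a
# flaky connect() with exponential backoff instead of giving up on the
# first failure. connect_fn is a stand-in for a real client's connect.
import time


def connect_with_backoff(connect_fn, max_attempts=5, base_delay=0.01):
    """Try connect_fn() up to max_attempts times, doubling the delay."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect_fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff


if __name__ == "__main__":
    failures = iter([ConnectionError, ConnectionError, None])

    def flaky_connect():
        exc = next(failures)
        if exc is not None:
            raise exc("server unavailable")
        return "connected"

    print(connect_with_backoff(flaky_connect))  # connected
```

When reconnecting, remember to pass the stored `latest_event_id` so the server can replay only the events you missed.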
+### Environment Variables Reference +Source: https://docs.openhands.dev/openhands/usage/environment-variables.md -## Next Steps +This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments. -- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations +## Environment Variable Naming Convention -### Critic (Experimental) -Source: https://docs.openhands.dev/sdk/guides/critic.md +OpenHands follows a consistent naming pattern for environment variables: - -**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. - +- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`) +- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`) +- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`) +- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`) +- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`) -> A ready-to-run example is available [here](#ready-to-run-example)! +## Core Configuration Variables +These variables correspond to the `[core]` section in `config.toml`: -## What is a Critic? 
+| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | +| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | +| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | +| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | +| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | +| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | +| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) | +| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | +| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | +| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | +| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | +| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | +| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) | +| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | +| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | +| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | +| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | -A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. 
The critic runs alongside the agent and provides: +## LLM Configuration Variables -- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success -- **Real-time feedback**: Scores computed during agent execution, not just at completion -- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold +These variables correspond to the `[llm]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | +| `LLM_API_KEY` | string | `""` | API key for the LLM provider | +| `LLM_BASE_URL` | string | `""` | Custom API base URL | +| `LLM_API_VERSION` | string | `""` | API version to use | +| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | +| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | +| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | +| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | +| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | +| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | +| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | +| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | +| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | +| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | +| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | +| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | +| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | +| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | +| `LLM_OLLAMA_BASE_URL` | string | `""` 
| Base URL for Ollama API | +| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | +| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | +| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | + +### AWS Configuration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID | +| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key | +| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name | + +## Agent Configuration Variables + +These variables correspond to the `[agent]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use | +| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling | +| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate | +| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor | +| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration | +| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation | +| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable skills (formerly known as microagents) (prompt extensions) | +| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable | + +## Sandbox Configuration Variables + +These variables correspond to the `[sandbox]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | +| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | +| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | +| 
`SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | +| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | +| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | +| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | +| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | +| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | +| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | +| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | +| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | +| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | +| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | +| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | +| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | +| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | +| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | +| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | +| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | -You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. 
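That refinement workflow can be sketched without the SDK: run the task, check the critic score against a threshold, and re-prompt with feedback until the score clears or attempts run out. This is a stdlib-only illustration; `run_task` and `critic_score` are stand-ins for the real agent run and critic evaluation, and the 0.5 threshold matches the critic's success cutoff.

```python
# Illustrative retry loop driven by a critic score. run_task and
# critic_score are stand-ins for the real agent run and critic
# evaluation; 0.5 mirrors the critic's success cutoff.
def refine_until_success(run_task, critic_score, threshold=0.5,
                         max_attempts=3):
    prompt = "Solve the task."
    for attempt in range(1, max_attempts + 1):
        solution = run_task(prompt)
        score = critic_score(solution)
        if score >= threshold:
            return solution, score, attempt
        # Below threshold: ask the agent to reflect and fix its solution.
        prompt = (f"Your previous attempt scored {score:.2f}. "
                  f"Please review and fix it.")
    return solution, score, max_attempts


if __name__ == "__main__":
    scores = iter([0.2, 0.8])
    solution, score, attempts = refine_until_success(
        run_task=lambda p: f"attempt for: {p!r}",
        critic_score=lambda s: next(scores),
    )
    print(attempts, score)  # 2 0.8
```

In a real pipeline the retry prompt would typically include the critic's feedback message, not just the numeric score.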
+### Sandbox Environment Variables +Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment: - -This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming. - +| Environment Variable | Description | +|---------------------|-------------| +| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | -## Quick Start +## Security Configuration Variables -When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. +These variables correspond to the `[security]` section in `config.toml`: -## Understanding Critic Results +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | +| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | +| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | -Critic evaluations produce scores and feedback: +## Debug and Logging Variables -- **`score`**: Float between 0.0 and 1.0 representing predicted success probability -- **`message`**: Optional feedback with detailed probabilities -- **`success`**: Boolean property (True if score >= 0.5) +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable general debug logging | +| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | +| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | +| `LOG_TO_FILE` | boolean | auto | Log to 
file (auto-enabled when DEBUG=true) | -Results are automatically displayed in the conversation visualizer: +## Runtime-Specific Variables -![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) +### Docker Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | -### Accessing Results Programmatically +### Remote Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | +| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | -```python icon="python" focus={4-7} -from openhands.sdk import Event, ActionEvent, MessageEvent +### Local Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | +| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | +| `RUNTIME_ID` | string | `""` | Runtime identifier | +| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | -def callback(event: Event): - if isinstance(event, (ActionEvent, MessageEvent)): - if event.critic_result is not None: - print(f"Critic score: {event.critic_result.score:.3f}") - print(f"Success: {event.critic_result.success}") +## Integration Variables -conversation = Conversation(agent=agent, callbacks=[callback]) -``` +### GitHub Integration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | -## Iterative Refinement with a Critic +### Third-Party API Keys +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `OPENAI_API_KEY` | string | `""` | OpenAI API 
key | +| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | +| `GOOGLE_API_KEY` | string | `""` | Google API key | +| `AZURE_API_KEY` | string | `""` | Azure API key | +| `TAVILY_API_KEY` | string | `""` | Tavily search API key | -The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. +## Server Configuration Variables -### How It Works +These are primarily used when running OpenHands as a server: -1. Agent completes a task and calls `FinishAction` -2. Critic evaluates the result and produces a score -3. If score < `success_threshold`, a follow-up prompt is sent automatically -4. Agent continues working to address issues -5. Process repeats until score meets threshold or `max_iterations` is reached +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `FRONTEND_PORT` | integer | `3000` | Frontend server port | +| `BACKEND_PORT` | integer | `8000` | Backend server port | +| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | +| `BACKEND_HOST` | string | `"localhost"` | Backend host address | +| `WEB_HOST` | string | `"localhost"` | Web server host | +| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | -### Configuration +## Deprecated Variables -Use `IterativeRefinementConfig` to enable automatic retries: +These variables are deprecated and should be replaced: -```python icon="python" focus={1,4-7,12} -from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +| Environment Variable | Replacement | Description | +|---------------------|-------------|-------------| +| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use 
volume mounting instead | +| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | -# Configure iterative refinement -iterative_config = IterativeRefinementConfig( - success_threshold=0.7, # Retry if score < 70% - max_iterations=3, # Maximum retry attempts -) +## Usage Examples -# Attach to critic -critic = APIBasedCritic( - server_url="https://llm-proxy.eval.all-hands.dev/vllm", - api_key=api_key, - model_name="critic", - iterative_refinement=iterative_config, -) +### Basic Setup with OpenAI +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-openai-api-key" +export DEBUG=true ``` -### Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | -| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | +### Docker Deployment with Custom Volumes +```bash +export RUNTIME="docker" +export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" +export SANDBOX_TIMEOUT=300 +``` -### Custom Follow-up Prompts +### Remote Runtime Configuration +```bash +export RUNTIME="remote" +export SANDBOX_API_KEY="your-remote-api-key" +export SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" +``` -By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: +### Security-Enhanced Setup +```bash +export SECURITY_CONFIRMATION_MODE=true +export SECURITY_SECURITY_ANALYZER="llm" +export DEBUG_RUNTIME=true +``` -```python icon="python" focus={4-12} -from openhands.sdk.critic.base import CriticBase, CriticResult +## Notes -class CustomCritic(APIBasedCritic): - def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: - score_percent = critic_result.score * 100 - return f""" -Your solution scored {score_percent:.1f}% (iteration {iteration}). +1. 
**Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). -Please review your work carefully: -1. Check that all requirements are met -2. Verify tests pass -3. Fix any issues and try again -""" -``` +2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`. -### Example Workflow +3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`. -Here's what happens during iterative refinement: +4. **Precedence**: Environment variables take precedence over TOML configuration files. -``` -Iteration 1: - → Agent creates files, runs tests - → Agent calls FinishAction - → Critic evaluates: score = 0.45 (below 0.7 threshold) - → Follow-up prompt sent automatically +5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag: + ```bash + docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands + ``` -Iteration 2: - → Agent reviews and fixes issues - → Agent calls FinishAction - → Critic evaluates: score = 0.72 (above threshold) - → ✅ Success! Conversation ends -``` +6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults. -## Troubleshooting +### Good vs. Bad Instructions +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md -### Critic Evaluations Not Appearing +The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. 
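The coercion rules in the environment-variable notes above — booleans accepting `true`/`false`, `1`/`0`, `yes`/`no`; lists and dicts supplied as Python literal strings; and the `SANDBOX_ENV_*` passthrough — can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and OpenHands' actual parsing code may differ.

```python
# Hypothetical illustration of the coercion rules described in the
# notes above -- NOT the actual OpenHands parser.
import ast


def coerce_bool(raw: str) -> bool:
    """Accept true/false, 1/0, yes/no (case-insensitive)."""
    value = raw.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def coerce_literal(raw: str):
    """Lists and dicts are provided as Python literal strings."""
    return ast.literal_eval(raw)


def sandbox_passthrough(environ: dict) -> dict:
    """Strip the SANDBOX_ENV_ prefix and forward the rest to the sandbox."""
    prefix = "SANDBOX_ENV_"
    return {
        name[len(prefix):]: value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


print(coerce_bool("Yes"))                      # True
print(coerce_literal('["skill1", "skill2"]'))  # ['skill1', 'skill2']
print(sandbox_passthrough({"SANDBOX_ENV_OPENAI_API_KEY": "sk-..."}))
```

Note that `ast.literal_eval` accepts only Python literals, so a malformed or malicious value raises an error instead of executing code — which is why list and dict variables must be literal strings rather than arbitrary expressions.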
-- Verify the critic is properly configured and passed to the Agent -- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) +## Concrete Examples of Good/Bad Prompts -### API Authentication Errors +### Bug Fixing Examples -- Verify `LLM_API_KEY` is set correctly -- Check that the API key has not expired +#### Bad Example -### Iterative Refinement Not Triggering +``` +Fix the bug in my code. +``` -- Ensure `iterative_refinement` config is attached to the critic -- Check that `success_threshold` is set appropriately (higher values trigger more retries) -- Verify the agent is using `FinishAction` to complete tasks +**Why it's bad:** +- No information about what the bug is +- No indication of where to look +- No description of expected vs. actual behavior +- OpenHands would have to guess what's wrong -## Ready-to-run Example +#### Good Example - -The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) - +``` +Fix the TypeError in src/api/users.py line 45. -This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. +Error message: +TypeError: 'NoneType' object has no attribute 'get' -```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py -"""Iterative Refinement with Critic Model Example. +Expected behavior: The get_user_preferences() function should return +default preferences when the user has no saved preferences. -This is EXPERIMENTAL. +Actual behavior: It crashes with the error above when user.preferences is None. 
-This example demonstrates how to use a critic model to shepherd an agent through -complex, multi-step tasks. The critic evaluates the agent's progress and provides -feedback that can trigger follow-up prompts when the agent hasn't completed the -task successfully. +The fix should handle the None case gracefully and return DEFAULT_PREFERENCES. +``` -Key concepts demonstrated: -1. Setting up a critic with IterativeRefinementConfig for automatic retry -2. Conversation.run() automatically handles retries based on critic scores -3. Custom follow-up prompt generation via critic.get_followup_prompt() -4. Iterating until the task is completed successfully or max iterations reached +**Why it works:** +- Specific file and line number +- Exact error message +- Clear expected vs. actual behavior +- Suggested approach for the fix -For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured -using the same base_url with /vllm suffix and "critic" as the model name. -""" +### Feature Development Examples -import os -import re -import tempfile -from pathlib import Path +#### Bad Example -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig -from openhands.sdk.critic.base import CriticBase -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +``` +Add user authentication to my app. +``` +**Why it's bad:** +- Scope is too large and undefined +- No details about authentication requirements +- No mention of existing code or patterns +- Could mean many different things -# Configuration -# Higher threshold (70%) makes it more likely the agent needs multiple iterations, -# which better demonstrates how iterative refinement works. -# Adjust as needed to see different behaviors. 
-SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) -MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) +#### Good Example +``` +Add email/password login to our Express.js API. -def get_required_env(name: str) -> str: - value = os.getenv(name) - if value: - return value - raise ValueError( - f"Missing required environment variable: {name}. " - f"Set {name} before running this example." - ) +Requirements: +1. POST /api/auth/login endpoint +2. Accept email and password in request body +3. Validate against users in PostgreSQL database +4. Return JWT token on success, 401 on failure +5. Use bcrypt for password comparison (already in dependencies) +Follow the existing patterns in src/api/routes.js for route structure. +Use the existing db.query() helper in src/db/index.js for database access. -def get_default_critic(llm: LLM) -> CriticBase | None: - """Auto-configure critic for All-Hands LLM proxy. +Success criteria: I can call the endpoint with valid credentials +and receive a JWT token that works with our existing auth middleware. +``` - When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an - APIBasedCritic configured with: - - server_url: {base_url}/vllm - - api_key: same as LLM - - model_name: "critic" +**Why it works:** +- Specific, scoped feature +- Clear technical requirements +- Points to existing patterns to follow +- Defines what "done" looks like - Args: - llm: The LLM instance to derive critic configuration from. +### Code Review Examples - Returns: - An APIBasedCritic if the LLM is configured for All-Hands proxy, - None otherwise. 
+#### Bad Example - Example: - llm = LLM( - model="anthropic/claude-sonnet-4-5", - api_key=api_key, - base_url="https://llm-proxy.eval.all-hands.dev", - ) - critic = get_default_critic(llm) - if critic is None: - # Fall back to explicit configuration - critic = APIBasedCritic( - server_url="https://my-critic-server.com", - api_key="my-api-key", - model_name="my-critic-model", - ) - """ - base_url = llm.base_url - api_key = llm.api_key - if base_url is None or api_key is None: - return None +``` +Review my code. +``` - # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) - pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" - if not re.match(pattern, base_url): - return None +**Why it's bad:** +- No code provided or referenced +- No indication of what to look for +- No context about the code's purpose +- No criteria for the review - return APIBasedCritic( - server_url=f"{base_url.rstrip('/')}/vllm", - api_key=api_key, - model_name="critic", - ) +#### Good Example +``` +Review this pull request for our payment processing module: -# Task prompt designed to be moderately complex with subtle requirements. -# The task is simple enough to complete in 1-2 iterations, but has specific -# requirements that are easy to miss - triggering critic feedback. -INITIAL_TASK_PROMPT = """\ -Create a Python word statistics tool called `wordstats` that analyzes text files. +Focus areas: +1. Security - we're handling credit card data +2. Error handling - payments must never silently fail +3. Idempotency - duplicate requests should be safe -## Structure +Context: +- This integrates with Stripe API +- It's called from our checkout flow +- We have ~10,000 transactions/day -Create directory `wordstats/` with: -- `stats.py` - Main module with `analyze_file(filepath)` function -- `cli.py` - Command-line interface -- `tests/test_stats.py` - Unit tests +Please flag any issues as Critical/Major/Minor with explanations. 
+``` -## Requirements for stats.py +**Why it works:** +- Clear scope and focus areas +- Important context provided +- Business implications explained +- Requested output format specified -The `analyze_file(filepath)` function must return a dict with these EXACT keys: -- `lines`: total line count (including empty lines) -- `words`: word count -- `chars`: character count (including whitespace) -- `unique_words`: count of unique words (case-insensitive) +### Refactoring Examples -### Important edge cases (often missed!): -1. Empty files must return all zeros, not raise an exception -2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) -3. Numbers like "123" or "3.14" are NOT counted as words -4. Contractions like "don't" count as ONE word -5. File not found must raise FileNotFoundError with a clear message +#### Bad Example -## Requirements for cli.py +``` +Make the code better. +``` -When run as `python cli.py `: -- Print each stat on its own line: "Lines: X", "Words: X", etc. -- Exit with code 1 if file not found, printing error to stderr -- Exit with code 0 on success +**Why it's bad:** +- "Better" is subjective and undefined +- No specific problems identified +- No goals for the refactoring +- No constraints or requirements -## Required Tests (test_stats.py) +#### Good Example -Write tests that verify: -1. Basic counting on normal text -2. Empty file returns all zeros -3. Hyphenated words counted correctly -4. Numbers are excluded from word count -5. FileNotFoundError raised for missing files +``` +Refactor the UserService class in src/services/user.js: -## Verification Steps +Problems to address: +1. The class is 500+ lines - split into smaller, focused services +2. Database queries are mixed with business logic - separate them +3. There's code duplication in the validation methods -1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): -``` -Hello world! -This is a well-known test file. 
+Constraints: +- Keep the public API unchanged (other code depends on it) +- Maintain test coverage (run npm test after changes) +- Follow our existing service patterns in src/services/ -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. +Goal: Improve maintainability while keeping the same functionality. ``` -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +**Why it works:** +- Specific problems identified +- Clear constraints and requirements +- Points to patterns to follow +- Measurable success criteria -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. +## Key Principles for Effective Instructions -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +### Be Specific + +Vague instructions produce vague results. Be concrete about: +| Instead of... | Say... | +|---------------|--------| +| "Fix the error" | "Fix the TypeError on line 45 of api.py" | +| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | +| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | +| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +### Provide Context -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +Help OpenHands understand the bigger picture: -# 
Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +``` +Context to include: +- What does this code do? (purpose) +- Who uses it? (users/systems) +- Why does this matter? (business impact) +- What constraints exist? (performance, compatibility) +- What patterns should be followed? (existing conventions) +``` -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +**Example with context:** + +``` +Add rate limiting to our public API endpoints. + +Context: +- This is a REST API serving mobile apps and third-party integrations +- We've been seeing abuse from web scrapers hitting us 1000+ times/minute +- Our infrastructure can handle 100 req/sec per client sustainably +- We use Redis (already available in the project) +- Our API follows the controller pattern in src/controllers/ -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +Requirement: Limit each API key to 100 requests per minute with +appropriate 429 responses and Retry-After headers. 
+``` -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +### Set Clear Goals -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +Define what success looks like: -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +``` +Success criteria checklist: +✓ What specific outcome do you want? +✓ How will you verify it worked? +✓ What tests should pass? +✓ What should the user experience be? +``` -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +**Example with clear goals:** -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") ``` -Hello world! -This is a well-known test file. +Implement password reset functionality. -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. +Success criteria: +1. User can request reset via POST /api/auth/forgot-password +2. System sends email with secure reset link +3. Link expires after 1 hour +4. User can set new password via POST /api/auth/reset-password +5. Old sessions are invalidated after password change +6. All edge cases return appropriate error messages +7. Existing tests still pass, new tests cover the feature ``` -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +### Include Constraints -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
+Specify what you can't or won't change:

+```
+Constraints to specify:
+- API compatibility (can't break existing clients)
+- Technology restrictions (must use existing stack)
+- Performance requirements (must respond in <100ms)
+- Security requirements (must not log PII)
+- Time/scope limits (just this one file)
+```

+## Common Pitfalls to Avoid

+### Vague Requirements

+**Bad:**

+```
+Make the dashboard faster.
+```

+**Good:**

+```
+The dashboard takes 5 seconds to load.
+
+Profile it and optimize to load in under 1 second.
+
+Likely issues:
+- N+1 queries in getWidgetData()
+- Uncompressed images
+- Missing database indexes
+
+Focus on the biggest wins first.
+```

+### Missing Context

+**Bad:**

+```
+Add caching to the API.
+```

+**Good:**

+```
+Add caching to the product catalog API.
+
+Context:
+- 95% of requests are for the same 1000 products
+- Product data changes only via admin panel (rare)
+- We already have Redis running for sessions
+- Current response time is 200ms, target is <50ms
+
+Cache strategy: Cache product data in Redis with 5-minute TTL,
+invalidate on product update.
+```

+### Unrealistic Expectations

+**Bad:**

+```
+Rewrite our entire backend from PHP to Go.
+```

+**Good:**

+```
+Create a Go microservice for the image processing currently in
+src/php/ImageProcessor.php.
+
+This is the first step in our gradual migration.
+The Go service should:
+1. Expose the same API endpoints
+2. Be deployable alongside the existing PHP app
+3. 
Include a feature flag to route traffic + + Start with just the resize and crop functions. + ``` + + -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +### Incomplete Information -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() + + + ``` + The login is broken, fix it. + ``` + + + ``` + Users can't log in since yesterday's deployment. + + Symptoms: + - Login form submits but returns 500 error + - Server logs show: "Redis connection refused" + - Redis was moved to a new host yesterday + + The issue is likely in src/config/redis.js which may + have the old host hardcoded. + + Expected: Login should work with the new Redis at redis.internal:6380 + ``` + + -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +## Best Practices -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` -Hello world! -This is a well-known test file. +### Structure Your Instructions + +Use clear structure for complex requests: -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. ``` +## Task +[One sentence describing what you want] -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +## Background +[Context and why this matters] + +## Requirements +1. [Specific requirement] +2. [Specific requirement] +3. [Specific requirement] + +## Constraints +- [What you can't change] +- [What must be preserved] -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. 
+## Success Criteria +- [How to verify it works] +``` -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +### Provide Examples +Show what you want through examples: -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +``` +Add input validation to the user registration endpoint. -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +Example of what validation errors should look like: -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +{ + "error": "validation_failed", + "details": [ + {"field": "email", "message": "Invalid email format"}, + {"field": "password", "message": "Must be at least 8 characters"} + ] +} -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +Validate: +- email: valid format, not already registered +- password: min 8 
chars, at least 1 number +- username: 3-20 chars, alphanumeric only +``` -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +### Define Success Criteria -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +Be explicit about what "done" means: -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +``` +This task is complete when: +1. All existing tests pass (npm test) +2. New tests cover the added functionality +3. The feature works as described in the acceptance criteria +4. Code follows our style guide (npm run lint passes) +5. Documentation is updated if needed +``` -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +### Iterate and Refine -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +Build on previous work: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") ``` -Hello world! -This is a well-known test file. +In our last session, you added the login endpoint. -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. +Now add the logout functionality: +1. POST /api/auth/logout endpoint +2. Invalidate the current session token +3. Clear any server-side session data +4. Follow the same patterns used in login + +The login implementation is in src/api/auth/login.js for reference. ``` -2. 
Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +## Quick Reference -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. +| Element | Bad | Good | +|---------|-----|------| +| Location | "in the code" | "in src/api/users.py line 45" | +| Problem | "it's broken" | "TypeError when user.preferences is None" | +| Scope | "add authentication" | "add JWT-based login endpoint" | +| Behavior | "make it work" | "return 200 with user data on success" | +| Patterns | (none) | "follow patterns in src/services/" | +| Success | (none) | "all tests pass, endpoint returns correct data" | -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" + +The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. + +### OpenHands in Your SDLC +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. 
-# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +## Integration with Development Workflows -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +### Planning Phase -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +Use OpenHands during planning to accelerate technical decisions: -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +**Technical specification assistance:** +``` +Create a technical specification for adding search functionality: -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +Requirements from product: +- Full-text search across products and articles +- Filter by category, price range, and date +- Sub-200ms response time at 1000 QPS -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: 
{SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +Provide: +1. Architecture options (Elasticsearch vs. PostgreSQL full-text) +2. Data model changes needed +3. API endpoint designs +4. Estimated implementation effort +5. Risks and mitigations +``` -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +**Sprint planning support:** +``` +Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +Story 1: As a user, I can reset my password via email +Story 2: As an admin, I can view user activity logs -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") +For each story, create: +- Technical subtasks +- Estimated effort (hours) +- Dependencies on other work +- Testing requirements ``` -Hello world! -This is a well-known test file. -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. -``` +### Development Phase + +OpenHands excels during active development: + +**Feature implementation:** +- Write new features with clear specifications +- Follow existing code patterns automatically +- Generate tests alongside code +- Create documentation as you go + +**Bug fixing:** +- Analyze error logs and stack traces +- Identify root causes +- Implement fixes with regression tests +- Document the issue and solution -2. Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +**Code improvement:** +- Refactor for clarity and maintainability +- Optimize performance bottlenecks +- Update deprecated APIs +- Improve error handling -3. 
Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. +### Testing Phase -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +Automate test creation and improvement: +``` +Add comprehensive tests for the UserService module: -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +Current coverage: 45% +Target coverage: 85% -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +1. Analyze uncovered code paths using the codecov module +2. Write unit tests for edge cases +3. Add integration tests for API endpoints +4. Create test data factories +5. Document test scenarios -# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +Each time you add new tests, re-run codecov to check the increased coverage. Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). 
+``` -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +### Review Phase -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +Accelerate code reviews: -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +``` +Review this PR for our coding standards: -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +Check for: +1. Security issues (SQL injection, XSS, etc.) +2. Performance concerns +3. Test coverage adequacy +4. Documentation completeness +5. Adherence to our style guide -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +Provide actionable feedback with severity ratings. +``` -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +### Deployment Phase + +Assist with deployment preparation: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") ``` -Hello world! -This is a well-known test file. +Prepare for production deployment: -It has 5 lines, including empty ones. -Numbers like 42 and 3.14 don't count as words. +1. Review all changes since last release +2. Check for breaking API changes +3. Verify database migrations are reversible +4. Update the changelog +5. Create release notes +6. Identify rollback steps if needed ``` -2. 
Run: `python wordstats/cli.py sample.txt` - Expected output: - - Lines: 5 - - Words: 21 - - Chars: 130 - - Unique words: 21 +## CI/CD Integration -3. Run the tests: `python -m pytest wordstats/tests/ -v` - ALL tests must pass. +OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. -The task is complete ONLY when: -- All files exist -- The CLI outputs the correct stats for sample.txt -- All 5+ tests pass -""" +### GitHub Actions Integration +The Software Agent SDK provides composite GitHub Actions for common workflows: -llm_api_key = get_required_env("LLM_API_KEY") -llm = LLM( - # Use a weaker model to increase likelihood of needing multiple iterations - model="anthropic/claude-haiku-4-5", - api_key=llm_api_key, - top_p=0.95, - base_url=os.getenv("LLM_BASE_URL", None), -) +- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments +- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK -# Setup critic with iterative refinement config -# The IterativeRefinementConfig tells Conversation.run() to automatically -# retry the task if the critic score is below the threshold -iterative_config = IterativeRefinementConfig( - success_threshold=SUCCESS_THRESHOLD, - max_iterations=MAX_ITERATIONS, -) +For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. 
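+For instance, a minimal workflow wiring that composite action into a repository might look like the sketch below. The trigger choice, the `@main` ref, and the `llm-api-key` input name are illustrative assumptions, not the action's documented interface; check the action itself for its actual inputs.
+
+```yaml
+# .github/workflows/openhands-pr-review.yml (illustrative sketch)
+name: OpenHands PR Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main
+        with:
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+```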
-# Auto-configure critic for All-Hands proxy or use explicit env vars -critic = get_default_critic(llm) -if critic is None: - print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") - critic = APIBasedCritic( - server_url=get_required_env("CRITIC_SERVER_URL"), - api_key=get_required_env("CRITIC_API_KEY"), - model_name=get_required_env("CRITIC_MODEL_NAME"), - iterative_refinement=iterative_config, - ) -else: - # Add iterative refinement config to the auto-configured critic - critic = critic.model_copy(update={"iterative_refinement": iterative_config}) +### What You Can Automate -# Create agent with critic (iterative refinement is built into the critic) -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], - critic=critic, -) +Using the SDK, you can create GitHub Actions workflows to: -# Create workspace -workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) -print(f"📁 Created workspace: {workspace}") +1. **Automatic code review** when a PR is opened +2. **Automatically update docs** weekly when new functionality is added +3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements +4. **Manage TODO comments** and track technical debt +5. **Assign reviewers** based on code ownership patterns -# Create conversation - iterative refinement is handled automatically -# by Conversation.run() based on the critic's config -conversation = Conversation( - agent=agent, - workspace=str(workspace), -) +### Getting Started -print("\n" + "=" * 70) -print("🚀 Starting Iterative Refinement with Critic Model") -print("=" * 70) -print(f"Success threshold: {SUCCESS_THRESHOLD:.0%}") -print(f"Max iterations: {MAX_ITERATIONS}") +To integrate OpenHands into your CI/CD: -# Send the task and run - Conversation.run() handles retries automatically -conversation.send_message(INITIAL_TASK_PROMPT) -conversation.run() +1. 
Review the [SDK Getting Started guide](/sdk/getting-started) +2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) +3. Set up your `LLM_API_KEY` as a repository secret +4. Use the provided composite actions or build custom workflows -# Print additional info about created files -print("\nCreated files:") -for path in sorted(workspace.rglob("*")): - if path.is_file(): - relative = path.relative_to(workspace) - print(f" - {relative}") +See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"\nEXAMPLE_COST: {cost:.4f}") -``` +## Team Workflows -```bash Running the Example icon="terminal" -LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \ - uv run python examples/01_standalone_sdk/34_critic_example.py -``` +### Solo Developer Workflows -### Example Output +For individual developers: -``` -📁 Created workspace: /tmp/critic_demo_abc123 +**Daily workflow:** +1. **Morning review**: Have OpenHands analyze overnight CI results +2. **Feature development**: Use OpenHands for implementation +3. **Pre-commit**: Request review before pushing +4. **Documentation**: Generate/update docs for changes -====================================================================== -🚀 Starting Iterative Refinement with Critic Model -====================================================================== -Success threshold: 70% -Max iterations: 3 +**Best practices:** +- Set up automated reviews on all PRs +- Use OpenHands for boilerplate and repetitive tasks +- Keep AGENTS.md updated with project patterns -... agent works on the task ... 
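+For example, the "morning review" step in the daily workflow above might start from a prompt like this (an illustrative prompt, written in the same style as the others in this guide):
+
+```
+Review last night's CI runs on the main branch:
+
+1. Summarize any failed jobs and flaky tests
+2. Group the failures by likely root cause
+3. Propose a fix for the most frequent failure
+```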
+### Small Team Workflows -✓ Critic evaluation: score=0.758, success=True +For teams of 2-10 developers: + +**Collaborative workflow:** +``` +Team Member A: Creates feature branch, writes initial implementation +OpenHands: Reviews code, suggests improvements +Team Member B: Reviews OpenHands suggestions, approves or modifies +OpenHands: Updates documentation, adds missing tests +Team: Merges after final human review +``` -Created files: - - sample.txt - - wordstats/cli.py - - wordstats/stats.py - - wordstats/tests/test_stats.py +**Communication integration:** +- Slack notifications for OpenHands findings +- Automatic issue creation for bugs found +- Weekly summary reports -EXAMPLE_COST: 0.0234 -``` +### Enterprise Team Workflows -## Next Steps +For larger organizations: -- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior -- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics -- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns +**Governance and oversight:** +- Configure approval requirements for OpenHands changes +- Set up audit logging for all AI-assisted changes +- Define scope limits for automated actions +- Establish human review requirements -### Custom Tools -Source: https://docs.openhands.dev/sdk/guides/custom-tools.md +**Scale patterns:** +``` +Central Platform Team: +├── Defines OpenHands policies +├── Manages integrations +└── Monitors usage and quality -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Feature Teams: +├── Use OpenHands within policies +├── Customize for team needs +└── Report issues to platform team +``` -> The ready-to-run example is available [here](#ready-to-run-example)! +## Best Practices -## Understanding the Tool System +### Code Review Integration -The SDK's tool system is built around three core components: +Set up effective automated reviews: -1. **Action** - Defines input parameters (what the tool accepts) -2. 
**Observation** - Defines output data (what the tool returns) -3. **Executor** - Implements the tool's logic (what the tool does) +```yaml +# .openhands/review-config.yml +review: + focus_areas: + - security + - performance + - test_coverage + - documentation + + severity_levels: + block_merge: + - critical + - security + require_response: + - major + informational: + - minor + - suggestion + + ignore_patterns: + - "*.generated.*" + - "vendor/*" +``` -These components are tied together by a **ToolDefinition** that registers the tool with the agent. +### Pull Request Automation -## Built-in Tools +Automate common PR tasks: -The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. +| Trigger | Action | +|---------|--------| +| PR opened | Auto-review, label by type | +| Tests fail | Analyze failures, suggest fixes | +| Coverage drops | Identify missing tests | +| PR approved | Update changelog, check docs | -```python icon="python" wrap -from openhands.tools import BashTool, FileEditorTool -from openhands.tools.preset import get_default_tools +### Quality Gates -# Use specific tools -agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) +Define automated quality gates: -# Or use preset -tools = get_default_tools() -agent = Agent(llm=llm, tools=tools) +```yaml +quality_gates: + - name: test_coverage + threshold: 80% + action: block_merge + + - name: security_issues + threshold: 0 critical + action: block_merge + + - name: code_review_score + threshold: 7/10 + action: require_review + + - name: documentation + requirement: all_public_apis + action: warn ``` - -See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. 
- +### Automated Testing -## Creating a Custom Tool +Integrate OpenHands with your testing strategy: -Here's a minimal example of creating a custom grep tool: +**Test generation triggers:** +- New code without tests +- Coverage below threshold +- Bug fix without regression test +- API changes without contract tests - - - ### Define the Action - Defines input parameters (what the tool accepts) +**Example workflow:** +```yaml +on: + push: + branches: [main] - ```python icon="python" wrap - class GrepAction(Action): - pattern: str = Field(description="Regex to search for") - path: str = Field( - default=".", - description="Directory to search (absolute or relative)" - ) - include: str | None = Field( - default=None, - description="Optional glob to filter files (e.g. '*.py')" - ) - ``` - - - ### Define the Observation - Defines output data (what the tool returns) +jobs: + ensure-coverage: + steps: + - name: Check coverage + run: | + COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}') + if [ "$COVERAGE" -lt "80" ]; then + openhands generate-tests --target 80 + fi +``` - ```python icon="python" wrap - class GrepObservation(Observation): - matches: list[str] = Field(default_factory=list) - files: list[str] = Field(default_factory=list) - count: int = 0 +## Common Integration Patterns - @property - def to_llm_content(self) -> Sequence[TextContent | ImageContent]: - if not self.count: - return [TextContent(text="No matches found.")] - files_list = "\n".join(f"- {f}" for f in self.files[:20]) - sample = "\n".join(self.matches[:10]) - more = "\n..." if self.count > 10 else "" - ret = ( - f"Found {self.count} matching lines.\n" - f"Files:\n{files_list}\n" - f"Sample:\n{sample}{more}" - ) - return [TextContent(text=ret)] - ``` - - The to_llm_content() property formats observations for the LLM. 
- - - - ### Define the Executor - Implements the tool’s logic (what the tool does) +### Pre-Commit Hooks - ```python icon="python" wrap - class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): - def __init__(self, terminal: TerminalExecutor): - self.terminal: TerminalExecutor = terminal +Run OpenHands checks before commits: - def __call__( - self, - action: GrepAction, - conversation=None, - ) -> GrepObservation: - root = os.path.abspath(action.path) - pat = shlex.quote(action.pattern) - root_q = shlex.quote(root) +```bash +# .git/hooks/pre-commit +#!/bin/bash - # Use grep -r; add --include when provided - if action.include: - inc = shlex.quote(action.include) - cmd = f"grep -rHnE --include {inc} {pat} {root_q}" - else: - cmd = f"grep -rHnE {pat} {root_q}" - cmd += " 2>/dev/null | head -100" - result = self.terminal(TerminalAction(command=cmd)) +# Quick code review +openhands review --quick --staged-only - matches: list[str] = [] - files: set[str] = set() +if [ $? -ne 0 ]; then + echo "OpenHands found issues. Review and fix before committing." 
+ exit 1 +fi +``` - # grep returns exit code 1 when no matches; treat as empty - output_text = result.text +### Post-Commit Actions - if output_text.strip(): - for line in output_text.strip().splitlines(): - matches.append(line) - # Expect "path:line:content" - # take the file part before first ":" - file_path = line.split(":", 1)[0] - if file_path: - files.add(os.path.abspath(file_path)) +Automate tasks after commits: - return GrepObservation( - matches=matches, - files=sorted(files), - count=len(matches), - ) - ``` - - - ### Finally, define the tool - ```python icon="python" wrap - class GrepTool(ToolDefinition[GrepAction, GrepObservation]): - """Custom grep tool that searches file contents using regular expressions.""" +```yaml +# .github/workflows/post-commit.yml +on: + push: + branches: [main] - @classmethod - def create( - cls, - conv_state, - terminal_executor: TerminalExecutor | None = None - ) -> Sequence[ToolDefinition]: - """Create GrepTool instance with a GrepExecutor. +jobs: + update-docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Update API docs + run: openhands update-docs --api + - name: Commit changes + run: | + git add docs/ + git commit -m "docs: auto-update API documentation" || true + git push +``` - Args: - conv_state: Conversation state to get - working directory from. - terminal_executor: Optional terminal executor to reuse. - If not provided, a new one will be created. +### Scheduled Tasks - Returns: - A sequence containing a single GrepTool instance. 
- """ - if terminal_executor is None: - terminal_executor = TerminalExecutor( - working_dir=conv_state.workspace.working_dir - ) - grep_executor = GrepExecutor(terminal_executor) +Run regular maintenance: - return [ - cls( - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, - ) - ] - ``` - - +```yaml +# Weekly dependency check +on: + schedule: + - cron: '0 9 * * 1' # Monday 9am -## Good to know -### Tool Registration -Tools are registered using `register_tool()` and referenced by name: +jobs: + dependency-review: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Check dependencies + run: | + openhands check-dependencies --security --outdated + - name: Create issues + run: openhands create-issues --from-report deps.json +``` -```python icon="python" wrap -# Register a simple tool class -register_tool("FileEditorTool", FileEditorTool) +### Event-Triggered Workflows -# Register a factory function that creates multiple tools -register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) +You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. 
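+The shape of such a workflow is a standard GitHub event trigger that hands the event payload to an agent script built on the SDK. The sketch below is illustrative only: the script path and the way the issue number is passed are placeholders for your own code, not a documented interface.
+
+```yaml
+# Illustrative sketch: run an SDK-based triage script when an issue is opened
+on:
+  issues:
+    types: [opened]
+
+jobs:
+  triage:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run triage agent
+        env:
+          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+        # scripts/triage.py is a placeholder for your own SDK-based script
+        run: python scripts/triage.py "${{ github.event.issue.number }}"
+```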
-# Use registered tools by name -tools = [ - Tool(name="FileEditorTool"), - Tool(name="BashAndGrepToolSet"), -] -``` +For more event-driven automation patterns, see: +- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events +- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage -### Factory Functions -Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: +### When to Use OpenHands +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md -```python icon="python" wrap -def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: - """Create execute_bash and custom grep tools sharing one executor.""" - bash_executor = BashExecutor( - working_dir=conv_state.workspace.working_dir - ) - # Create and configure tools... - return [bash_tool, grep_tool] -``` +OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. This guide helps you identify the right tasks for OpenHands and set yourself up for success. -### Shared Executors -Multiple tools can share executors for efficiency and state consistency: +## Task Complexity Guidance -```python icon="python" wrap -bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) -bash_tool = execute_bash_tool.set_executor(executor=bash_executor) +### Simple Tasks -grep_executor = GrepExecutor(bash_executor) -grep_tool = ToolDefinition( - name="grep", - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, -) -``` +**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. 
-## When to Create Custom Tools +- Adding a new function or method +- Writing unit tests for existing code +- Fixing simple bugs with clear error messages +- Code formatting and style fixes +- Adding documentation or comments +- Simple refactoring (rename, extract method) +- Configuration changes -Create custom tools when you need to: -- Combine multiple operations into a single, structured interface -- Add typed parameters with validation -- Format complex outputs for LLM consumption -- Integrate with external APIs or services +**Example prompt:** +``` +Add a calculateDiscount() function to src/utils/pricing.js that takes +a price and discount percentage, returns the discounted price. +Add unit tests. +``` -## Ready-to-run Example +### Medium Complexity Tasks - -This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) - +**Good for OpenHands** — These tasks may need more context and possibly some iteration. 
-```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py -"""Advanced example showing explicit executor usage and custom grep tool.""" +- Implementing a new API endpoint +- Adding a feature to an existing module +- Debugging issues that span multiple files +- Migrating code to a new pattern +- Writing integration tests +- Performance optimization with clear metrics +- Setting up CI/CD workflows -import os -import shlex -from collections.abc import Sequence +**Example prompt:** +``` +Add a user profile endpoint to our API: +- GET /api/users/:id/profile +- Return user data with their recent activity +- Follow patterns in existing controllers +- Add integration tests +- Handle not-found and unauthorized cases +``` -from pydantic import Field, SecretStr +### Complex Tasks -from openhands.sdk import ( - LLM, - Action, - Agent, - Conversation, - Event, - ImageContent, - LLMConvertibleEvent, - Observation, - TextContent, - ToolDefinition, - get_logger, -) -from openhands.sdk.tool import ( - Tool, - ToolExecutor, - register_tool, -) -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import ( - TerminalAction, - TerminalExecutor, - TerminalTool, -) +**May require iteration** — These benefit from breaking down into smaller pieces. +- Large refactoring across many files +- Architectural changes +- Implementing complex business logic +- Multi-service integrations +- Performance optimization without clear cause +- Security audits +- Framework or major dependency upgrades -logger = get_logger(__name__) +**Recommended approach:** +``` +Break large tasks into phases: -# --- Action / Observation --- +Phase 1: "Analyze the current authentication system and document +all touch points that need to change for OAuth2 migration." +Phase 2: "Implement the OAuth2 provider configuration and basic +token flow, keeping existing auth working in parallel." 
-class GrepAction(Action): - pattern: str = Field(description="Regex to search for") - path: str = Field( - default=".", description="Directory to search (absolute or relative)" - ) - include: str | None = Field( - default=None, description="Optional glob to filter files (e.g. '*.py')" - ) +Phase 3: "Migrate the user login flow to use OAuth2, maintaining +backwards compatibility." +``` +## Best Use Cases -class GrepObservation(Observation): - matches: list[str] = Field(default_factory=list) - files: list[str] = Field(default_factory=list) - count: int = 0 +### Ideal Scenarios - @property - def to_llm_content(self) -> Sequence[TextContent | ImageContent]: - if not self.count: - return [TextContent(text="No matches found.")] - files_list = "\n".join(f"- {f}" for f in self.files[:20]) - sample = "\n".join(self.matches[:10]) - more = "\n..." if self.count > 10 else "" - ret = ( - f"Found {self.count} matching lines.\n" - f"Files:\n{files_list}\n" - f"Sample:\n{sample}{more}" - ) - return [TextContent(text=ret)] +OpenHands is **most effective** when: +| Scenario | Why It Works | +|----------|--------------| +| Clear requirements | OpenHands can work independently | +| Well-defined scope | Less ambiguity, fewer iterations | +| Existing patterns to follow | Consistency with codebase | +| Good test coverage | Easy to verify changes | +| Isolated changes | Lower risk of side effects | -# --- Executor --- +**Perfect use cases:** +- **Bug fixes with reproduction steps**: Clear problem, measurable solution +- **Test additions**: Existing code provides the specification +- **Documentation**: Code is the source of truth +- **Boilerplate generation**: Follows established patterns +- **Code review and analysis**: Read-only, analytical tasks -class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): - def __init__(self, terminal: TerminalExecutor): - self.terminal: TerminalExecutor = terminal +### Good Fit Scenarios - def __call__(self, action: GrepAction, conversation=None) 
-> GrepObservation: # noqa: ARG002 - root = os.path.abspath(action.path) - pat = shlex.quote(action.pattern) - root_q = shlex.quote(root) +OpenHands works **well with some guidance** for: - # Use grep -r; add --include when provided - if action.include: - inc = shlex.quote(action.include) - cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" - else: - cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" +- **Feature implementation**: When requirements are documented +- **Refactoring**: When goals and constraints are clear +- **Debugging**: When you can provide logs and context +- **Code modernization**: When patterns are established +- **API development**: When specs exist - result = self.terminal(TerminalAction(command=cmd)) +**Tips for these scenarios:** - matches: list[str] = [] - files: set[str] = set() +1. Provide clear acceptance criteria +2. Point to examples of similar work in the codebase +3. Specify constraints and non-goals +4. Be ready to iterate and clarify - # grep returns exit code 1 when no matches; treat as empty - output_text = result.text +### Poor Fit Scenarios - if output_text.strip(): - for line in output_text.strip().splitlines(): - matches.append(line) - # Expect "path:line:content" — take the file part before first ":" - file_path = line.split(":", 1)[0] - if file_path: - files.add(os.path.abspath(file_path)) +**Consider alternatives** when: - return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) +| Scenario | Challenge | Alternative | +|----------|-----------|-------------| +| Vague requirements | Unclear what "done" means | Define requirements first | +| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | +| Highly sensitive code | Risk tolerance is zero | Human review essential | +| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | +| Visual design | Subjective aesthetic judgments | Use design tools | +**Red flags that 
a task may not be suitable:** -# Tool description -_GREP_DESCRIPTION = """Fast content search tool. -* Searches file contents using regular expressions -* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) -* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") -* Returns matching file paths sorted by modification time. -* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. -* Use this tool when you need to find files containing specific patterns -* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead -""" # noqa: E501 +- "Make it look better" (subjective) +- "Figure out what's wrong" (too vague) +- "Rewrite everything" (too large) +- "Do what makes sense" (unclear requirements) +- Changes to production infrastructure without review +## Limitations -# --- Tool Definition --- +### Current Limitations +Be aware of these constraints: -class GrepTool(ToolDefinition[GrepAction, GrepObservation]): - """A custom grep tool that searches file contents using regular expressions.""" +- **Long-running processes**: Sessions have time limits +- **Interactive debugging**: Can't set breakpoints interactively +- **Visual verification**: Can't see rendered UI easily +- **External system access**: May need credentials configured +- **Large codebase analysis**: Memory and time constraints - @classmethod - def create( - cls, conv_state, terminal_executor: TerminalExecutor | None = None - ) -> Sequence[ToolDefinition]: - """Create GrepTool instance with a GrepExecutor. +### Technical Constraints - Args: - conv_state: Conversation state to get working directory from. - terminal_executor: Optional terminal executor to reuse. If not provided, - a new one will be created. 
+| Constraint | Impact | Workaround | +|------------|--------|------------| +| Session duration | Very long tasks may timeout | Break into smaller tasks | +| Context window | Can't see entire large codebase at once | Focus on relevant files | +| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | +| Network access | Some external services may be blocked | Use local resources when possible | - Returns: - A sequence containing a single GrepTool instance. - """ - if terminal_executor is None: - terminal_executor = TerminalExecutor( - working_dir=conv_state.workspace.working_dir - ) - grep_executor = GrepExecutor(terminal_executor) +### Scope Boundaries - return [ - cls( - description=_GREP_DESCRIPTION, - action_type=GrepAction, - observation_type=GrepObservation, - executor=grep_executor, - ) - ] +OpenHands works within your codebase but has boundaries: +**Can do:** +- Read and write files in the repository +- Run tests and commands +- Access configured services and APIs +- Browse documentation and reference material -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +**Cannot do:** +- Access your local environment outside the sandbox +- Make decisions requiring business context it doesn't have +- Replace human judgment for critical decisions +- Guarantee production-safe changes without review -# Tools - demonstrating both simplified and advanced patterns -cwd = os.getcwd() +## Pre-Task Checklist +### Prerequisites -def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: - """Create terminal and custom grep tools sharing one executor.""" +Before starting a task, ensure: - terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) - # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) - terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] +- [ ] Clear description of what you want +- [ ] Expected outcome is defined +- [ ] Relevant files are identified +- [ ] Dependencies are available +- [ ] Tests can be run - # Use the GrepTool.create() method with shared terminal_executor - grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] +### Environment Setup - return [terminal_tool, grep_tool] +Prepare your repository: +```markdown +## AGENTS.md Checklist -register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) +- [ ] Build commands documented +- [ ] Test commands documented +- [ ] Code style guidelines noted +- [ ] Architecture overview included +- [ ] Common patterns described +``` -tools = [ - Tool(name=FileEditorTool.name), - Tool(name="BashAndGrepToolSet"), -] +See [Repository Setup](/openhands/usage/customization/repository) for details. -# Agent -agent = Agent(llm=llm, tools=tools) +### Repository Preparation -llm_messages = [] # collect raw LLM messages +Optimize for success: +1. 
**Clean state**: Commit or stash uncommitted changes +2. **Working build**: Ensure the project builds +3. **Passing tests**: Start from a green state +4. **Updated dependencies**: Resolve any dependency issues +5. **Clear documentation**: Update AGENTS.md if needed -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Post-Task Review +### Quality Checks -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd -) +After OpenHands completes a task: -conversation.send_message( - "Hello! Can you use the grep tool to find all files " - "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 - "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 -) -conversation.run() +- [ ] Review all changed files +- [ ] Understand each change made +- [ ] Check for unintended modifications +- [ ] Verify code style consistency +- [ ] Look for hardcoded values or credentials -conversation.send_message("Great! Now delete that file.") -conversation.run() +### Validation Steps -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +1. **Run tests**: `npm test`, `pytest`, etc. +2. **Check linting**: Ensure style compliance +3. **Build the project**: Verify it still compiles +4. **Manual testing**: Test the feature yourself +5. 
**Edge cases**: Try unusual inputs -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +### Learning from Results - +After each significant task: -## Next Steps +**What went well?** +- Note effective prompt patterns +- Document successful approaches +- Update AGENTS.md with learnings -- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers -- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation +**What could improve?** +- Identify unclear instructions +- Note missing context +- Plan better for next time -### Assign Reviews -Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md +**Update your repository:** +```markdown +## Things OpenHands Should Know (add to AGENTS.md) -> The reference workflow is available [here](#reference-workflow)! +- When adding API endpoints, always add to routes/index.js +- Our date format is ISO 8601 everywhere +- All database queries go through the repository pattern +``` -Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. +## Decision Framework -## How it works +Use this framework to decide if a task is right for OpenHands: -It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. +``` +Is the task well-defined? 
+├── No → Define it better first +└── Yes → Continue -**Core Components:** -- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts -- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent +Do you have clear success criteria? +├── No → Define acceptance criteria +└── Yes → Continue -**Prompt Options:** -1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) -2. **`PROMPT_LOCATION`** - URL or file path for external prompts +Is the scope manageable (< 100 LOC)? +├── No → Break into smaller tasks +└── Yes → Continue -The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. +Do examples exist in the codebase? +├── No → Provide examples or patterns +└── Yes → Continue -## Assign Reviews Use Case +Can you verify the result? +├── No → Add tests or verification steps +└── Yes → ✅ Good candidate for OpenHands +``` -This specific implementation uses the basic action template to handle three PR management scenarios: +OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! -**1. Need Reviewer Action** -- Identifies PRs waiting for review -- Notifies reviewers to take action +But it can be particularly useful for certain types of tasks. For instance: -**2. Need Author Action** -- Finds stale PRs with no activity for 5+ days -- Prompts authors to update, request review, or close +- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define it in a way that can be verified programmatically, like making sure that all of the tests pass or test coverage gets above a certain value using a particular program. 
But even when you don't have something like that, you can just provide a checklist of things that need to be done.
+- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. Some good examples include code review, improving test coverage, and upgrading dependency libraries. In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies about how to perform these tasks, and improve the skills over time.
+- **Helping Answer Questions:** OpenHands agents are generally pretty good at answering questions about code bases, so feel free to ask them when you don't understand how something works. They can explore the code base and understand it deeply before providing an answer.
+- **Checking the Correctness of Library/Backend Code:** When agents work, they can run code, so they are particularly good at checking whether libraries or backend code work well.
+- **Reading Logs and Understanding Errors:** Agents can read logs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They are quite good at filtering through large amounts of data, especially if pushed in the correct direction.

-**3. Need Reviewers**
-- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing)
-- Uses git blame analysis to identify relevant contributors
-- Automatically assigns reviewers based on file ownership and contribution history
-- Balances reviewer workload across team members

+There are also some tasks where agents struggle a little more.

-## Quick Start

+- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons. But they are currently less good at visual understanding of frontends and can sometimes make mistakes if they don't understand the workflow very well. 
+- **Implementing Code they Cannot Test Live:** If agents are not able to actually run and test the app, such as connecting to a live service that they do not have access to, often they will fail at performing tasks all the way to the end, unless they get some encouragement. - - - ```bash icon="terminal" - cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml - ``` - - - Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` - (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). - - - Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". - - - The default is: Daily at 12 PM UTC. - - +### Tutorial Library +Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials.md -## Features +Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development. Each tutorial includes example prompts, expected workflows, and tips for success. 
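Before browsing the categories, note that the decision framework from the earlier section can be condensed into a small helper. A minimal sketch follows; the function name and dictionary keys are purely illustrative (they mirror the questions in the flow chart, not any OpenHands API):

```python
def assess_task(task: dict) -> str:
    """Walk the decision checklist in order and return the first piece of
    advice that applies, or confirm the task is a good candidate.

    The keys below are illustrative only; they simply restate the questions
    from the decision framework.
    """
    checks = [
        ("well_defined", "Define it better first"),
        ("clear_success_criteria", "Define acceptance criteria"),
        ("scope_manageable", "Break into smaller tasks"),
        ("examples_exist", "Provide examples or patterns"),
        ("verifiable", "Add tests or verification steps"),
    ]
    for key, advice in checks:
        # A missing answer is treated as "No", matching the flow chart.
        if not task.get(key, False):
            return advice
    return "Good candidate for OpenHands"


print(assess_task({"well_defined": True, "clear_success_criteria": True}))
# prints: Break into smaller tasks
```

The ordering matters: like the flow chart, the helper stops at the first unanswered question, so you always get the most fundamental gap first.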
-- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership -- **Automated Notifications** - Sends contextual reminders to reviewers and authors -- **Workload Balancing** - Distributes review requests evenly across team members -- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch +## Categories Overview -## Reference Workflow +| Category | Best For | Complexity | +|----------|----------|------------| +| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | +| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | +| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | +| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | +| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | +| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | -This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. -```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml ---- -# To set this up: -# 1. Change the name below to something relevant to your task -# 2. Modify the "env" section below with your prompt -# 3. Add your LLM_API_KEY to the repository secrets -# 4. Commit this file to your repository -# 5. 
Trigger the workflow manually or set up a schedule -name: Assign Reviews - -on: - # Manual trigger - workflow_dispatch: - # Scheduled trigger (disabled by default, uncomment and customize as needed) - schedule: - # Run at 12 PM UTC every day - - cron: 0 12 * * * - -permissions: - contents: write - pull-requests: write - issues: write +## Task Complexity Guidance -jobs: - run-task: - runs-on: ubuntu-24.04 - env: - # Configuration (modify these values as needed) - AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py - # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both - # Option 1: Use a URL or file path for the prompt - PROMPT_LOCATION: '' - # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' - # Option 2: Use direct text for the prompt - PROMPT_STRING: > - Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. - Read the sections below in order, and perform each in order. Do NOT take action - on the same issue or PR twice. +Before starting, assess your task's complexity: - # Issues with needs-info - Check for OP Response +**Simple tasks** (5-15 minutes): +- Single file changes +- Clear, well-defined requirements +- Existing patterns to follow - Find all open issues that have the "needs-info" label. For each issue: - 1. Identify the original poster (issue author) - 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added - 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline - and look for "labeled" events with the label "needs-info" - 4. If the original poster has commented after the label was added: - - Remove the "needs-info" label - - Add the "needs-triage" label - - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." 
+**Medium tasks** (15-45 minutes): +- Multiple file changes +- Some discovery required +- Integration with existing code - # Issues with needs-triage +**Complex tasks** (45+ minutes): +- Architectural changes +- Multiple components +- Requires iteration - Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last - activity: - 1. First, check if the issue has already been triaged by verifying it does NOT have: - - The "enhancement" label - - Any "priority" label (priority:low, priority:medium, priority:high, etc.) - 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label - 3. For issues that have NOT been triaged yet: - - Read the issue description and comments - - Determine if it requires maintainer attention by checking: - * Is it a bug report, feature request, or question? - * Does it have enough information to be actionable? - * Has a maintainer already commented? - * Is the last comment older than 4 days? - - If it needs maintainer attention and no maintainer has commented: - * Find an appropriate maintainer based on the issue topic and recent activity - * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have - a chance?" + +Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. + - # Need Reviewer Action +## Best Use Cases - Find all open PRs where: - 1. The PR is waiting for review (there are no open review comments or change requests) - 2. The PR is in a "clean" state (CI passing, no merge conflicts) - 3. The PR is not marked as draft (draft: false) - 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. +OpenHands excels at: - In this case, send a message to the reviewers: - [Automatic Post]: This PR seems to be currently waiting for review. 
- {reviewer_names}, could you please take a look when you have a chance? +- **Repetitive tasks**: Boilerplate code, test generation +- **Pattern application**: Following established conventions +- **Analysis**: Code review, debugging, documentation +- **Exploration**: Understanding new codebases - # Need Author Action +## Example Tutorials by Category - Find all open PRs where the most recent change or comment was made on the pull - request more than 5 days ago (use 14 days if the PR is marked as draft). +### Testing - And send a message to the author: +#### Tutorial: Add Unit Tests for a Module - [Automatic Post]: It has been a while since there was any activity on this PR. - {author}, are you still working on it? If so, please go ahead, if not then - please request review, close it, or request that someone else follow up. +**Goal**: Achieve 80%+ test coverage for a service module - # Need Reviewers +**Prompt**: +``` +Add unit tests for the UserService class in src/services/user.js. - Find all open pull requests that: - 1. Have no reviewers assigned to them. - 2. Are not marked as draft. - 3. Were created more than 1 day ago. - 4. CI is passing and there are no merge conflicts. +Current coverage: 35% +Target coverage: 80% - For each of these pull requests, read the git blame information for the files, - and find the most recent and active contributors to the file/location of the changes. - Assign one of these people as a reviewer, but try not to assign too many reviews to - any single person. Add this message: +Requirements: +1. Test all public methods +2. Cover edge cases (null inputs, empty arrays, etc.) +3. Mock external dependencies (database, API calls) +4. Follow our existing test patterns in tests/services/ +5. Use Jest as the testing framework - [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. - Thanks in advance for the help! 
+Focus on these methods: +- createUser() +- updateUser() +- deleteUser() +- getUserById() +``` - LLM_MODEL: - LLM_BASE_URL: - steps: - - name: Checkout repository - uses: actions/checkout@v5 +**What OpenHands does**: +1. Analyzes the UserService class +2. Identifies untested code paths +3. Creates test file with comprehensive tests +4. Mocks dependencies appropriately +5. Runs tests to verify they pass - - name: Set up Python - uses: actions/setup-python@v6 - with: - python-version: '3.13' +**Tips**: +- Provide existing test files as examples +- Specify the testing framework +- Mention any mocking conventions - - name: Install uv - uses: astral-sh/setup-uv@v7 - with: - enable-cache: true +--- - - name: Install OpenHands dependencies - run: | - # Install OpenHands SDK and tools from git repository - uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" - uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" +#### Tutorial: Add Integration Tests for an API - - name: Check required configuration - env: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - run: | - if [ -z "$LLM_API_KEY" ]; then - echo "Error: LLM_API_KEY secret is not set." - exit 1 - fi +**Goal**: Test API endpoints end-to-end - # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set - if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then - echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." - echo "Please provide only one in the env section of the workflow file." - exit 1 - fi +**Prompt**: +``` +Add integration tests for the /api/products endpoints. - if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then - echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." - echo "Please set one in the env section of the workflow file." 
- exit 1 - fi +Endpoints to test: +- GET /api/products (list all) +- GET /api/products/:id (get one) +- POST /api/products (create) +- PUT /api/products/:id (update) +- DELETE /api/products/:id (delete) - if [ -n "$PROMPT_LOCATION" ]; then - echo "Prompt location: $PROMPT_LOCATION" - else - echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" - fi - echo "LLM model: $LLM_MODEL" - if [ -n "$LLM_BASE_URL" ]; then - echo "LLM base URL: $LLM_BASE_URL" - fi +Requirements: +1. Use our test database (configured in jest.config.js) +2. Set up and tear down test data properly +3. Test success cases and error cases +4. Verify response bodies and status codes +5. Follow patterns in tests/integration/ +``` - - name: Run task - env: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - PYTHONPATH: '' - run: | - echo "Running agent script: $AGENT_SCRIPT_URL" +--- - # Download script if it's a URL - if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then - echo "Downloading agent script from URL..." - curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py - AGENT_SCRIPT_PATH="/tmp/agent_script.py" - else - AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" - fi +### Data Analysis - # Run with appropriate prompt argument - if [ -n "$PROMPT_LOCATION" ]; then - echo "Using prompt from: $PROMPT_LOCATION" - uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" - else - echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" - uv run python "$AGENT_SCRIPT_PATH" - fi +#### Tutorial: Create a Data Processing Script - - name: Upload logs as artifact - uses: actions/upload-artifact@v4 - if: always() - with: - name: openhands-task-logs - path: | - *.log - output/ - retention-days: 7 +**Goal**: Process CSV data and generate a report + +**Prompt**: ``` +Create a Python script to analyze our sales data. 
-## Related Files +Input: sales_data.csv with columns: date, product, quantity, price, region -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) -- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) +Requirements: +1. Load and validate the CSV data +2. Calculate: + - Total revenue by product + - Monthly sales trends + - Top 5 products by quantity + - Revenue by region +3. Generate a summary report (Markdown format) +4. Create visualizations (bar chart for top products, line chart for trends) +5. Save results to reports/ directory -### PR Review -Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md +Use pandas for data processing and matplotlib for charts. +``` -> The reference workflow is available [here](#reference-workflow)! +**What OpenHands does**: +1. Creates a Python script with proper structure +2. Implements data loading with validation +3. Calculates requested metrics +4. Generates formatted report +5. Creates and saves visualizations -Automatically review pull requests, providing feedback on code quality, security, and best practices. Reviews can be triggered in two ways: -- Requesting `openhands-agent` as a reviewer -- Adding the `review-this` label to the PR +--- - -The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. 
If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo. - +#### Tutorial: Database Query Analysis -## Quick Start +**Goal**: Analyze and optimize slow database queries -```bash -# 1. Copy workflow to your repository -cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml +**Prompt**: +``` +Analyze our slow query log and identify optimization opportunities. -# 2. Configure secrets in GitHub Settings → Secrets -# Add: LLM_API_KEY +File: logs/slow_queries.log -# 3. (Optional) Create a "review-this" label in your repository -# Go to Issues → Labels → New label -# You can also trigger reviews by requesting "openhands-agent" as a reviewer -``` +For each slow query: +1. Explain why it's slow +2. Suggest index additions if helpful +3. Rewrite the query if it can be optimized +4. Estimate the improvement -## Features +Create a report in reports/query_optimization.md with: +- Summary of findings +- Prioritized recommendations +- SQL for suggested changes +``` -- **Fast Reviews** - Results posted on the PR in only 2 or 3 minutes -- **Comprehensive Analysis** - Analyzes the changes given the repository context. Covers code quality, security, best practices -- **GitHub Integration** - Posts comments directly to the PR -- **Customizable** - Add your own code review guidelines without forking +--- -## Security +### Web Scraping -- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label. -- Maintainers need to read the PR to make sure it's safe to run. +#### Tutorial: Build a Web Scraper -## Customizing the Code Review +**Goal**: Extract product data from a website -Instead of forking the `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization. 
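The scraper prompt above asks for throttling to one request per second, which is easy to get wrong in a pagination loop. A minimal, framework-agnostic sketch of such a throttle follows; the class and parameter names are illustrative and not taken from any particular scraping library:

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (e.g. 1 request/second).

    The clock and sleep functions are injectable so the logic can be
    tested without real waiting; the defaults use the standard library.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Usage sketch: call wait() before each page fetch.
throttle = Throttle(min_interval=0.1)
for page in range(1, 4):
    throttle.wait()
    # A real scraper would fetch and parse the page here,
    # e.g. with requests + BeautifulSoup.
    print(f"would fetch page {page}")
```

Keeping the delay in one place also makes it trivial to slow down globally if the target site starts returning rate-limit errors.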
+**Prompt**: +``` +Create a web scraper to extract product information from our competitor's site. -### How It Works +Target URL: https://example-store.com/products -The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. +Extract for each product: +- Name +- Price +- Description +- Image URL +- SKU (if available) - -**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. - +Requirements: +1. Use Python with BeautifulSoup or Scrapy +2. Handle pagination (site has 50 pages) +3. Respect rate limits (1 request/second) +4. Save results to products.json +5. Handle errors gracefully +6. Log progress to console -### Example: Custom Code Review Skill +Include a README with usage instructions. +``` -Create `.agents/skills/custom-codereview-guide.md` in your repository: +**Tips**: +- Specify rate limiting requirements +- Mention error handling expectations +- Request logging for debugging -```markdown ---- -name: custom-codereview-guide -description: Project-specific review guidelines for MyProject -triggers: -- /codereview --- -# MyProject-Specific Review Guidelines +### Code Review -In addition to general code review practices, check for: + +For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). 
+ -## Project Conventions +#### Tutorial: Security-Focused Code Review -- All API endpoints must have OpenAPI documentation -- Database migrations must be reversible -- Feature flags required for new features +**Goal**: Identify security vulnerabilities in a PR -## Architecture Rules +**Prompt**: +``` +Review this pull request for security issues: -- No direct database access from controllers -- All external API calls must go through the gateway service +Focus areas: +1. Input validation - check all user inputs are sanitized +2. Authentication - verify auth checks are in place +3. SQL injection - check for parameterized queries +4. XSS - verify output encoding +5. Sensitive data - ensure no secrets in code -## Communication Style +For each issue found, provide: +- File and line number +- Severity (Critical/High/Medium/Low) +- Description of the vulnerability +- Suggested fix with code example -- Be direct and constructive -- Use GitHub suggestion syntax for code fixes +Output format: Markdown suitable for PR comments ``` - -**Note**: These rules supplement the default `code-review` skill, not replace it. - +--- - -**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. +#### Tutorial: Performance Review -If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. - +**Goal**: Identify performance issues in code - -**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. 
- +**Prompt**: +``` +Review the OrderService class for performance issues. -### Benefits of Custom Skills +File: src/services/order.js -1. **No forking required**: Keep using the official SDK while customizing behavior -2. **Version controlled**: Your review guidelines live in your repository -3. **Easy updates**: SDK updates don't overwrite your customizations -4. **Team alignment**: Everyone uses the same review standards -5. **Composable**: Add project-specific rules alongside default guidelines +Check for: +1. N+1 database queries +2. Missing indexes (based on query patterns) +3. Inefficient loops or algorithms +4. Missing caching opportunities +5. Unnecessary data fetching - -See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. - +For each issue: +- Explain the impact +- Show the problematic code +- Provide an optimized version +- Estimate the improvement +``` -## Reference Workflow +--- + +### Bug Fixing -This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) +For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. -```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml ---- -# OpenHands PR Review Workflow -# -# To set this up: -# 1. Copy this file to .github/workflows/pr-review.yml in your repository -# 2. Add LLM_API_KEY to repository secrets -# 3. Customize the inputs below as needed -# 4. Commit this file to your repository -# 5. 
Trigger the review by either: -# - Adding the "review-this" label to any PR, OR -# - Requesting openhands-agent as a reviewer -# -# For more information, see: -# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review -name: PR Review by OpenHands +#### Tutorial: Fix a Crash Bug -on: - # Trigger when a label is added or a reviewer is requested - pull_request: - types: [labeled, review_requested] +**Goal**: Diagnose and fix an application crash + +**Prompt**: +``` +Fix the crash in the checkout process. + +Error: +TypeError: Cannot read property 'price' of undefined + at calculateTotal (src/checkout/calculator.js:45) + at processOrder (src/checkout/processor.js:23) -permissions: - contents: read - pull-requests: write - issues: write +Steps to reproduce: +1. Add item to cart +2. Apply discount code "SAVE20" +3. Click checkout +4. Crash occurs -jobs: - pr-review: - # Run when review-this label is added OR openhands-agent is requested as reviewer - if: | - github.event.label.name == 'review-this' || - github.event.requested_reviewer.login == 'openhands-agent' - runs-on: ubuntu-latest - steps: - - name: Checkout for composite action - uses: actions/checkout@v4 - with: - repository: OpenHands/software-agent-sdk - # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') - ref: main - sparse-checkout: .github/actions/pr-review +The bug was introduced in commit abc123 (yesterday's deployment). - - name: Run PR Review - uses: ./.github/actions/pr-review - with: - # LLM configuration - llm-model: anthropic/claude-sonnet-4-5-20250929 - llm-base-url: '' - # Review style: roasted (other option: standard) - review-style: roasted - # SDK version to use (version tag or branch name) - sdk-version: main - # Secrets - llm-api-key: ${{ secrets.LLM_API_KEY }} - github-token: ${{ secrets.GITHUB_TOKEN }} +Requirements: +1. Identify the root cause +2. Fix the bug +3. Add a regression test +4. 
Verify the fix doesn't break other functionality ``` -### Action Inputs - -| Input | Description | Required | Default | -|-------|-------------|----------|---------| -| `llm-model` | LLM model to use | Yes | - | -| `llm-base-url` | LLM base URL (optional) | No | `''` | -| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | -| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | -| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | -| `llm-api-key` | LLM API key | Yes | - | -| `github-token` | GitHub token for API access | Yes | - | +**What OpenHands does**: +1. Analyzes the stack trace +2. Reviews recent changes +3. Identifies the null reference issue +4. Implements a defensive fix +5. Creates test to prevent regression -## Related Files +--- -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) -- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) -- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) +#### Tutorial: Fix a Memory Leak -### TODO Management -Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md +**Goal**: Identify and fix a memory leak -> The reference workflow is available [here](#reference-workflow)! +**Prompt**: +``` +Investigate and fix the memory leak in our Node.js application. 
+Symptoms: +- Memory usage grows 100MB/hour +- After 24 hours, app becomes unresponsive +- Restarting temporarily fixes the issue -Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership +Suspected areas: +- Event listeners in src/events/ +- Cache implementation in src/cache/ +- WebSocket connections in src/ws/ -## Quick Start +Analyze these areas and: +1. Identify the leak source +2. Explain why it's leaking +3. Implement a fix +4. Add monitoring to detect future leaks +``` - - - ```bash icon="terminal" - cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml - ``` - - - Go to `GitHub Settings → Secrets` and add `LLM_API_KEY` - (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). - - - Go to `Settings → Actions → General → Workflow permissions` and enable: - - `Read and write permissions` - - `Allow GitHub Actions to create and approve pull requests` - - - Trigger the agent by adding TODO comments into your code. +--- - Example: `# TODO(openhands): Add input validation for user email` +### Feature Development - - The workflow is configurable and any identifier can be used in place of `TODO(openhands)` - - - +#### Tutorial: Add a REST API Endpoint +**Goal**: Create a new API endpoint with full functionality -## Features +**Prompt**: +``` +Add a user preferences API endpoint. -- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. 
-- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it -- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers +Endpoint: /api/users/:id/preferences -## Best Practices +Operations: +- GET: Retrieve user preferences +- PUT: Update user preferences +- PATCH: Partially update preferences -- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow -- **Clear Descriptions** - Write descriptive TODO comments -- **Review PRs** - Always review the generated PRs before merging +Preferences schema: +{ + theme: "light" | "dark", + notifications: { email: boolean, push: boolean }, + language: string, + timezone: string +} -## Reference Workflow +Requirements: +1. Follow patterns in src/api/routes/ +2. Add request validation with Joi +3. Use UserPreferencesService for business logic +4. Add appropriate error handling +5. Document the endpoint in OpenAPI format +6. Add unit and integration tests +``` - -This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) - +**What OpenHands does**: +1. Creates route handler following existing patterns +2. Implements validation middleware +3. Creates or updates the service layer +4. Adds error handling +5. Generates API documentation +6. Creates comprehensive tests -```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml --- -# Automated TODO Management Workflow -# Make sure to replace and with -# appropriate values for your LLM setup. -# -# This workflow automatically scans for TODO(openhands) comments and creates -# pull requests to implement them using the OpenHands agent. -# -# Setup: -# 1. Add LLM_API_KEY to repository secrets -# 2. Ensure GITHUB_TOKEN has appropriate permissions -# 3. Make sure Github Actions are allowed to create and review PRs -# 4. 
Commit this file to .github/workflows/ in your repository -# 5. Configure the schedule or trigger manually -name: Automated TODO Management +#### Tutorial: Implement a Feature Flag System -on: - # Manual trigger - workflow_dispatch: - inputs: - max_todos: - description: Maximum number of TODOs to process in this run - required: false - default: '3' - type: string - todo_identifier: - description: TODO identifier to search for (e.g., TODO(openhands)) - required: false - default: TODO(openhands) - type: string +**Goal**: Add feature flags to the application - # Trigger when 'automatic-todo' label is added to a PR - pull_request: - types: [labeled] +**Prompt**: +``` +Implement a feature flag system for our application. - # Scheduled trigger (disabled by default, uncomment and customize as needed) - # schedule: - # # Run every Monday at 9 AM UTC - # - cron: "0 9 * * 1" +Requirements: +1. Create a FeatureFlags service +2. Support these flag types: + - Boolean (on/off) + - Percentage (gradual rollout) + - User-based (specific user IDs) +3. Load flags from environment variables initially +4. Add a React hook: useFeatureFlag(flagName) +5. Add middleware for API routes -permissions: - contents: write - pull-requests: write - issues: write +Initial flags to configure: +- new_checkout: boolean, default false +- dark_mode: percentage, default 10% +- beta_features: user-based -jobs: - scan-todos: - runs-on: ubuntu-latest - # Only run if triggered manually or if 'automatic-todo' label was added - if: > - github.event_name == 'workflow_dispatch' || - (github.event_name == 'pull_request' && - github.event.label.name == 'automatic-todo') - outputs: - todos: ${{ steps.scan.outputs.todos }} - todo-count: ${{ steps.scan.outputs.todo-count }} - steps: - - name: Checkout repository - uses: actions/checkout@v4 - with: - fetch-depth: 0 # Full history for better context +Include documentation and tests. 
+``` - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.13' +--- + +## Contributing Tutorials + +Have a great use case? Share it with the community! + +**What makes a good tutorial:** +- Solves a common problem +- Has clear, reproducible steps +- Includes example prompts +- Explains expected outcomes +- Provides tips for success + +**How to contribute:** +1. Create a detailed example following this format +2. Test it with OpenHands to verify it works +3. Submit via GitHub pull request to the docs repository +4. Include any prerequisites or setup required - - name: Copy TODO scanner - run: | - cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py - chmod +x /tmp/scanner.py + +These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. + - - name: Scan for TODOs - id: scan - run: | - echo "Scanning for TODO comments..." +### Key Features +Source: https://docs.openhands.dev/openhands/usage/key-features.md - # Run the scanner and capture output - TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" - python /tmp/scanner.py . --identifier "$TODO_IDENTIFIER" > todos.json + + + - Displays the conversation between the user and OpenHands. + - OpenHands explains its actions in this panel. - # Count TODOs - TODO_COUNT=$(python -c \ - "import json; data=json.load(open('todos.json')); print(len(data))") - echo "Found $TODO_COUNT $TODO_IDENTIFIER items" + ![overview](/openhands/static/img/chat-panel.png) + + + - Shows the file changes performed by OpenHands. 
- # Limit the number of TODOs to process - MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" - if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then - echo "Limiting to first $MAX_TODOS TODOs" - python -c " - import json - data = json.load(open('todos.json')) - limited = data[:$MAX_TODOS] - json.dump(limited, open('todos.json', 'w'), indent=2) - " - TODO_COUNT=$MAX_TODOS - fi + ![overview](/openhands/static/img/changes-tab.png) + + + - Embedded VS Code for browsing and modifying files. + - Can also be used to upload and download files. - # Set outputs - echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT - echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT + ![overview](/openhands/static/img/vs-tab.png) + + + - A space for OpenHands and users to run terminal commands. - # Display found TODOs - echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY - if [ "$TODO_COUNT" -eq 0 ]; then - echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY - else - echo "Found $TODO_COUNT TODO(openhands) items:" \ - >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - python -c " - import json - data = json.load(open('todos.json')) - for i, todo in enumerate(data, 1): - print(f'{i}. **{todo[\"file\"]}:{todo[\"line\"]}** - ' + - f'{todo[\"description\"]}') - " >> $GITHUB_STEP_SUMMARY - fi + ![overview](/openhands/static/img/terminal-tab.png) + + + - Displays the web server when OpenHands runs an application. + - Users can interact with the running application. - process-todos: - needs: scan-todos - if: needs.scan-todos.outputs.todo-count > 0 - runs-on: ubuntu-latest - strategy: - matrix: - todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} - max-parallel: 1 # Process one TODO at a time to avoid conflicts - steps: - - name: Checkout repository - uses: actions/checkout@v4 - with: - fetch-depth: 0 - token: ${{ secrets.GITHUB_TOKEN }} + ![overview](/openhands/static/img/app-tab.png) + + + - Used by OpenHands to browse websites. + - The browser is non-interactive. 
- - name: Switch to feature branch with TODO management files - run: | - git checkout openhands/todo-management-example - git pull origin openhands/todo-management-example + ![overview](/openhands/static/img/browser-tab.png) + + - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.13' +### Azure +Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms.md - - name: Install uv - uses: astral-sh/setup-uv@v6 - with: - enable-cache: true +## Azure OpenAI Configuration - - name: Install OpenHands dependencies - run: | - # Install OpenHands SDK and tools from git repository - uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" - uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" +When running OpenHands, you'll need to set the following environment variable using `-e` in the +docker run command: - - name: Copy agent files - run: | - cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py - cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py - chmod +x agent.py +``` +LLM_API_VERSION="" # e.g. "2023-05-15" +``` - - name: Configure Git - run: | - git config --global user.name "openhands-bot" - git config --global user.email \ - "openhands-bot@users.noreply.github.com" +Example: +```bash +docker run -it --pull=always \ + -e LLM_API_VERSION="2023-05-15" + ... 
+``` - - name: Process TODO - env: - LLM_MODEL: - LLM_BASE_URL: - LLM_API_KEY: ${{ secrets.LLM_API_KEY }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - GITHUB_REPOSITORY: ${{ github.repository }} - TODO_FILE: ${{ matrix.todo.file }} - TODO_LINE: ${{ matrix.todo.line }} - TODO_DESCRIPTION: ${{ matrix.todo.description }} - PYTHONPATH: '' - run: | - echo "Processing TODO: $TODO_DESCRIPTION" - echo "File: $TODO_FILE:$TODO_LINE" +Then in the OpenHands UI Settings under the `LLM` tab: - # Create a unique branch name for this TODO - BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ - sed 's/[^a-zA-Z0-9]/-/g' | \ - sed 's/--*/-/g' | \ - sed 's/^-\|-$//g' | \ - tr '[:upper:]' '[:lower:]' | \ - cut -c1-50)" - echo "Branch name: $BRANCH_NAME" + +You will need your ChatGPT deployment name which can be found on the deployments page in Azure. This is referenced as +<deployment-name> below. + - # Create and switch to new branch (force create if exists) - git checkout -B "$BRANCH_NAME" +1. Enable `Advanced` options. +2. Set the following: + - `Custom Model` to azure/<deployment-name> + - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`) + - `API Key` to your Azure API key - # Run the agent to process the TODO - # Stay in repository directory for git operations +### Azure OpenAI Configuration - # Create JSON payload for the agent - TODO_JSON=$(cat <&1 | tee agent_output.log - AGENT_EXIT_CODE=$? - set -e +## How It Works - echo "Agent exit code: $AGENT_EXIT_CODE" - echo "Agent output log:" - cat agent_output.log +Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. 
For example: - # Show files in working directory - echo "Files in working directory:" - ls -la +```toml +# Default LLM configuration +[llm] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.0 - # If agent failed, show more details - if [ $AGENT_EXIT_CODE -ne 0 ]; then - echo "Agent failed with exit code $AGENT_EXIT_CODE" - echo "Last 50 lines of agent output:" - tail -50 agent_output.log - exit $AGENT_EXIT_CODE - fi +# Custom LLM configuration for a cheaper model +[llm.gpt3] +model = "gpt-3.5-turbo" +api_key = "your-api-key" +temperature = 0.2 - # Check if any changes were made - cd "$GITHUB_WORKSPACE" - if git diff --quiet; then - echo "No changes made by agent, skipping PR creation" - exit 0 - fi +# Another custom configuration with different parameters +[llm.high-creativity] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.8 +top_p = 0.9 +``` - # Commit changes - git add -A - git commit -m "Implement TODO: $TODO_DESCRIPTION +Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed. - Automatically implemented by OpenHands agent. 
+## Using Custom Configurations + +### With Agents - Co-authored-by: openhands " +You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: - # Push branch - git push origin "$BRANCH_NAME" +```toml +[agent.RepoExplorerAgent] +# Use the cheaper GPT-3 configuration for this agent +llm_config = 'gpt3' - # Create pull request - PR_TITLE="Implement TODO: $TODO_DESCRIPTION" - PR_BODY="## 🤖 Automated TODO Implementation +[agent.CodeWriterAgent] +# Use the high creativity configuration for this agent +llm_config = 'high-creativity' +``` - This PR automatically implements the following TODO: +### Configuration Options - **File:** \`$TODO_FILE:$TODO_LINE\` - **Description:** $TODO_DESCRIPTION +Each named LLM configuration supports all the same options as the default LLM configuration. These include: - ### Implementation - The OpenHands agent has analyzed the TODO and implemented the - requested functionality. +- Model selection (`model`) +- API configuration (`api_key`, `base_url`, etc.) +- Model parameters (`temperature`, `top_p`, etc.) +- Retry settings (`num_retries`, `retry_multiplier`, etc.) +- Token limits (`max_input_tokens`, `max_output_tokens`) +- And all other LLM configuration options - ### Review Notes - - Please review the implementation for correctness - - Test the changes in your development environment - - The original TODO comment will be updated with this PR URL - once merged +For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. 
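Conceptually, a named section behaves like a shallow override of the defaults: every key from `[llm]` is inherited unless the named section sets it. A minimal Python sketch of that idea (illustrative only, not OpenHands' actual configuration loader):

```python
# Defaults from [llm] and overrides from [llm.gpt3], modeled as plain dicts.
defaults = {"model": "gpt-4", "api_key": "your-api-key", "temperature": 0.0}
gpt3 = {"model": "gpt-3.5-turbo", "temperature": 0.2}

# The effective named configuration: inherited keys plus overrides.
effective = {**defaults, **gpt3}
print(effective)
# {'model': 'gpt-3.5-turbo', 'api_key': 'your-api-key', 'temperature': 0.2}
```

Note how `api_key` is inherited untouched while `model` and `temperature` come from the named section.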
- --- - *This PR was created automatically by the TODO Management workflow.*" +## Use Cases - # Create PR using GitHub CLI or API - curl -X POST \ - -H "Authorization: token $GITHUB_TOKEN" \ - -H "Accept: application/vnd.github.v3+json" \ - "https://api.github.com/repos/${{ github.repository }}/pulls" \ - -d "{ - \"title\": \"$PR_TITLE\", - \"body\": \"$PR_BODY\", - \"head\": \"$BRANCH_NAME\", - \"base\": \"${{ github.ref_name }}\" - }" +Custom LLM configurations are particularly useful in several scenarios: - summary: - needs: [scan-todos, process-todos] - if: always() - runs-on: ubuntu-latest - steps: - - name: Generate Summary - run: | - echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY +- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations. +- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism. +- **Different Providers**: Use different LLM providers or API endpoints for different tasks. +- **Testing and Development**: Easily switch between different model configurations during development and testing. - TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" - echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY +## Example: Cost Optimization - if [ "$TODO_COUNT" -gt 0 ]; then - echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - echo "Check the pull requests created for each TODO" \ - "implementation." 
>> $GITHUB_STEP_SUMMARY - else - echo "**Status:** ℹ️ No TODOs found to process" \ - >> $GITHUB_STEP_SUMMARY - fi +A practical example of using custom LLM configurations to optimize costs: - echo "" >> $GITHUB_STEP_SUMMARY - echo "---" >> $GITHUB_STEP_SUMMARY - echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY -``` +```toml +# Default configuration using GPT-4 for high-quality responses +[llm] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.0 -## Related Documentation +# Cheaper configuration for repository exploration +[llm.repo-explorer] +model = "gpt-3.5-turbo" +temperature = 0.2 -- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) -- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) -- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) -- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) +# Configuration for code generation +[llm.code-gen] +model = "gpt-4" +temperature = 0.0 +max_output_tokens = 2000 -### Hello World -Source: https://docs.openhands.dev/sdk/guides/hello-world.md +[agent.RepoExplorerAgent] +llm_config = 'repo-explorer' -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +[agent.CodeWriterAgent] +llm_config = 'code-gen' +``` -> A ready-to-run example is available [here](#ready-to-run-example)! 
+In this example: +- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code +- Code generation uses GPT-4 with a higher token limit for generating larger code blocks +- The default configuration remains available for other tasks -## Your First Agent +# Custom Configurations with Reserved Names -This is the most basic example showing how to set up and run an OpenHands agent. +OpenHands can use custom LLM configurations named with reserved names, for specific use cases. If you specify the model and other settings under the reserved names, then OpenHands will load and them for a specific purpose. As of now, one such configuration is implemented: draft editor. - - - ### LLM Configuration +## Draft Editor Configuration - Configure the language model that will power your agent: - ```python icon="python" - llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, # Optional - service_id="agent" - ) - ``` - - - ### Select an Agent - Use the preset agent with common built-in tools: - ```python icon="python" - agent = get_default_agent(llm=llm, cli_mode=True) - ``` - The default agent includes `BashTool`, `FileEditorTool`, etc. - - For the complete list of available tools see the - [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). - +The `draft_editor` configuration is a group of settings you can provide, to specify the model to use for preliminary drafting of code edits, for any tasks that involve editing and refining code. You need to provide it under the section `[llm.draft_editor]`. - - - ### Start a Conversation - Start a conversation to manage the agent's lifecycle: - ```python icon="python" - conversation = Conversation(agent=agent, workspace=cwd) - conversation.send_message( - "Write 3 facts about the current project into FACTS.txt." - ) - conversation.run() - ``` - - - ### Expected Behavior - When you run this example: - 1. 
The agent analyzes the current directory - 2. Gathers information about the project - 3. Creates `FACTS.txt` with 3 relevant facts - 4. Completes and exits +For example, you can define in `config.toml` a draft editor like this: - Example output file: +```toml +[llm.draft_editor] +model = "gpt-4" +temperature = 0.2 +top_p = 0.95 +presence_penalty = 0.0 +frequency_penalty = 0.0 +``` - ```text icon="text" wrap - FACTS.txt - --------- - 1. This is a Python project using the OpenHands Software Agent SDK. - 2. The project includes examples demonstrating various agent capabilities. - 3. The SDK provides tools for file manipulation, bash execution, and more. - ``` - - +This configuration: +- Uses GPT-4 for high-quality edits and suggestions +- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility +- Uses a high top_p value (0.95) to consider a wide range of token options +- Disables presence and frequency penalties to maintain focus on the specific edits needed -## Ready-to-run Example +Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to: +- Review and suggest code improvements +- Refine existing content while maintaining its core meaning +- Make precise, focused changes to code or text -This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) +Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`. When running via `docker run`, please use the standard configuration options. 
-```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py -import os - -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.task_tracker import TaskTrackerTool -from openhands.tools.terminal import TerminalTool +### Google Gemini/Vertex +Source: https://docs.openhands.dev/openhands/usage/llms/google-llms.md +## Gemini - Google AI Studio Configs -llm = LLM( - model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), - api_key=os.getenv("LLM_API_KEY"), - base_url=os.getenv("LLM_BASE_URL", None), -) +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Gemini` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). +- `API Key` to your Gemini API key -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], -) +## VertexAI - Google Cloud Platform Configs -cwd = os.getcwd() -conversation = Conversation(agent=agent, workspace=cwd) +To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment +variables using `-e` in the docker run command: -conversation.send_message("Write 3 facts about the current project into FACTS.txt.") -conversation.run() -print("All done!") +``` +GOOGLE_APPLICATION_CREDENTIALS="" +VERTEXAI_PROJECT="" +VERTEXAI_LOCATION="" ``` - +Then set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `VertexAI` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. vertex_ai/<model-name>). 
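Mirroring the Azure example earlier on this page, a Vertex AI launch might pass these variables as shown in the sketch below. The project and location values are placeholders, and the credentials file must be reachable inside the container (for example via a volume mount):

```bash
docker run -it --pull=always \
    -v /path/to/service-account.json:/gcp/credentials.json \
    -e GOOGLE_APPLICATION_CREDENTIALS="/gcp/credentials.json" \
    -e VERTEXAI_PROJECT="my-gcp-project" \
    -e VERTEXAI_LOCATION="us-central1" \
    ...
```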
-## Next Steps +### Groq +Source: https://docs.openhands.dev/openhands/usage/llms/groq.md -- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs -- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers -- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage +## Configuration -### Hooks -Source: https://docs.openhands.dev/sdk/guides/hooks.md +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Groq` +- `LLM Model` to the model you will be using. [Visit here to see the list of +models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, +enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). +- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). + +## Using Groq as an OpenAI-Compatible Endpoint -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you +would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. `openai/llama3-70b-8192`) + - `Base URL` to `https://api.groq.com/openai/v1` + - `API Key` to your Groq API key -> A ready-to-run example is available [here](#ready-to-run-example)! +### LiteLLM Proxy +Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md -## Overview +## Configuration -Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. 
Typical uses include: -- Logging and analytics -- Emitting custom metrics -- Auditing or compliance -- Tracing and debugging +To use LiteLLM proxy with OpenHands, you need to: -## Hook Types +1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) +2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: + * Enable `Advanced` options + * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) + * `Base URL` to your LiteLLM proxy URL (e.g. `https://your-litellm-proxy.com`) + * `API Key` to your LiteLLM proxy API key -| Hook | When it runs | Can block? | -|------|--------------|------------| -| PreToolUse | Before tool execution | Yes (exit 2) | -| PostToolUse | After tool execution | No | -| UserPromptSubmit | Before processing user message | Yes (exit 2) | -| Stop | When agent tries to finish | Yes (exit 2) | -| SessionStart | When conversation starts | No | -| SessionEnd | When conversation ends | No | +## Supported Models -## Key Concepts +The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy +is configured to handle. -- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution -- Isolation: hooks run outside the agent loop logic, avoiding core modifications -- Composition: enable or disable hooks per environment (local vs. prod) +Refer to your LiteLLM proxy configuration for the list of available models and their names. 
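Since the proxy exposes an OpenAI-compatible chat-completions interface, it can be sanity-checked with an ordinary OpenAI-style request. A minimal sketch of the request shape (the URL and model name are placeholders; your proxy's configuration determines which model names are valid):

```python
import json

# Placeholder values -- substitute your own proxy URL and a model it serves.
base_url = "https://your-litellm-proxy.com"
endpoint = f"{base_url}/v1/chat/completions"

# When calling the proxy directly, use the model name as defined in your proxy
# config; the "litellm_proxy/" prefix is the OpenHands-side setting.
payload = {
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "messages": [{"role": "user", "content": "Say hello"}],
}

print(endpoint)
print(json.dumps(payload))
```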
-## Ready-to-run Example +### Overview +Source: https://docs.openhands.dev/openhands/usage/llms/llms.md -This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) +This section is for users who want to connect OpenHands to different LLMs. -```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py -"""OpenHands Agent SDK — Hooks Example - -Demonstrates the OpenHands hooks system. -Hooks are shell scripts that run at key lifecycle events: - -- PreToolUse: Block dangerous commands before execution -- PostToolUse: Log tool usage after execution -- UserPromptSubmit: Inject context into user messages -- Stop: Enforce task completion criteria - -The hook scripts are in the scripts/ directory alongside this file. -""" + +OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this +page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation +for the canonical list of supported parameters. + -import os -import signal -import tempfile -from pathlib import Path +## Model Recommendations -from pydantic import SecretStr +Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some +recommendations for model selection. Our latest benchmarking results can be found in +[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). 
-from openhands.sdk import LLM, Conversation -from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher -from openhands.tools.preset.default import get_default_agent +Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: +### Cloud / API-Based Models -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) +- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) +- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) +- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) +- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) +- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) +- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) -SCRIPT_DIR = Path(__file__).parent / "hook_scripts" +If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process +to help others using the same provider! -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +For a full list of the providers and models available, please consult the +[litellm documentation](https://docs.litellm.ai/docs/providers). -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + +OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending +limits and monitor usage. 
+ -# Create temporary workspace with git repo -with tempfile.TemporaryDirectory() as tmpdir: - workspace = Path(tmpdir) - os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") +### Local / Self-Hosted Models - log_file = workspace / "tool_usage.log" - summary_file = workspace / "summary.txt" +- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) +- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) - # Configure hooks using the typed approach (recommended) - # This provides better type safety and IDE support - hook_config = HookConfig( - pre_tool_use=[ - HookMatcher( - matcher="terminal", - hooks=[ - HookDefinition( - command=str(SCRIPT_DIR / "block_dangerous.sh"), - timeout=10, - ) - ], - ) - ], - post_tool_use=[ - HookMatcher( - matcher="*", - hooks=[ - HookDefinition( - command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), - timeout=5, - ) - ], - ) - ], - user_prompt_submit=[ - HookMatcher( - hooks=[ - HookDefinition( - command=str(SCRIPT_DIR / "inject_git_context.sh"), - ) - ], - ) - ], - stop=[ - HookMatcher( - hooks=[ - HookDefinition( - command=( - f"SUMMARY_FILE={summary_file} " - f"{SCRIPT_DIR / 'require_summary.sh'}" - ), - ) - ], - ) - ], - ) +### Known Issues - # Alternative: You can also use .from_dict() for loading from JSON config files - # Example with a single hook matcher: - # hook_config = HookConfig.from_dict({ - # "hooks": { - # "PreToolUse": [{ - # "matcher": "terminal", - # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] - # }] - # } - # }) + +Most current local and open source models are not as powerful. 
When using such models, you may see long +wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the +models driving it. However, if you do find ones that work, please add them to the verified list above. + - agent = get_default_agent(llm=llm) - conversation = Conversation( - agent=agent, - workspace=str(workspace), - hook_config=hook_config, - ) +## LLM Configuration - # Demo 1: Safe command (PostToolUse logs it) - print("=" * 60) - print("Demo 1: Safe command - logged by PostToolUse") - print("=" * 60) - conversation.send_message("Run: echo 'Hello from hooks!'") - conversation.run() +The following can be set in the OpenHands UI through the Settings. Each option is serialized into the +`LLM.load_from_env()` schema before being passed to the Agent SDK: - if log_file.exists(): - print(f"\n[Log: {log_file.read_text().strip()}]") +- `LLM Provider` +- `LLM Model` +- `API Key` +- `Base URL` (through `Advanced` settings) - # Demo 2: Dangerous command (PreToolUse blocks it) - print("\n" + "=" * 60) - print("Demo 2: Dangerous command - blocked by PreToolUse") - print("=" * 60) - conversation.send_message("Run: rm -rf /tmp/test") - conversation.run() +There are some settings that may be necessary for certain providers that cannot be set directly through the UI. Set them +as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup: - # Demo 3: Context injection + Stop hook enforcement - print("\n" + "=" * 60) - print("Demo 3: Context injection + Stop hook") - print("=" * 60) - print("UserPromptSubmit injects git status; Stop requires summary.txt\n") - conversation.send_message( - "Check what files have changes, then create summary.txt describing the repo." 
- ) - conversation.run() +- `LLM_API_VERSION` +- `LLM_EMBEDDING_MODEL` +- `LLM_EMBEDDING_DEPLOYMENT_NAME` +- `LLM_DROP_PARAMS` +- `LLM_DISABLE_VISION` +- `LLM_CACHING_PROMPT` - if summary_file.exists(): - print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") +## LLM Provider Guides - print("\n" + "=" * 60) - print("Example Complete!") - print("=" * 60) +We have a few guides for running OpenHands with specific model providers: - cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost - print(f"\nEXAMPLE_COST: {cost}") -``` - +- [Azure](/openhands/usage/llms/azure-llms) +- [Google](/openhands/usage/llms/google-llms) +- [Groq](/openhands/usage/llms/groq) +- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms) +- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy) +- [Moonshot AI](/openhands/usage/llms/moonshot) +- [OpenAI](/openhands/usage/llms/openai-llms) +- [OpenHands](/openhands/usage/llms/openhands-llms) +- [OpenRouter](/openhands/usage/llms/openrouter) +These pages remain the authoritative provider references for both the Agent SDK +and the OpenHands interfaces. -### Hook Scripts +## Model Customization -The example uses external hook scripts in the `hook_scripts/` directory: +LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: - -```bash -#!/bin/bash -# PreToolUse hook: Block dangerous rm -rf commands -# Uses jq for JSON parsing (needed for nested fields like tool_input.command) +- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. +- **Native Tool Calling**: Toggle native function/tool calling capabilities. -input=$(cat) -command=$(echo "$input" | jq -r '.tool_input.command // ""') +For detailed information about model customization, see +[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). 
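The `LLM_*` environment variables listed earlier follow a simple naming pattern. Below is a hedged sketch of how such variables could be collected into configuration fields; the exact parsing is done by the SDK's `LLM.load_from_env()`, and the lowercasing and boolean handling here are illustrative assumptions, not the SDK's actual logic.

```python
# Hedged sketch: collect LLM_* environment variables into a config mapping.
# Field-name derivation and boolean coercion are illustrative assumptions;
# the SDK's LLM.load_from_env() is the authoritative implementation.
def collect_llm_env(environ: dict) -> dict:
    config = {}
    for key, value in environ.items():
        if not key.startswith("LLM_"):
            continue
        field = key[len("LLM_"):].lower()  # e.g. LLM_API_VERSION -> api_version
        if value.lower() in ("true", "false"):  # flags like LLM_DROP_PARAMS
            config[field] = value.lower() == "true"
        else:
            config[field] = value
    return config


print(collect_llm_env({"LLM_API_VERSION": "2024-02-01", "LLM_DROP_PARAMS": "true"}))
# -> {'api_version': '2024-02-01', 'drop_params': True}
```

Unrelated variables are ignored, which is why these settings can live alongside the rest of your shell environment.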
-# Block rm -rf commands -if [[ "$command" =~ "rm -rf" ]]; then - echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' - exit 2 # Exit code 2 = block the operation -fi +### API retries and rate limits -exit 0 # Exit code 0 = allow the operation -``` - +LLM providers typically have rate limits, sometimes very low, and may require retries. OpenHands will automatically +retry requests if it receives a Rate Limit Error (429 error code). - -```bash -#!/bin/bash -# PostToolUse hook: Log all tool usage -# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) +You can customize these options as you need for the provider you're using. Check their documentation, and set the +following environment variables to control the number of retries and the time between retries: -# LOG_FILE should be set by the calling script -LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" +- `LLM_NUM_RETRIES` (Default of 4 times) +- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) +- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) +- `LLM_RETRY_MULTIPLIER` (Default of 2) -echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" -exit 0 +If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: + +```toml +[llm] +num_retries = 4 +retry_min_wait = 5 +retry_max_wait = 30 +retry_multiplier = 2 ``` - - -```bash -#!/bin/bash -# UserPromptSubmit hook: Inject git status when user asks about code changes +### Local LLMs +Source: https://docs.openhands.dev/openhands/usage/llms/local-llms.md -input=$(cat) +## News -# Check if user is asking about changes, diff, or git -if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then - # Get git status if in a git repo - if git rev-parse --git-dir > /dev/null 2>&1; then - status=$(git status --short 2>/dev/null | head -10) - if [ -n "$status" ]; then - # Escape for JSON - escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') - echo "{\"additionalContext\": \"Current git status: 
$escaped\"}" - fi - fi -fi -exit 0 -``` - +- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! - -```bash -#!/bin/bash -# Stop hook: Require a summary.txt file before allowing agent to finish -# SUMMARY_FILE should be set by the calling script +## Quickstart: Running OpenHands with a Local LLM using LM Studio -SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" +This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. -if [ ! -f "$SUMMARY_FILE" ]; then - echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' - exit 2 -fi -exit 0 -``` - +We recommend: +- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. +- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. +### Hardware Requirements -## Next Steps +Running Qwen3-Coder-30B-A3B-Instruct requires: +- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or +- A Mac with Apple Silicon with at least 32GB of RAM -- See also: [Metrics and Observability](/sdk/guides/metrics) -- Architecture: [Events](/sdk/arch/events) +### 1. Install LM Studio -### Iterative Refinement -Source: https://docs.openhands.dev/sdk/guides/iterative-refinement.md +Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/). -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### 2. Download the Model -> The ready-to-run example is available [here](#ready-to-run-example)! 
+1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window. +2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. -## Overview +![image](./screenshots/01_lm_studio_open_model_hub.png) -Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: -1. A **refactoring agent** performs the main task (e.g., code conversion) -2. A **critique agent** evaluates the quality and provides detailed feedback -3. If quality is below threshold, the refactoring agent tries again with the feedback +3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. -This pattern is useful for: -- Code refactoring and modernization (e.g., COBOL to Java) -- Document translation and localization -- Content generation with quality requirements -- Any task requiring iterative improvement +![image](./screenshots/02_lm_studio_download_devstral.png) -## How It Works +4. Wait for the download to finish. -### The Iteration Loop +### 3. Load the Model -The core workflow runs in a loop until quality threshold is met: +1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. +2. Click the "Select a model to load" dropdown at the top of the application window. -```python icon="python" wrap -QUALITY_THRESHOLD = 90.0 -MAX_ITERATIONS = 5 +![image](./screenshots/03_lm_studio_open_load_model.png) -while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: - # Phase 1: Refactoring agent converts COBOL to Java - refactoring_agent = get_default_agent(llm=llm, cli_mode=True) - refactoring_conversation = Conversation( - agent=refactoring_agent, - workspace=str(workspace_dir) - ) - refactoring_conversation.send_message(refactoring_prompt) - refactoring_conversation.run() +3. 
Enable the "Manually choose model load parameters" switch. +4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. - # Phase 2: Critique agent evaluates the conversion - critique_agent = get_default_agent(llm=llm, cli_mode=True) - critique_conversation = Conversation( - agent=critique_agent, - workspace=str(workspace_dir) - ) - critique_conversation.send_message(critique_prompt) - critique_conversation.run() +![image](./screenshots/04_lm_studio_setup_devstral_part_1.png) - # Parse score and decide whether to continue - current_score = parse_critique_score(critique_file) +5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. +6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. +7. Click "Load Model" to start loading the model. - iteration += 1 -``` +![image](./screenshots/05_lm_studio_setup_devstral_part_2.png) -### Critique Scoring +### 4. Start the LLM server -The critique agent evaluates each file on four dimensions (0-25 pts each): -- **Correctness**: Does the Java code preserve the original business logic? -- **Code Quality**: Is the code clean and following Java conventions? -- **Completeness**: Are all COBOL features properly converted? -- **Best Practices**: Does it use proper OOP, error handling, and documentation? +1. Enable the switch next to "Status" at the top-left of the Window. +2. Take note of the Model API Identifier shown on the sidebar on the right. -### Feedback Loop +![image](./screenshots/06_lm_studio_start_server.png) -When the score is below threshold, the refactoring agent receives the critique file location: +### 5. Start OpenHands -```python icon="python" wrap -if critique_file and critique_file.exists(): - base_prompt += f""" -IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. 
-Please review the critique at: {critique_file} -Address all issues mentioned in the critique to improve the conversion quality. -""" +1. Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: + +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 ``` -## Customization +2. Wait until the server is running (see log below): +``` +Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f +Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 +Starting OpenHands... +Running OpenHands as root +14:22:13 - openhands:INFO: server_config.py:50 - Using config class None +INFO: Started server process [8] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) +``` -### Adjusting Thresholds +3. Visit `http://localhost:3000` in your browser. -```python icon="python" wrap -QUALITY_THRESHOLD = 95.0 # Require higher quality -MAX_ITERATIONS = 10 # Allow more iterations -``` +### 6. Configure OpenHands to use the LLM server -### Using Real COBOL Files +Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. -The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). +When started for the first time, OpenHands will prompt you to set up the LLM provider. -## Ready-to-run Example +1. Click "see advanced settings" to open the LLM Settings page. 
- -This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) - +![image](./screenshots/07_openhands_open_advanced_settings.png) -```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py -#!/usr/bin/env python3 -""" -Iterative Refinement Example: COBOL to Java Refactoring +2. Enable the "Advanced" switch at the top of the page to show all the available settings. -This example demonstrates an iterative refinement workflow where: -1. A refactoring agent converts COBOL files to Java files -2. A critique agent evaluates the quality of each conversion and provides scores -3. If the average score is below 90%, the process repeats with feedback +3. Set the following values: + - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") + - **Base URL**: `http://host.docker.internal:1234/v1` + - **API Key**: `local-llm` -The workflow continues until the refactoring meets the quality threshold. +4. Click "Save Settings" to save the configuration. -Source COBOL files can be obtained from: -https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl -""" +![image](./screenshots/08_openhands_configure_local_llm_parameters.png) -import os -import re -import tempfile -from pathlib import Path +That's it! You can now start using OpenHands with the local LLM server. -from pydantic import SecretStr +If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack). -from openhands.sdk import LLM, Conversation -from openhands.tools.preset.default import get_default_agent +## Advanced: Alternative LLM Backends +This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. 
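Whichever backend you pick, OpenHands connects to it through an OpenAI-compatible Base URL. As an illustrative sketch, that URL is assembled from the backend's port and, when OpenHands runs in Docker, the `host.docker.internal` hostname. The ports here follow this guide's launch commands (11434 for Ollama, 8000 for SGLang and vLLM); adjust them if you changed the server flags.

```python
# Illustrative sketch: the Base URL OpenHands uses for each local backend when
# running in Docker. Ports are the defaults used in this guide; the
# host.docker.internal hostname reaches the host machine from the container.
BACKEND_PORTS = {"ollama": 11434, "sglang": 8000, "vllm": 8000}


def base_url(backend: str, host: str = "host.docker.internal") -> str:
    return f"http://{host}:{BACKEND_PORTS[backend]}/v1"


print(base_url("ollama"))  # -> http://host.docker.internal:11434/v1
print(base_url("vllm"))    # -> http://host.docker.internal:8000/v1
```

If OpenHands runs directly on the host rather than in Docker, `localhost` takes the place of `host.docker.internal`.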
-QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) -MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) +### Create an OpenAI-Compatible Endpoint with Ollama +- Install Ollama following [the official documentation](https://ollama.com/download). +- Example launch command for Qwen3-Coder-30B-A3B-Instruct: -def setup_workspace() -> tuple[Path, Path, Path]: - """Create workspace directories for the refactoring workflow.""" - workspace_dir = Path(tempfile.mkdtemp()) - cobol_dir = workspace_dir / "cobol" - java_dir = workspace_dir / "java" - critique_dir = workspace_dir / "critiques" +```bash +# ⚠️ WARNING: OpenHands requires a large context size to work properly. +# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. +# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. +OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & +ollama pull qwen3-coder:30b +``` - cobol_dir.mkdir(parents=True, exist_ok=True) - java_dir.mkdir(parents=True, exist_ok=True) - critique_dir.mkdir(parents=True, exist_ok=True) +### Create an OpenAI-Compatible Endpoint with vLLM or SGLang - return workspace_dir, cobol_dir, java_dir +First, download the model checkpoint: +```bash +huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct +``` -def create_sample_cobol_files(cobol_dir: Path) -> list[str]: - """Create sample COBOL files for demonstration. +#### Serving the model using SGLang - In a real scenario, you would clone files from: - https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl - """ - sample_files = { - "CBACT01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBACT01C. 
- ***************************************************************** - * Program: CBACT01C - Account Display Program - * Purpose: Display account information for a given account number - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-ACCOUNT-ID PIC 9(11). - 01 WS-ACCOUNT-STATUS PIC X(1). - 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. - 01 WS-CUSTOMER-NAME PIC X(50). - 01 WS-ERROR-MSG PIC X(80). +- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). +- Example launch command (with at least 2 GPUs): - PROCEDURE DIVISION. - PERFORM 1000-INIT. - PERFORM 2000-PROCESS. - PERFORM 3000-TERMINATE. - STOP RUN. +```bash +SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ + --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --port 8000 \ + --tp 2 --dp 1 \ + --host 0.0.0.0 \ + --api-key mykey --context-length 131072 +``` - 1000-INIT. - INITIALIZE WS-ACCOUNT-ID - INITIALIZE WS-ACCOUNT-STATUS - INITIALIZE WS-ACCOUNT-BALANCE - INITIALIZE WS-CUSTOMER-NAME. +#### Serving the model using vLLM - 2000-PROCESS. - DISPLAY "ENTER ACCOUNT NUMBER: " - ACCEPT WS-ACCOUNT-ID - IF WS-ACCOUNT-ID = ZEROS - MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG - DISPLAY WS-ERROR-MSG - ELSE - DISPLAY "ACCOUNT: " WS-ACCOUNT-ID - DISPLAY "STATUS: " WS-ACCOUNT-STATUS - DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE - END-IF. +- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). +- Example launch command (with at least 2 GPUs): - 3000-TERMINATE. - DISPLAY "PROGRAM COMPLETE". -""", - "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBCUS01C. 
- ***************************************************************** - * Program: CBCUS01C - Customer Information Program - * Purpose: Manage customer data operations - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-CUSTOMER-ID PIC 9(9). - 01 WS-FIRST-NAME PIC X(25). - 01 WS-LAST-NAME PIC X(25). - 01 WS-ADDRESS PIC X(100). - 01 WS-PHONE PIC X(15). - 01 WS-EMAIL PIC X(50). - 01 WS-OPERATION PIC X(1). - 88 OP-ADD VALUE 'A'. - 88 OP-UPDATE VALUE 'U'. - 88 OP-DELETE VALUE 'D'. - 88 OP-DISPLAY VALUE 'V'. +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --enable-prefix-caching +``` - PROCEDURE DIVISION. - PERFORM 1000-MAIN-PROCESS. - STOP RUN. +If you are interested in further improved inference speed, you can also try Snowflake's version +of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/), +which can achieve up to 2x speedup in some cases. - 1000-MAIN-PROCESS. - DISPLAY "CUSTOMER MANAGEMENT SYSTEM" - DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" - ACCEPT WS-OPERATION - EVALUATE TRUE - WHEN OP-ADD - PERFORM 2000-ADD-CUSTOMER - WHEN OP-UPDATE - PERFORM 3000-UPDATE-CUSTOMER - WHEN OP-DELETE - PERFORM 4000-DELETE-CUSTOMER - WHEN OP-DISPLAY - PERFORM 5000-DISPLAY-CUSTOMER - WHEN OTHER - DISPLAY "INVALID OPERATION" - END-EVALUATE. +1. Install the Arctic Inference library that automatically patches vLLM: - 2000-ADD-CUSTOMER. - DISPLAY "ADDING NEW CUSTOMER" - ACCEPT WS-CUSTOMER-ID - ACCEPT WS-FIRST-NAME - ACCEPT WS-LAST-NAME - DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. +```bash +pip install git+https://github.com/snowflakedb/ArcticInference.git +``` - 3000-UPDATE-CUSTOMER. - DISPLAY "UPDATING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. +2. 
Run the launch command with speculative decoding enabled: - 4000-DELETE-CUSTOMER. - DISPLAY "DELETING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --speculative-config '{"method": "suffix"}' +``` - 5000-DISPLAY-CUSTOMER. - DISPLAY "DISPLAYING CUSTOMER" - ACCEPT WS-CUSTOMER-ID - DISPLAY "ID: " WS-CUSTOMER-ID - DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. -""", - "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. - PROGRAM-ID. CBTRN01C. - ***************************************************************** - * Program: CBTRN01C - Transaction Processing Program - * Purpose: Process financial transactions - ***************************************************************** - ENVIRONMENT DIVISION. - DATA DIVISION. - WORKING-STORAGE SECTION. - 01 WS-TRANS-ID PIC 9(16). - 01 WS-TRANS-TYPE PIC X(2). - 88 TRANS-CREDIT VALUE 'CR'. - 88 TRANS-DEBIT VALUE 'DB'. - 88 TRANS-TRANSFER VALUE 'TR'. - 01 WS-TRANS-AMOUNT PIC S9(13)V99. - 01 WS-FROM-ACCOUNT PIC 9(11). - 01 WS-TO-ACCOUNT PIC 9(11). - 01 WS-TRANS-DATE PIC 9(8). - 01 WS-TRANS-STATUS PIC X(10). +### Run OpenHands (Alternative Backends) - PROCEDURE DIVISION. - PERFORM 1000-INITIALIZE. - PERFORM 2000-PROCESS-TRANSACTION. - PERFORM 3000-FINALIZE. - STOP RUN. +#### Using Docker - 1000-INITIALIZE. - MOVE ZEROS TO WS-TRANS-ID - MOVE SPACES TO WS-TRANS-TYPE - MOVE ZEROS TO WS-TRANS-AMOUNT - MOVE "PENDING" TO WS-TRANS-STATUS. +Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). - 2000-PROCESS-TRANSACTION. 
- DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " - ACCEPT WS-TRANS-TYPE - DISPLAY "ENTER AMOUNT: " - ACCEPT WS-TRANS-AMOUNT - EVALUATE TRUE - WHEN TRANS-CREDIT - PERFORM 2100-PROCESS-CREDIT - WHEN TRANS-DEBIT - PERFORM 2200-PROCESS-DEBIT - WHEN TRANS-TRANSFER - PERFORM 2300-PROCESS-TRANSFER - WHEN OTHER - MOVE "INVALID" TO WS-TRANS-STATUS - END-EVALUATE. +#### Using Development Mode - 2100-PROCESS-CREDIT. - DISPLAY "PROCESSING CREDIT" - ACCEPT WS-TO-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. +Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. - 2200-PROCESS-DEBIT. - DISPLAY "PROCESSING DEBIT" - ACCEPT WS-FROM-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. +Start OpenHands using `make run`. - 2300-PROCESS-TRANSFER. - DISPLAY "PROCESSING TRANSFER" - ACCEPT WS-FROM-ACCOUNT - ACCEPT WS-TO-ACCOUNT - MOVE "COMPLETED" TO WS-TRANS-STATUS - DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. +### Configure OpenHands (Alternative Backends) - 3000-FINALIZE. - DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. -""", - } +Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. - created_files = [] - for filename, content in sample_files.items(): - file_path = cobol_dir / filename - file_path.write_text(content) - created_files.append(filename) +1. Click **"see advanced settings"** to access the full configuration panel. +2. Enable the **Advanced** toggle at the top of the page. +3. Set the following parameters, if you followed the examples above: + - **Custom Model**: `openai/` + - For **Ollama**: `openai/qwen3-coder:30b` + - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` + - **Base URL**: `http://host.docker.internal:/v1` + Use port `11434` for Ollama, or `8000` for SGLang and vLLM. + - **API Key**: + - For **Ollama**: any placeholder value (e.g. 
`dummy`, `local-llm`) + - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) - return created_files +### Moonshot AI +Source: https://docs.openhands.dev/openhands/usage/llms/moonshot.md +## Using Moonshot AI with OpenHands -def get_refactoring_prompt( - cobol_dir: Path, - java_dir: Path, - cobol_files: list[str], - critique_file: Path | None = None, -) -> str: - """Generate the prompt for the refactoring agent.""" - files_list = "\n".join(f" - {f}" for f in cobol_files) +[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. - base_prompt = f"""Convert the following COBOL files to Java: +### Setup -COBOL Source Directory: {cobol_dir} -Java Target Directory: {java_dir} +1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) +2. Generate an API key from your account settings +3. Configure OpenHands to use Moonshot AI: -Files to convert: -{files_list} +| Setting | Value | +| --- | --- | +| LLM Provider | `moonshot` | +| LLM Model | `kimi-k2-0711-preview` | +| API Key | Your Moonshot API key | -Requirements: -1. Create a Java class for each COBOL program -2. Preserve the business logic and data structures -3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) -4. Convert COBOL data types to appropriate Java types -5. Implement proper error handling with try-catch blocks -6. Add JavaDoc comments explaining the purpose of each class and method -7. In JavaDoc comments, include traceability to the original COBOL source using - the format: @source : (e.g., @source CBACT01C.cbl:73-77) -8. Create a clean, maintainable object-oriented design -9. Each Java file should be compilable and follow Java best practices +### Recommended Models -Read each COBOL file and create the corresponding Java file in the target directory. 
-""" +- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. - if critique_file and critique_file.exists(): - base_prompt += f""" +### OpenAI +Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms.md -IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. -Please review the critique at: {critique_file} -Address all issues mentioned in the critique to improve the conversion quality. -""" +## Configuration - return base_prompt +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenAI` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). +* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). +## Using OpenAI-Compatible Endpoints -def get_critique_prompt( - cobol_dir: Path, - java_dir: Path, - cobol_files: list[str], -) -> str: - """Generate the prompt for the critique agent.""" - files_list = "\n".join(f" - {f}" for f in cobol_files) +Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). - return f"""Evaluate the quality of COBOL to Java refactoring. +## Using an OpenAI Proxy -COBOL Source Directory: {cobol_dir} -Java Target Directory: {java_dir} +If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. 
Set the following: + - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) + - `Base URL` to the URL of your OpenAI proxy + - `API Key` to your OpenAI API key -Original COBOL files: -{files_list} +### OpenHands +Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md -Please evaluate each converted Java file against its original COBOL source. +## Obtain Your OpenHands LLM API Key -For each file, assess: -1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) -2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) -3. Completeness: Are all COBOL features properly converted? (0-25 pts) -4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) +1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. -Create a critique report in the following EXACT format: +![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) -# COBOL to Java Refactoring Critique Report +## Configuration -## Summary -[Brief overall assessment] +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `OpenHands` +- `LLM Model` to the model you will be using (e.g. 
claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) +- `API Key` to your OpenHands LLM API key copied from above -## File Evaluations +## Using OpenHands LLM Provider in the CLI -### [Original COBOL filename] -- **Java File**: [corresponding Java filename or "NOT FOUND"] -- **Correctness**: [score]/25 - [brief explanation] -- **Code Quality**: [score]/25 - [brief explanation] -- **Completeness**: [score]/25 - [brief explanation] -- **Best Practices**: [score]/25 - [brief explanation] -- **File Score**: [total]/100 -- **Issues to Address**: - - [specific issue 1] - - [specific issue 2] - ... +1. [Run OpenHands CLI](/openhands/usage/cli/quick-start). +2. To select OpenHands as the LLM provider: + - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use. + - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then + choose `openhands` and finally the model. -[Repeat for each file] +![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png) -## Overall Score -- **Average Score**: [calculated average of all file scores] -- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] -## Priority Improvements -1. [Most critical improvement needed] -2. [Second priority] -3. [Third priority] + +When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy + -Save this report to: {java_dir.parent}/critiques/critique_report.md -""" +## Using OpenHands LLM Provider with the SDK +You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines. 
-def parse_critique_score(critique_file: Path) -> float: - """Parse the average score from the critique report.""" - if not critique_file.exists(): - return 0.0 +### Configuration - content = critique_file.read_text() +The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. Simply set two environment variables: - # Look for "Average Score: X" pattern - patterns = [ - r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)", - r"Average Score:\s*(\d+(?:\.\d+)?)", - r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)", - ] +```bash +export LLM_API_KEY="your-openhands-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-20250514" +``` - for pattern in patterns: - match = re.search(pattern, content, re.IGNORECASE) - if match: - return float(match.group(1)) +### Example - return 0.0 +```python +from openhands.sdk import LLM +# The openhands/ prefix auto-configures the base URL +llm = LLM.load_from_env() -def run_iterative_refinement() -> None: - """Run the iterative refinement workflow.""" - # Setup - api_key = os.getenv("LLM_API_KEY") - assert api_key is not None, "LLM_API_KEY environment variable is not set." - model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") - base_url = os.getenv("LLM_BASE_URL") +# Or configure directly +llm = LLM( + model="openhands/claude-sonnet-4-20250514", + api_key="your-openhands-api-key", +) +``` - llm = LLM( - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - usage_id="iterative_refinement", - ) +The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. 
- workspace_dir, cobol_dir, java_dir = setup_workspace() - critique_dir = workspace_dir / "critiques" +### Available Models - print(f"Workspace: {workspace_dir}") - print(f"COBOL Directory: {cobol_dir}") - print(f"Java Directory: {java_dir}") - print(f"Critique Directory: {critique_dir}") - print() +When using the SDK, prefix any model from the pricing table below with `openhands/`: +- `openhands/claude-sonnet-4-20250514` +- `openhands/claude-sonnet-4-5-20250929` +- `openhands/claude-opus-4-20250514` +- `openhands/gpt-5-2025-08-07` +- etc. - # Create sample COBOL files - cobol_files = create_sample_cobol_files(cobol_dir) - print(f"Created {len(cobol_files)} sample COBOL files:") - for f in cobol_files: - print(f" - {f}") - print() + +If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. + - critique_file = critique_dir / "critique_report.md" - current_score = 0.0 - iteration = 0 +## Pricing - while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: - iteration += 1 - print("=" * 80) - print(f"ITERATION {iteration}") - print("=" * 80) +Pricing follows official API provider rates. 
Below are the current pricing details for OpenHands models: - # Phase 1: Refactoring - print("\n--- Phase 1: Refactoring Agent ---") - refactoring_agent = get_default_agent(llm=llm, cli_mode=True) - refactoring_conversation = Conversation( - agent=refactoring_agent, - workspace=str(workspace_dir), - ) - previous_critique = critique_file if iteration > 1 else None - refactoring_prompt = get_refactoring_prompt( - cobol_dir, java_dir, cobol_files, previous_critique - ) +| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | +|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| +| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | +| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | +| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | +| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | +| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | +| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | +| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | +| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | +| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | +| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | +| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | - refactoring_conversation.send_message(refactoring_prompt) - refactoring_conversation.run() - print("Refactoring phase complete.") +**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s 
model price database and provider pricing pages. Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. - # Phase 2: Critique - print("\n--- Phase 2: Critique Agent ---") - critique_agent = get_default_agent(llm=llm, cli_mode=True) - critique_conversation = Conversation( - agent=critique_agent, - workspace=str(workspace_dir), - ) +### OpenRouter +Source: https://docs.openhands.dev/openhands/usage/llms/openrouter.md + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenRouter` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models). +If the model is not in the list, enable `Advanced` options, and enter it in +`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`). +* `API Key` to your OpenRouter API key. + +### OpenHands GitHub Action +Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md - critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) - critique_conversation.send_message(critique_prompt) - critique_conversation.run() - print("Critique phase complete.") +## Using the Action in the OpenHands Repository - # Parse the score - current_score = parse_critique_score(critique_file) - print(f"\nCurrent Score: {current_score:.1f}%") +To use the OpenHands GitHub Action in a repository, you can: - if current_score >= QUALITY_THRESHOLD: - print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") - else: - print( - f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " - "Continuing refinement..." - ) +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`. 
- # Final summary - print("\n" + "=" * 80) - print("ITERATIVE REFINEMENT COMPLETE") - print("=" * 80) - print(f"Total iterations: {iteration}") - print(f"Final score: {current_score:.1f}%") - print(f"Workspace: {workspace_dir}") +The action will automatically trigger and attempt to resolve the issue. - # List created Java files - print("\nCreated Java files:") - for java_file in java_dir.glob("*.java"): - print(f" - {java_file.name}") +## Installing the Action in a New Repository - # Show critique file location - if critique_file.exists(): - print(f"\nFinal critique report: {critique_file}") +To install the OpenHands GitHub Action in your own repository, follow +the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). - # Report cost - cost = llm.metrics.accumulated_cost - print(f"\nEXAMPLE_COST: {cost}") +## Usage Tips +### Iterative resolution -if __name__ == "__main__": - run_iterative_refinement() -``` +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. +3. Review the attempt to resolve the issue by checking the pull request. +4. Follow up with feedback through general comments, review comments, or inline thread comments. +5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. - +### Label versus Macro -## Next Steps +- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. +- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. 
-- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents -- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow +## Advanced Settings -### Exception Handling -Source: https://docs.openhands.dev/sdk/guides/llm-error-handling.md +### Add custom repository settings -The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. +You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). -This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. +### Custom configurations -## Why typed exceptions? +GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. +The customization options you can set are: -LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. 
Typical benefits: +| **Attribute name** | **Type** | **Purpose** | **Example** | +| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` | +| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` | +| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` | +| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` | +| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` | +| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` | -- One code path to handle auth, rate limits, timeouts, service issues, and bad requests -- Clear behavior when conversation history exceeds the context window -- Backward compatibility when you switch providers or SDK versions +### Configure +Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md -## Quick start: Using agents and conversations +## Prerequisites -Agent-driven conversations are the common entry point. Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured. 
+- [OpenHands is running](/openhands/usage/run-openhands/local-setup) -```python icon="python" wrap -from pydantic import SecretStr -from openhands.sdk import Agent, Conversation, LLM -from openhands.sdk.llm.exceptions import ( - LLMError, - LLMAuthenticationError, - LLMRateLimitError, - LLMTimeoutError, - LLMServiceUnavailableError, - LLMBadRequestError, - LLMContextWindowExceedError, -) +## Launching the GUI Server -llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) -agent = Agent(llm=llm, tools=[]) -conversation = Conversation( - agent=agent, - persistence_dir="./.conversations", - workspace=".", -) +### Using the CLI Command -try: - conversation.send_message( - "Continue the long analysis we started earlier…" - ) - conversation.run() +You can launch the OpenHands GUI server directly from the command line using the `serve` command: -except LLMContextWindowExceedError: - # Conversation is longer than the model’s context window - # Options: - # 1) Enable a condenser (recommended for long sessions) - # 2) Shorten inputs or reset conversation - print("Hit the context limit. Consider enabling a condenser.") + +**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR have `uv` +installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker +directly (see the [Docker section](#using-docker-directly) below). + -except LLMAuthenticationError: - print( - "Invalid or missing API credentials." - "Check your API key or auth setup." - ) +```bash +openhands serve +``` -except LLMRateLimitError: - print("Rate limit exceeded. Back off and retry later.") +This command will: +- Check that Docker is installed and running +- Pull the required Docker images +- Launch the OpenHands GUI server at http://localhost:3000 +- Use the same configuration directory (`~/.openhands`) as the CLI mode -except LLMTimeoutError: - print("Request timed out. 
Consider increasing timeout or retrying.") +#### Mounting Your Current Directory -except LLMServiceUnavailableError: - print("Service unavailable or connectivity issue. Retry with backoff.") +To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: -except LLMBadRequestError: - print("Bad request to provider. Validate inputs and arguments.") +```bash +openhands serve --mount-cwd +``` -except LLMError as e: - # Fallback for other SDK LLM errors (parsing/validation, etc.) - print(f"Unhandled LLM error: {e}") +This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container. + +#### Using GPU Support + +If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: + +```bash +openhands serve --gpu ``` +This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: +```bash +openhands serve --gpu --mount-cwd +``` -### Avoiding context‑window errors with a condenser +**Prerequisites for GPU support:** +- NVIDIA GPU drivers must be installed on your host system +- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured -If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue. 
+#### Requirements -```python icon="python" focus={5-6, 9-14} wrap -from openhands.sdk.context.condenser import LLMSummarizingCondenser +Before using the `openhands serve` command, ensure that: +- Docker is installed and running on your system +- You have internet access to pull the required Docker images +- Port 3000 is available on your system -condenser = LLMSummarizingCondenser( - llm=llm.model_copy(update={"usage_id": "condenser"}), - max_size=10, - keep_first=2, -) +The CLI will automatically check these requirements and provide helpful error messages if anything is missing. -agent = Agent(llm=llm, tools=[], condenser=condenser) -conversation = Conversation( - agent=agent, - persistence_dir="./.conversations", - workspace=".", -) -``` +### Using Docker Directly - - See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). - +Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. -## Handling errors with direct LLM calls +## Overview -The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. +### Initial Setup -### Example: Using `.completion()` +1. Upon first launch, you'll see a settings popup. +2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list, + select `see advanced settings`. Then toggle `Advanced` options and enter it with the correct prefix in the + `Custom Model` text box. +3. Enter the corresponding `API Key` for your chosen provider. +4. Click `Save Changes` to apply the settings. 
-```python icon="python" wrap -from pydantic import SecretStr -from openhands.sdk import LLM -from openhands.sdk.llm import Message, TextContent -from openhands.sdk.llm.exceptions import ( - LLMError, - LLMAuthenticationError, - LLMRateLimitError, - LLMTimeoutError, - LLMServiceUnavailableError, - LLMBadRequestError, - LLMContextWindowExceedError, -) +### Settings -llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) +You can use the Settings page at any time to: -try: - response = llm.completion([ - Message.user([TextContent(text="Summarize our design doc")]) - ]) - print(response.message) +- [Setup the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings). +- [Setup the search engine](/openhands/usage/advanced/search-engine-setup). +- [Configure MCP servers](/openhands/usage/settings/mcp-settings). +- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup), + [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup) + and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup). +- Set application settings like your preferred language, notifications and other preferences. +- [Manage custom secrets](/openhands/usage/settings/secrets-settings). -except LLMContextWindowExceedError: - print("Context window exceeded. Consider enabling a condenser.") -except LLMAuthenticationError: - print("Invalid or missing API credentials.") -except LLMRateLimitError: - print("Rate limit exceeded. Back off and retry later.") -except LLMTimeoutError: - print("Request timed out. Consider increasing timeout or retrying.") -except LLMServiceUnavailableError: - print("Service unavailable or connectivity issue. Retry with backoff.") -except LLMBadRequestError: - print("Bad request to provider. 
Validate inputs and arguments.")
-except LLMError as e:
-    print(f"Unhandled LLM error: {e}")
-```

### Key Features

### Example: Using `.responses()`

For an overview of the key features available inside a conversation, please refer to the
[Key Features](/openhands/usage/key-features) section of the documentation.

## Other Ways to Run OpenHands
- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless)
- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal)

```python icon="python" wrap
from pydantic import SecretStr
from openhands.sdk import LLM
from openhands.sdk.llm import Message, TextContent
from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError

### Setup
Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md

llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key"))

## Recommended Methods for Running OpenHands on Your Local System

try:
    resp = llm.responses([
        Message.user(
            [TextContent(text="Write a one-line haiku about code.")]
        )
    ])
    print(resp.message)
except LLMContextWindowExceedError:
    print("Context window exceeded. Consider enabling a condenser.")
except LLMError as e:
    print(f"LLM error: {e}")
```

### System Requirements

## Exception reference

- MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements)
- Linux
- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements)

All exceptions live under `openhands.sdk.llm.exceptions` unless noted.

A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands.

| Category | Error | Description |
|--------|------|-------------|
| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. 
Without a condenser, thrown for both Chat and Responses paths. | -| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | -| | `LLMRateLimitError` | Provider rate limit exceeded. | -| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | -| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | -| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | -| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | -| | `LLMNoActionError` | Model did not return an action when one was expected. | -| | `LLMResponseError` | Could not extract an action from the response. | -| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | -| | `FunctionCallValidationError` | Tool/function call arguments failed validation. | -| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | -| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | -| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | -| | `OperationCancelled` | A running operation was cancelled programmatically. | +### Prerequisites - - All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all - for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. - + -### LLM Fallback Strategy -Source: https://docs.openhands.dev/sdk/guides/llm-fallback.md + -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + **Docker Desktop** -> A ready-to-run example is available [here](#ready-to-run-example)! + 1. [Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install). + 2. 
Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled.
 

 

`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model.

 
  Tested with Ubuntu 22.04.
 

## Basic Usage

 **Docker Desktop**

Attach a `FallbackStrategy` to your primary `LLM`. The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store):

 1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/).

```python icon="python" wrap focus={16, 17, 21, 22, 23}
from pydantic import SecretStr
from openhands.sdk import LLM, LLMProfileStore
from openhands.sdk.llm import FallbackStrategy

 

# Manage persisted LLM profiles
# default store directory: .openhands/profiles
store = LLMProfileStore()

 

fallback_llm = LLM(
    usage_id="fallback-1",
    model="openai/gpt-4o",
    api_key=SecretStr("your-openai-key"),
)
store.save("fallback-1", fallback_llm, include_secrets=True)

 **WSL**

# Configure an LLM with a fallback strategy
primary_llm = LLM(
    usage_id="agent-primary",
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr("your-api-key"),
    fallback_strategy=FallbackStrategy(
        fallback_llms=["fallback-1"],
    ),
)
```

 1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install).
 2. Run `wsl --version` in PowerShell and confirm `Default Version: 2`.

## How It Works

 **Ubuntu (Linux Distribution)**

1. The primary LLM handles the request as normal
2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order
3. The first successful fallback response is returned to the caller
4. If all fallbacks fail, the original primary error is raised
5. 
Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model + **Ubuntu (Linux Distribution)** - -Only transient errors trigger fallback. -Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks. -For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29) - + 1. Install Ubuntu: `wsl --install -d Ubuntu` in PowerShell as Administrator. + 2. Restart computer when prompted. + 3. Open Ubuntu from Start menu to complete setup. + 4. Verify installation: `wsl --list` should show Ubuntu. -## Multiple Fallback Levels + **Docker Desktop** -Chain as many fallback LLMs as you need. They are tried in list order: + 1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install). + 2. Open Docker Desktop, go to `Settings` and confirm the following: + - General: `Use the WSL 2 based engine` is enabled. + - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled. -```python icon="python" wrap focus={5-7} -llm = LLM( - usage_id="agent-primary", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key), - fallback_strategy=FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - ), -) -``` + + The docker command below to start the app must be run inside the WSL terminal. Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal. + -If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. + -## Custom Profile Store Directory + -By default, fallback profiles are loaded from `.openhands/profiles`. 
You can point to a different directory: +### Start the App -```python icon="python" wrap focus={3} -FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - profile_store_dir="/path/to/my/profiles", -) -``` +#### Option 1: Using the CLI Launcher with uv (Recommended) -## Metrics +We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). -Fallback costs are automatically merged into the primary LLM's metrics. After a conversation, you can inspect exactly which models were used: +**Install uv** (if you haven't already): -```python icon="python" wrap -# After running a conversation -metrics = llm.metrics -print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. -for usage in metrics.token_usages: - print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +**Install OpenHands**: +```bash +uv tool install openhands --python 3.12 ``` -Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. +**Launch OpenHands**: +```bash +# Launch the GUI server +openhands serve -## Use Cases +# Or with GPU support (requires nvidia-docker) +openhands serve --gpu -- **Rate limit handling** — When one provider throttles you, seamlessly switch to another -- **High availability** — Keep your agent running during provider outages -- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure -- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
+# Or with current directory mounted +openhands serve --mount-cwd +``` -## Ready-to-run Example +This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. - -This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) - +**Upgrade OpenHands**: +```bash +uv tool upgrade openhands --python 3.12 +``` -```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py -"""Example: Using FallbackStrategy for LLM resilience. + -When the primary LLM fails with a transient error (rate limit, timeout, etc.), -FallbackStrategy automatically tries alternate LLMs in order. Fallback is -per-call: each new request starts with the primary model. Token usage and -cost from fallback calls are merged into the primary LLM's metrics. +If you prefer to use pip and have Python 3.12+ installed: -This example: - 1. Saves two fallback LLM profiles to a temporary store. - 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. - 3. Runs a conversation — if the primary model is unavailable, the agent - transparently falls back to the next available model. -""" +```bash +# Install OpenHands +pip install openhands -import os -import tempfile +# Launch the GUI server +openhands serve +``` -from pydantic import SecretStr +Note that you'll still need `uv` installed for the default MCP servers to work properly. 
-from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool -from openhands.sdk.llm import FallbackStrategy -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool + +#### Option 2: Using Docker Directly -# Read configuration from environment -api_key = os.getenv("LLM_API_KEY", None) -assert api_key is not None, "LLM_API_KEY environment variable is not set." -base_url = os.getenv("LLM_BASE_URL") -primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + -# Use a temporary directory so this example doesn't pollute your home folder. -# In real usage you can omit base_dir to use the default (~/.openhands/profiles). -profile_store_dir = tempfile.mkdtemp() -store = LLMProfileStore(base_dir=profile_store_dir) +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` -fallback_1 = LLM( - usage_id="fallback-1", - model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), - api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), - base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), -) -store.save("fallback-1", fallback_1, include_secrets=True) + -fallback_2 = LLM( - usage_id="fallback-2", - model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), - api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), - base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), -) -store.save("fallback-2", fallback_2, include_secrets=True) +> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. 
+ +You'll find OpenHands running at http://localhost:3000! + +### Setup -print(f"Saved fallback profiles: {store.list()}") +After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`. +This can be done during the initial settings popup or by selecting the `Settings` +button (gear icon) in the UI. +If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options +and manually enter it with the correct prefix in the `Custom Model` text box. +The `Advanced` options also allow you to specify a `Base URL` if required. -# Configure the primary LLM with a FallbackStrategy -primary_llm = LLM( - usage_id="agent-primary", - model=primary_model, - api_key=SecretStr(api_key), - base_url=base_url, - fallback_strategy=FallbackStrategy( - fallback_llms=["fallback-1", "fallback-2"], - profile_store_dir=profile_store_dir, - ), -) +#### Getting an API Key +OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers: -# Run a conversation -agent = Agent( - llm=primary_llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - ], -) + -conversation = Conversation(agent=agent, workspace=os.getcwd()) -conversation.send_message("Write a haiku about resilience into HAIKU.txt.") -conversation.run() + +1. [Log in to OpenHands Cloud](https://app.all-hands.dev). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. 
-# Inspect metrics (includes any fallback usage) -metrics = primary_llm.metrics -print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") -print(f"Token usage records: {len(metrics.token_usages)}") -for usage in metrics.token_usages: - print( - f" model={usage.model}" - f" prompt={usage.prompt_tokens}" - f" completion={usage.completion_tokens}" - ) +OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms). -print(f"EXAMPLE_COST: {metrics.accumulated_cost}") -``` + - + -## Next Steps +1. [Create an Anthropic account](https://console.anthropic.com/). +2. [Generate an API key](https://console.anthropic.com/settings/keys). +3. [Set up billing](https://console.anthropic.com/settings/billing). -- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles -- **[Model Routing](/sdk/guides/llm-routing)** — Route requests based on content (e.g., multimodal vs text-only) -- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application -- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models + -### Image Input -Source: https://docs.openhands.dev/sdk/guides/llm-image-input.md + -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +1. [Create an OpenAI account](https://platform.openai.com/). +2. [Generate an API key](https://platform.openai.com/api-keys). +3. [Set up billing](https://platform.openai.com/account/billing/overview). -> A ready-to-run example is available [here](#ready-to-run-example)! + + -### Sending Images +1. Create a Google account if you don't already have one. +2. [Generate an API key](https://aistudio.google.com/apikey). +3. [Set up billing](https://aistudio.google.com/usage?tab=billing). -The LLM you use must support image inputs (`llm.vision_is_active()` need to be `True`). 
+ -Pass images along with text in the message content: + -```python focus={14} icon="python" wrap -from openhands.sdk import ImageContent +If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. -IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" -conversation.send_message( - Message( - role="user", - content=[ - TextContent( - text=( - "Study this image and describe the key elements you see. " - "Summarize them in a short paragraph and suggest a catchy caption." - ) - ), - ImageContent(image_urls=[IMAGE_URL]), - ], - ) -) -``` + -Works with multimodal LLMs like `GPT-4 Vision` and `Claude` with vision capabilities. + -## Ready-to-run Example +Consider setting usage limits to control costs. + +#### Using a Local LLM -This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) +Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. -You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: +To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. -```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py -"""OpenHands Agent SDK — Image Input Example. +#### Setting Up Search Engine -This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds -vision support by sending an image to the agent alongside text instructions. -""" +OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. 
-import os

+To enable search functionality in OpenHands:

-from pydantic import SecretStr

+1. Get a Tavily API key from [tavily.com](https://tavily.com/).
+2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)`.

-from openhands.sdk import (
-    LLM,
-    Agent,
-    Conversation,
-    Event,
-    ImageContent,
-    LLMConvertibleEvent,
-    Message,
-    TextContent,
-    get_logger,
-)
-from openhands.sdk.tool.spec import Tool
-from openhands.tools.file_editor import FileEditorTool
-from openhands.tools.task_tracker import TaskTrackerTool
-from openhands.tools.terminal import TerminalTool

+For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide.

+### Versions

-logger = get_logger(__name__)

+The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well:
+- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION` with the version number.
+For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release.
+- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION` with `main`.
+This version is unstable and is recommended for testing or development purposes only.

-# Configure LLM (vision-capable model)
-api_key = os.getenv("LLM_API_KEY")
-assert api_key is not None, "LLM_API_KEY environment variable is not set."
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
-base_url = os.getenv("LLM_BASE_URL")

-llm = LLM(
-    usage_id="vision-llm",
-    model=model,
-    base_url=base_url,
-    api_key=SecretStr(api_key),
-)
-assert llm.vision_is_active(), "The selected LLM model does not support vision input."
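The release-line rule above (`0.9` tracks the latest `0.9.x`, `0` the latest `0.x.x`) can be illustrated with a small helper. This is a sketch of the tag-matching semantics only, not code used by OpenHands, and it assumes purely numeric tags (`main` is handled separately as the development build):

```python
def resolve_release(line: str, available: list[str]) -> str:
    """Resolve a release line like '0.9' to the newest matching tag."""
    matches = [
        tag for tag in available
        if tag == line or tag.startswith(line + ".")
    ]
    if not matches:
        raise ValueError(f"no published tag matches {line!r}")
    # Compare numerically so '0.9.10' beats '0.9.2'.
    return max(matches, key=lambda tag: [int(part) for part in tag.split(".")])


print(resolve_release("0.9", ["0.8.4", "0.9.2", "0.9.10", "0.10.1"]))  # 0.9.10
```

Note the numeric comparison: a plain string `max` would wrongly rank `0.9.2` above `0.9.10`.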
+## Next Steps -cwd = os.getcwd() +- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories +- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) +- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) +- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) -agent = Agent( - llm=llm, - tools=[ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), - Tool(name=TaskTrackerTool.name), - ], -) +### Docker Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker.md -llm_messages = [] # collect raw LLM messages for inspection +The **Docker sandbox** runs the agent server inside a Docker container. This is +the default and recommended option for most users. + + In some self-hosted deployments, the sandbox provider is controlled via the + legacy RUNTIME environment variable. Docker is the default. + -def conversation_callback(event: Event) -> None: - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Why Docker? -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd -) +- Isolation: reduces risk when the agent runs commands. +- Reproducibility: consistent environment across machines. -IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" +## Mounting your code into the sandbox -conversation.send_message( - Message( - role="user", - content=[ - TextContent( - text=( - "Study this image and describe the key elements you see. " - "Summarize them in a short paragraph and suggest a catchy caption." - ) - ), - ImageContent(image_urls=[IMAGE_URL]), - ], - ) -) -conversation.run() +If you want OpenHands to work directly on a local repository, mount it into the +sandbox. -conversation.send_message( - "Great! 
Please save your description and caption into image_report.md." -) -conversation.run() +### Recommended: CLI launcher -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +If you start OpenHands via: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +```bash +openhands serve --mount-cwd ``` - - -## Next Steps - -- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns -- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently +your current directory will be mounted into the sandbox workspace. -### LLM Profile Store -Source: https://docs.openhands.dev/sdk/guides/llm-profile-store.md +### Using SANDBOX_VOLUMES -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +You can also configure mounts via the SANDBOX_VOLUMES environment +variable (format: host_path:container_path[:mode]): -> A ready-to-run example is available [here](#ready-to-run-example)! +```bash +export SANDBOX_VOLUMES=$PWD:/workspace:rw +``` -The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. -Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. + + Anything mounted read-write into /workspace can be modified by the + agent. + -## Benefits -- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. -- **Reusability:** Import a defined profile into any script or session with a single identifier. -- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. +## Custom sandbox images -## How It Works +To customize the container image (extra tools, system deps, etc.), see +[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). 
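The `host_path:container_path[:mode]` shape used by `SANDBOX_VOLUMES` can be sanity-checked before launching. The validator below is a hypothetical helper (not part of OpenHands) and assumes Docker-style `rw`/`ro` modes and POSIX paths that contain no colons:

```python
def parse_volume(spec: str) -> tuple[str, str, str]:
    """Split 'host_path:container_path[:mode]'; mode defaults to 'rw'."""
    parts = spec.split(":")
    if len(parts) == 2:
        host, container, mode = parts[0], parts[1], "rw"
    elif len(parts) == 3:
        host, container, mode = parts
    else:
        raise ValueError(f"expected host_path:container_path[:mode], got {spec!r}")
    if not host or not container.startswith("/"):
        raise ValueError(f"container path must be absolute in {spec!r}")
    if mode not in ("rw", "ro"):
        raise ValueError(f"unknown mode {mode!r} in {spec!r}")
    return host, container, mode


print(parse_volume("/home/me/project:/workspace:rw"))
```

Because the format is colon-delimited, Windows-style paths with drive letters (`C:\...`) would need different handling; the sketch deliberately targets the POSIX case shown in the example above.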
- - - ### Create a Store +### Overview +Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview.md - The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, - but you can point it anywhere. +A **sandbox** is the environment where OpenHands runs commands, edits files, and +starts servers while working on your task. - ```python icon="python" focus={3, 4, 6, 7} - from openhands.sdk import LLMProfileStore +In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. - # Default location: ~/.openhands/profiles - store = LLMProfileStore() +## Sandbox providers - # Or bring your own directory - store = LLMProfileStore(base_dir="./my-profiles") - ``` - - - ### Save a Profile +OpenHands supports multiple sandbox “providers”, with different tradeoffs: - Got an LLM configured just right? Save it for later. +- **Docker sandbox (recommended)** + - Runs the agent server inside a Docker container. + - Good isolation from your host machine. - ```python icon="python" focus={11, 12} - from pydantic import SecretStr - from openhands.sdk import LLM, LLMProfileStore +- **Process sandbox (unsafe, but fast)** + - Runs the agent server as a regular process on your machine. + - No container isolation. - fast_llm = LLM( - usage_id="fast", - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr("sk-..."), - temperature=0.0, - ) +- **Remote sandbox** + - Runs the agent server in a remote environment. + - Used by managed deployments and some hosted setups. - store = LLMProfileStore() - store.save("fast", fast_llm) - ``` +## Selecting a provider (current behavior) - - API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to - persist them; otherwise, they will be read from the environment at load time. 
- - - - ### Load a Profile +In some deployments, the provider selection is still controlled via the legacy +RUNTIME environment variable: - Next time you need that LLM, just load it: +- RUNTIME=docker (default) +- RUNTIME=process (aka legacy RUNTIME=local) +- RUNTIME=remote - ```python icon="python" - # Same model, ready to go. - llm = store.load("fast") - ``` - - - ### List and Clean Up + + The user-facing terminology in V1 is sandbox, but the configuration knob + may still be called RUNTIME while the migration is in progress. + - See what you've got, delete what you don't need: +## Terminology note (V0 vs V1) - ```python icon="python" focus={1, 3, 4} - print(store.list()) # ['fast.json', 'creative.json'] +Older documentation refers to these environments as **runtimes**. +Those legacy docs are now in the Legacy (V0) section of the Web tab. - store.delete("creative") - print(store.list()) # ['fast.json'] - ``` - - +### Process Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/process.md -## Good to Know +The **Process sandbox** runs the agent server directly on your machine as a +regular process. -Profile names must be simple filenames (no slashes, no dots at the start). + + This mode provides **no sandbox isolation**. -## Ready-to-run Example + The agent can read/write files your user account can access and execute + commands on your host system. - -This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) - + Only use this in controlled environments. + -```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py -"""Example: Using LLMProfileStore to save and reuse LLM configurations. +## When to use it -LLMProfileStore persists LLM configurations as JSON files, so you can define -a profile once and reload it across sessions without repeating setup code. 
-""" +- Local development when Docker is unavailable +- Some CI environments +- Debugging issues that only reproduce outside containers -import os -import tempfile +## Choosing process mode -from pydantic import SecretStr +In some deployments, this is selected via the legacy RUNTIME +environment variable: -from openhands.sdk import LLM, LLMProfileStore +```bash +export RUNTIME=process +# (legacy alias) +# export RUNTIME=local +``` +If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker). -# Use a temporary directory so this example doesn't pollute your home folder. -# In real usage you can omit base_dir to use the default (~/.openhands/profiles). -store = LLMProfileStore(base_dir=tempfile.mkdtemp()) +### Remote Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/remote.md +A **remote sandbox** runs the agent server in a remote execution environment +instead of on your local machine. -# 1. Create two LLM profiles with different usage +This is typically used by managed deployments (e.g., OpenHands Cloud) and +advanced self-hosted setups. -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -base_url = os.getenv("LLM_BASE_URL") -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +## Selecting remote mode -fast_llm = LLM( - usage_id="fast", - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - temperature=0.0, -) +In some self-hosted deployments, remote sandboxes are selected via the legacy +RUNTIME environment variable: -creative_llm = LLM( - usage_id="creative", - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - temperature=0.9, -) +```bash +export RUNTIME=remote +``` -# 2. Save profiles +Remote sandboxes require additional configuration (API URL + API key). The exact +variable names depend on your deployment, but you may see legacy names like: -# Note that secrets are excluded by default for safety. 
-store.save("fast", fast_llm)
-store.save("creative", creative_llm)
+- SANDBOX_REMOTE_RUNTIME_API_URL
+- SANDBOX_API_KEY

-# To persist the API key as well, pass `include_secrets=True`:
-# store.save("fast", fast_llm, include_secrets=True)
+## Notes

-# 3. List available persisted profiles
+- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports)
+  depending on the provider.
+- Configuration and credentials vary by deployment.

-print(f"Stored profiles: {store.list()}")
+If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui).

-# 4. Load a profile
+### API Keys Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md

-loaded = store.load("fast")
-assert isinstance(loaded, LLM)
-print(
-    "Loaded profile. "
-    f"usage:{loaded.usage_id}, "
-    f"model: {loaded.model}, "
-    f"temperature: {loaded.temperature}."
-)
+
+  These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud).
+

-# 5. Delete a profile
+## Overview

-store.delete("creative")
-print(f"After deletion: {store.list()}")
+Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to
+OpenHands Cloud.

-print("EXAMPLE_COST: 0")
-```
+## OpenHands LLM Key

-
+
+You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud.
+

-## Next Steps
+You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start),
+[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even other AI coding agents. This will
+use credits from your OpenHands Cloud account. If you need to refresh it at any time, click the `Refresh API Key` button.
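When using this key with self-hosted OpenHands or with scripts, it is typically supplied through the `LLM_API_KEY` environment variable used throughout this file. A small illustrative helper for logging the key safely (`masked` is hypothetical, not part of OpenHands):

```python
import os


def masked(secret: str, visible: int = 4) -> str:
    """Render a credential safely for logs: keep only a short prefix."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


# Example: export LLM_API_KEY=<your OpenHands LLM key> before launching.
key = os.environ.get("LLM_API_KEY", "")
if key:
    print("Using LLM_API_KEY:", masked(key))
else:
    print("LLM_API_KEY is not set.")
```

Never print the full key; a short prefix is enough to confirm which credential is in use.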
-- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime
-- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models
-- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully
+## OpenHands API Key

-### Reasoning
-Source: https://docs.openhands.dev/sdk/guides/llm-reasoning.md
+These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the
+[OpenHands Cloud API](/openhands/usage/cloud/cloud-api).

-import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+### Create API Key

-View your agent's internal reasoning process for debugging, transparency, and understanding decision-making.
+1. Navigate to the `Settings > API Keys` page.
+2. Click `Create API Key`.
+3. Give your API key a name and click `Create`.

-This guide demonstrates two provider-specific approaches:
-1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning
-2. **OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter
+### Delete API Key

-## Anthropic Extended Thinking
+1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove.
+2. Click `Delete` to confirm removal.

-> A ready-to-run example is available [here](#ready-to-run-example-antrophic)!
+### Application Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/application-settings.md

-Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process
-through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step.
+## Overview

-### How It Works
+The Application settings allow you to customize various application-level behaviors in OpenHands, including
+language preferences, notification settings, custom Git author configuration and more.
-The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages: +## Setting Maximum Budget Per Conversation -```python focus={6-11} icon="python" wrap -def show_thinking(event: Event): - if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") - for block in message.thinking_blocks: - if isinstance(block, RedactedThinkingBlock): - print(f"Redacted: {block.data}") - elif isinstance(block, ThinkingBlock): - print(f"Thinking: {block.thinking}") +To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD) +in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but +you can choose to continue the conversation with a prompt. -conversation = Conversation(agent=agent, callbacks=[show_thinking]) -``` +## Git Author Settings -### Understanding Thinking Blocks +OpenHands provides the ability to customize the Git author information used when making commits and creating +pull requests on your behalf. -Claude uses thinking blocks to reason through complex problems step-by-step. 
There are two types: +By default, OpenHands uses the following Git author information for all commits and pull requests: -- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process -- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data +- **Username**: `openhands` +- **Email**: `openhands@all-hands.dev` -By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time, -giving you insight into how Claude is approaching the problem. +To override the defaults: -### Ready-to-run Example Antrophic +1. Navigate to the `Settings > Application` page. +2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`. +3. Click `Save Changes` -This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py) + When you configure a custom Git author, OpenHands will use your specified username and email as the primary author + for commits and pull requests. OpenHands will remain as a co-author. 
-```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py -"""Example demonstrating Anthropic's extended thinking feature with thinking blocks.""" - -import os - -from pydantic import SecretStr +### Integrations Settings +Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - RedactedThinkingBlock, - ThinkingBlock, -) -from openhands.sdk.tool import Tool -from openhands.tools.terminal import TerminalTool +## Overview +OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some +integrations, like Slack, are only available in OpenHands Cloud. Configuration may also vary depending on whether +you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or +[running OpenHands on your own](/openhands/usage/run-openhands/local-setup). -# Configure LLM for Anthropic Claude with extended thinking -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +## OpenHands Cloud Integrations Settings -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + -# Setup agent with bash tool -agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)]) +### GitHub Settings +- `Configure GitHub Repositories` - Allows you to +[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. 
-# Callback to display thinking blocks -def show_thinking(event: Event): - if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") - for i, block in enumerate(message.thinking_blocks): - if isinstance(block, RedactedThinkingBlock): - print(f" Block {i + 1}: {block.data}") - elif isinstance(block, ThinkingBlock): - print(f" Block {i + 1}: {block.thinking}") +### Slack Settings +- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in + your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. -conversation = Conversation( - agent=agent, callbacks=[show_thinking], workspace=os.getcwd() -) +## Running on Your Own Integrations Settings -conversation.send_message( - "Calculate compound interest for $10,000 at 5% annually, " - "compounded quarterly for 3 years. Show your work.", -) -conversation.run() + + These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). + -conversation.send_message( - "Now, write that number to RESULTs.txt.", -) -conversation.run() -print("✅ Done!") +### Version Control Integrations -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +#### GitHub Setup - +OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided: -## OpenAI Reasoning via Responses API + + -> A ready-to-run example is available [here](#ready-to-run-example-openai)! + 1. **Generate a Personal Access Token (PAT)**: + - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`. 
+ - **Tokens (classic)** + - Required scopes: + - `repo` (Full control of private repositories) + - **Fine-grained tokens** + - All Repositories (You can select specific repositories, but this will impact what returns in repo search) + - Minimal Permissions (Select `Meta Data = Read-only` read for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation) + 2. **Enter token in OpenHands**: + - Navigate to the `Settings > Integrations` page. + - Paste your token in the `GitHub Token` field. + - Click `Save Changes` to apply the changes. -OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) -that provides access to the model's reasoning process. -By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. + If you're working with organizational repositories, additional setup may be required: -### How It Works + 1. **Check organization requirements**: + - Organization admins may enforce specific token policies. + - Some organizations require tokens to be created with SSO enabled. + - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization). + 2. **Verify organization access**: + - Go to your token settings on GitHub. + - Look for the organization under `Organization access`. + - If required, click `Enable SSO` next to your organization. + - Complete the SSO authorization process. + -Configure the LLM with the `reasoning_effort` parameter to enable reasoning: + + - **Token Not Recognized**: + - Check that the token hasn't expired. + - Verify the token has the required scopes. + - Try regenerating the token. 
-```python focus={5} icon="python" wrap -llm = LLM( - model="openhands/gpt-5-codex", - api_key=SecretStr(api_key), - base_url=base_url, - # Enable reasoning with effort level - reasoning_effort="high", -) -``` + - **Organization Access Denied**: + - Check if SSO is required but not enabled. + - Verify organization membership. + - Contact organization admin if token policies are blocking access. + + -The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of -reasoning performed by the model. +#### GitLab Setup -Then capture reasoning traces in your callback: +OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided: -```python focus={3-4} icon="python" wrap -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - msg = event.to_llm_message() - llm_messages.append(msg) -``` + + + 1. **Generate a Personal Access Token (PAT)**: + - On GitLab, go to `User Settings > Access Tokens`. + - Create a new token with the following scopes: + - `api` (API access) + - `read_user` (Read user information) + - `read_repository` (Read repository) + - `write_repository` (Write repository) + - Set an expiration date or leave it blank for a non-expiring token. + 2. **Enter token in OpenHands**: + - Navigate to the `Settings > Integrations` page. + - Paste your token in the `GitLab Token` field. + - Click `Save Changes` to apply the changes. -### Understanding Reasoning Traces + 3. **(Optional): Restrict agent permissions** + - Create another PAT using Step 1 and exclude `api` scope . + - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower scope token. + - OpenHands will use the higher scope token, and the agent will use the lower scope token. + -The OpenAI Responses API provides reasoning traces that show how the model approached the problem. 
-These traces are available in the LLM messages and can be inspected to understand the model's decision-making process.
-Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process.

+ 
+ - **Token Not Recognized**:
+ - Check that the token hasn't expired.
+ - Verify the token has the required scopes.

-### Ready-to-run Example OpenAI

+ - **Access Denied**:
+ - Verify project access permissions.
+ - Check if the token has the necessary scopes.
+ - For group/organization repositories, ensure you have proper access.
+ 
+ 

- 
-This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py)
- 

+#### BitBucket Setup
+ 
+ 
+1. **Generate an App password**:
+ - On Bitbucket, go to `Account Settings > App passwords`.
+ - Create a new password with the following scopes:
+ - `account`: `read`
+ - `repository`: `write`
+ - `pull requests`: `write`
+ - `issues`: `write`
+ - App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
+ 2. **Enter token in OpenHands**:
+ - Navigate to the `Settings > Integrations` page.
+ - Paste your token in the `BitBucket Token` field.
+ - Click `Save Changes` to apply the changes.
+ 

+ 
+ - **Token Not Recognized**:
+ - Check that the token hasn't expired.
+ - Verify the token has the required scopes. 
+ 

-- Runs a real Agent/Conversation to verify /responses path works
-- Demonstrates rendering of Responses reasoning within normal conversation events
-"""

+ 

-from __future__ import annotations

+### Language Model (LLM) Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings.md

-import os

+## Overview

-from pydantic import SecretStr

+The LLM settings allow you to bring your own LLM and API key to use with OpenHands. This can be any model that is
+supported by litellm, but OpenHands requires a powerful model to work properly.
+[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some
+additional LLM settings on this page.

-from openhands.sdk import (
-    Conversation,
-    Event,
-    LLMConvertibleEvent,
-    get_logger,
-)
-from openhands.sdk.llm import LLM
-from openhands.tools.preset.default import get_default_agent

+## Basic LLM Settings

+The most popular providers and models are available in the basic settings. Some of the providers have been verified to
+work with OpenHands, such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI and
+Mistral AI.

+1. Choose your preferred provider using the `LLM Provider` dropdown.
+2. Choose your favorite model using the `LLM Model` dropdown.
+3. Set the `API Key` for your chosen provider and model and click `Save Changes`.

-api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
-assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment."

+This will set the LLM for all new conversations. If you want an older conversation to use the new LLM, you must first
+restart that conversation. 
-

+## Advanced LLM Settings

-model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API
-base_url = os.getenv("LLM_BASE_URL")

-llm = LLM(
-    model=model,
-    api_key=SecretStr(api_key),
-    base_url=base_url,
-    # Responses-path options
-    reasoning_effort="high",
-    # Logging / behavior tweaks
-    log_completions=False,
-    usage_id="agent",
-)

+Toggling the `Advanced` settings allows you to set custom models as well as some additional LLM settings. You can use
+this when your preferred provider or model does not exist in the basic settings dropdowns.

-print("\n=== Agent Conversation using /responses path ===")
-agent = get_default_agent(
-    llm=llm,
-    cli_mode=True, # disable browser tools for env simplicity
-)

+1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the
+ custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have
+ [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides).
+2. `Base URL`: If your provider has a specific base URL, specify it here.
+3. `API Key`: Set the API key for your custom model.
+4. Click `Save Changes`.

-llm_messages = [] # collect raw LLM-convertible messages for inspection

+### Memory Condensation

+The memory condenser manages the language model's context by ensuring only the most important and relevant information
+is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running
+conversations.

-def conversation_callback(event: Event):
-    if isinstance(event, LLMConvertibleEvent):
-        llm_messages.append(event.to_llm_message())

+- `Enable memory condensation` - Turn on this setting to activate this feature.
+- `Memory condenser max history size` - The condenser will summarize the history after this many events. 
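The idea behind condensation can be sketched in a few lines of Python. This is only an illustration of the concept, not OpenHands' actual condenser; the bracketed string stands in for a summary that an LLM would write in practice:

```python
def condense(history: list[str], max_size: int, keep_recent: int = 4) -> list[str]:
    """Toy sketch: once the history exceeds max_size, fold the older
    events into a single summary entry and keep the most recent ones."""
    if len(history) <= max_size:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(older)} earlier events]"  # an LLM writes this
    return [summary] + recent


events = [f"event-{i}" for i in range(10)]
print(condense(events, max_size=8))
```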
+### Model Context Protocol (MCP) +Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=os.getcwd(), -) +## Overview -# Keep the tasks short for demo purposes -conversation.send_message("Read the repo and write one fact into FACTS.txt.") -conversation.run() +Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. These +servers can provide additional functionality to the agent, such as specialized data processing, external API access, +or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). -conversation.send_message("Now delete FACTS.txt.") -conversation.run() +## Supported MCPs -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - ms = str(message) - print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") +OpenHands supports the following MCP transport protocols: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) +* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) +* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) - +## How MCP Works -## Use Cases +When OpenHands starts, it: -**Debugging**: Understand why the agent made specific decisions or took certain actions. +1. Reads the MCP configuration. +2. Connects to any configured SSE and SHTTP servers. +3. Starts any configured stdio servers. +4. Registers the tools provided by these servers with the agent. -**Transparency**: Show users how the AI arrived at its conclusions. 
+The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: -**Quality Assurance**: Identify flawed reasoning patterns or logic errors. +1. OpenHands routes the call to the appropriate MCP server. +2. The server processes the request and returns a response. +3. OpenHands converts the response to an observation and presents it to the agent. -**Learning**: Study how models approach complex problems. +## Configuration -## Next Steps +MCP configuration can be defined in: +* The OpenHands UI in the `Settings > MCP` page. +* The `config.toml` file under the `[mcp]` section if not using the UI. -- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance -- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities +### Configuration Options -### LLM Registry -Source: https://docs.openhands.dev/sdk/guides/llm-registry.md + + + SSE servers are configured using either a string URL or an object with the following properties: -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + - `url` (required) + - Type: `str` + - Description: The URL of the SSE server. -> A ready-to-run example is available [here](#ready-to-run-example)! + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + + SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: -Use the LLM registry to manage multiple LLM providers and dynamically switch between models. + - `url` (required) + - Type: `str` + - Description: The URL of the SHTTP server. -## Using the Registry + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. -You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. 
+ - `timeout` (optional) + - Type: `int` + - Default: `60` + - Range: `1-3600` seconds (1 hour maximum) + - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. + - **Use Cases:** + - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. + - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. + - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. + + This timeout only applies to individual tool calls, not server connection establishment. + + + + + While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for + better reliability and performance. + -```python icon="python" focus={9,10,13} -main_llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + Stdio servers are configured using an object with the following properties: -# define the registry and add an LLM -llm_registry = LLMRegistry() -llm_registry.add(main_llm) -... -# retrieve the LLM by its usage ID -llm = llm_registry.get("agent") -``` + - `name` (required) + - Type: `str` + - Description: A unique name for the server. -## Ready-to-run Example + - `command` (required) + - Type: `str` + - Description: The command to run the server. - -This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) - + - `args` (optional) + - Type: `list of str` + - Default: `[]` + - Description: Command-line arguments to pass to the server. -```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py -import os + - `env` (optional) + - Type: `dict of str to str` + - Default: `{}` + - Description: Environment variables to set for the server process. 
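The stdio fields above essentially describe how the server process gets launched. A simplified sketch of how such a config entry could map onto an argv list and an environment (an illustration only, not OpenHands' actual startup code):

```python
import os

# Example stdio entry, mirroring the fields documented above.
server = {
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
    "env": {"DEBUG": "true"},
}


def launch_spec(cfg: dict) -> tuple[list[str], dict[str, str]]:
    """Combine command + args into an argv list and layer the configured
    env vars on top of the parent environment (simplified)."""
    argv = [cfg["command"], *cfg.get("args", [])]
    env = {**os.environ, **cfg.get("env", {})}
    return argv, env


argv, env = launch_spec(server)
print(argv)
```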
+ + -from pydantic import SecretStr +#### When to Use Direct Stdio -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - LLMRegistry, - Message, - TextContent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.terminal import TerminalTool +Direct stdio connections may still be appropriate in these scenarios: +- **Development and testing**: Quick prototyping of MCP servers. +- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access. +- **Local-only environments**: When you don't want to manage additional proxy processes. +### Configuration Examples -logger = get_logger(__name__) + + + For stdio-based MCP servers, we recommend using MCP proxy tools like + [`supergateway`](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections. + [SuperGateway](https://github.com/supercorp-ai/supergateway) is a popular MCP proxy that converts stdio MCP servers to + HTTP/SSE endpoints. -# Configure LLM using LLMRegistry -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") + Start the proxy servers separately: + ```bash + # Terminal 1: Filesystem server proxy + supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080 -# Create LLM instance -main_llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) + # Terminal 2: Fetch server proxy + supergateway --stdio "uvx mcp-server-fetch" --port 8081 + ``` -# Create LLM registry and add the LLM -llm_registry = LLMRegistry() -llm_registry.add(main_llm) + Then configure OpenHands to use the HTTP endpoint: -# Get LLM from registry -llm = llm_registry.get("agent") + ```toml + [mcp] + # SSE Servers - Recommended approach using proxy tools + sse_servers = [ + # Basic SSE server with just a URL + "http://example.com:8080/mcp", -# Tools -cwd = os.getcwd() -tools = [Tool(name=TerminalTool.name)] + # SuperGateway proxy for fetch server + "http://localhost:8081/sse", -# Agent -agent = Agent(llm=llm, tools=tools) + # External MCP service with authentication + {url="https://api.example.com/mcp/sse", api_key="your-api-key"} + ] -llm_messages = [] # collect raw LLM messages + # SHTTP Servers - Modern streamable HTTP transport (recommended) + shttp_servers = [ + # Basic SHTTP server with default 60s timeout + "https://api.example.com/mcp/shttp", + + # Server with custom timeout for heavy operations + { + url = "https://files.example.com/mcp/shttp", + api_key = "your-api-key", + timeout = 1800 # 30 minutes for large file processing + } + ] + ``` + + + + This setup is not Recommended for production. 
+ + ```toml + [mcp] + # Direct stdio servers - use only for development/testing + stdio_servers = [ + # Basic stdio server + {name="fetch", command="uvx", args=["mcp-server-fetch"]}, + # Stdio server with environment variables + { + name="filesystem", + command="npx", + args=["@modelcontextprotocol/server-filesystem", "/"], + env={ + "DEBUG": "true" + } + } + ] + ``` -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) + For production use, we recommend using proxy tools like SuperGateway. + + +Other options include: -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=cwd -) +- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. +- **Docker-based proxies**: Containerized solutions for better isolation. +- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. -conversation.send_message("Please echo 'Hello!'") -conversation.run() +### Secrets Management +Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +## Overview -print("=" * 100) -print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") +OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be +accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment +variables in the agent's runtime environment. 
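Because each secret surfaces as an ordinary environment variable, reading one inside the runtime needs no special API. A small sketch (the secret name `AWS_ACCESS_KEY` and its value are placeholders, set manually here to simulate the runtime):

```python
import os

# Simulate the agent runtime: a secret named AWS_ACCESS_KEY created in
# Settings > Secrets appears as an environment variable (placeholder value).
os.environ["AWS_ACCESS_KEY"] = "example-value"

access_key = os.environ.get("AWS_ACCESS_KEY")
print(access_key)
```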
-# Demonstrate getting the same LLM instance from registry -same_llm = llm_registry.get("agent") -print(f"Same LLM instance: {llm is same_llm}") +## Accessing the Secrets Manager -# Demonstrate requesting a completion directly from an LLM -resp = llm.completion( - messages=[ - Message(role="user", content=[TextContent(text="Say hello in one word.")]) - ] -) -# Access the response content via OpenHands LLMResponse -msg = resp.message -texts = [c.text for c in msg.content if isinstance(c, TextContent)] -print(f"Direct completion response: {texts[0] if texts else str(msg)}") +Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +## Adding a New Secret +1. Click `Add a new secret`. +2. Fill in the following fields: + - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. + - **Value**: The sensitive information you want to store. + - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. +3. Click `Add secret` to save. - +## Editing a Secret +1. Click the `Edit` button next to the secret you want to modify. +2. You can update the name and description of the secret. + + For security reasons, you cannot view or edit the value of an existing secret. If you need to change the + value, delete the secret and create a new one. + -## Next Steps +## Deleting a Secret -- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs +1. Click the `Delete` button next to the secret you want to remove. +2. Select `Confirm` to delete the secret. 
-### Model Routing -Source: https://docs.openhands.dev/sdk/guides/llm-routing.md +## Using Secrets in the Agent + - All custom secrets are automatically exported as environment variables in the agent's runtime environment. + - You can access them in your code using standard environment variable access methods. For example, if you create a + secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or + `os.environ['OPENAI_API_KEY']` in Python. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Prompting Best Practices +Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md -This feature is under active development and more default routers will be available in future releases. +## Characteristics of Good Prompts -> A ready-to-run example is available [here](#ready-to-run-example)! +Good prompts are: -### Using the built-in MultimodalRouter +- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. +- **Location-specific**: Specify the locations in the codebase that should be modified, if known. +- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. 
-Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: +## Examples -```python icon="python" wrap focus={13-16} -primary_llm = LLM( - usage_id="agent-primary", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) -secondary_llm = LLM( - usage_id="agent-secondary", - model="litellm_proxy/mistral/devstral-small-2507", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) -multimodal_router = MultimodalRouter( - usage_id="multimodal-router", - llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, -) -``` +### Good Prompt Examples -You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. +- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average. +- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined. +- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission. -## Ready-to-run Example +### Bad Prompt Examples - -This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) - +- Make the code better. (Too vague, not concrete) +- Rewrite the entire backend to use a different framework. (Not appropriately scoped) +- There's a bug somewhere in the user authentication. Can you find and fix it? 
(Lacks specificity and location information) -Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: +## Tips for Effective Prompting -```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py -import os +- Be as specific as possible about the desired outcome or the problem to be solved. +- Provide context, including relevant file paths and line numbers if available. +- Break large tasks into smaller, manageable prompts. +- Include relevant error messages or logs. +- Specify the programming language or framework, if not obvious. -from pydantic import SecretStr +The more precise and informative your prompt, the better OpenHands can assist you. -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - ImageContent, - LLMConvertibleEvent, - Message, - TextContent, - get_logger, -) -from openhands.sdk.llm.router import MultimodalRouter -from openhands.tools.preset.default import get_default_tools +See [First Projects](/overview/first-projects) for more examples of helpful prompts. +### Troubleshooting +Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md -logger = get_logger(__name__) + +OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal. + -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +### Launch docker client failed -primary_llm = LLM( - usage_id="agent-primary", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) -secondary_llm = LLM( - usage_id="agent-secondary", - model="openhands/devstral-small-2507", - base_url=base_url, - api_key=SecretStr(api_key), -) -multimodal_router = MultimodalRouter( - usage_id="multimodal-router", - llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, -) +**Description** -# Tools -tools = get_default_tools() # Use our default openhands experience +When running OpenHands, the following error is seen: +``` +Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon. +``` -# Agent -agent = Agent(llm=multimodal_router, tools=tools) +**Resolution** -llm_messages = [] # collect raw LLM messages +Try these in order: +* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully. +* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled. +* Depending on your configuration you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop. +* Reinstall Docker Desktop. +### Permission Error -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +**Description** +On initial prompt, an error is seen with `Permission Denied` or `PermissionError`. -conversation = Conversation( - agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() -) +**Resolution** -conversation.send_message( - message=Message( - role="user", - content=[TextContent(text=("Hi there, who trained you?"))], - ) -) -conversation.run() +* Check if the `~/.openhands` is owned by `root`. 
If so, you can:
 + * Change the directory's ownership: `sudo chown <user>:<group> ~/.openhands`.
 + * or update permissions on the directory: `sudo chmod 777 ~/.openhands`
 + * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
+* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
 + OpenHands.

### On Linux, Getting ConnectTimeout Error

**Description**

When running on Linux, you might run into the error `ERROR:root:: timed out`.

**Resolution**

If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that
these packages can sometimes be outdated or include changes that cause compatibility issues. Try reinstalling Docker
[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version.
If that does not solve the issue, try incrementally adding the following parameters to the docker run command:
* `--network host`
* `-e SANDBOX_USE_HOST_NETWORK=true`
* `-e DOCKER_HOST_ADDR=127.0.0.1`

### Internal Server Error. 
Ports are not available -- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations -- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs +**Description** -### LLM Streaming -Source: https://docs.openhands.dev/sdk/guides/llm-streaming.md +When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP +...: bind: An attempt was made to access a socket in a +way forbidden by its access permissions.")` is encountered. -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +**Resolution** - -This is currently only supported for the chat completion endpoint. - +* Run the following command in PowerShell, as Administrator to reset the NAT service and release the ports: +``` +Restart-Service -Name "winnat" +``` -> A ready-to-run example is available [here](#ready-to-run-example)! +### Unable to access VS Code tab via local IP +**Description** -Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use -streaming callbacks to process and display tokens as they arrive from the language model. +When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden" +error, while other parts of the UI work fine. +**Resolution** -## How It Works +This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines. +To fix this: -Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the -complete response. This creates a more responsive user experience, especially for long-form content generation. +1. 
Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: + ```bash + docker run -it --rm \ + -e SANDBOX_VSCODE_PORT=41234 \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + -p 41234:41234 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:latest + ``` - - - ### Enable Streaming on LLM - Configure the LLM with streaming enabled: + > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. - ```python focus={6} icon="python" wrap - llm = LLM( - model="anthropic/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key), - base_url=base_url, - usage_id="stream-demo", - stream=True, # Enable streaming - ) - ``` - - - ### Define Token Callback - Create a callback function that processes streaming chunks as they arrive: +2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. +3. 
If running with the development workflow, you can set this in your `config.toml` file: + ```toml + [sandbox] + vscode_port = 41234 + ``` - ```python icon="python" wrap - def on_token(chunk: ModelResponseStream) -> None: - """Process each streaming chunk as it arrives.""" - choices = chunk.choices - for choice in choices: - delta = choice.delta - if delta is not None: - content = getattr(delta, "content", None) - if isinstance(content, str): - sys.stdout.write(content) - sys.stdout.flush() - ``` +### GitHub Organization Rename Issues - The callback receives a `ModelResponseStream` object containing: - - **`choices`**: List of response choices from the model - - **`delta`**: Incremental content changes for each choice - - **`content`**: The actual text tokens being streamed - - - ### Register Callback with Conversation +**Description** - Pass your token callback to the conversation: +After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. - ```python focus={3} icon="python" wrap - conversation = Conversation( - agent=agent, - token_callbacks=[on_token], # Register streaming callback - workspace=os.getcwd(), - ) - ``` +**Resolution** - The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers - if needed (e.g., one for display, another for logging). 
- - +* Update your git remote URL: + ```bash + # Check current remote + git remote get-url origin + + # Update SSH remote + git remote set-url origin git@github.com:OpenHands/OpenHands.git + + # Or update HTTPS remote + git remote set-url origin https://github.com/OpenHands/OpenHands.git + ``` +* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` +* Find and update any hardcoded references: + ```bash + git grep -i "all-hands-ai" + git grep -i "ghcr.io/all-hands-ai" + ``` -## Ready-to-run Example +### COBOL Modernization +Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md + +Legacy COBOL systems power critical business operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. -This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) +This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring). -```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py -import os -import sys -from typing import Literal - -from pydantic import SecretStr - -from openhands.sdk import ( - Conversation, - get_logger, -) -from openhands.sdk.llm import LLM -from openhands.sdk.llm.streaming import ModelResponseStream -from openhands.tools.preset.default import get_default_agent +## The COBOL Modernization Challenge +[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions. 
-logger = get_logger(__name__) +The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, with COBOL's specialized nature requiring a unique skill set that makes it difficult for human teams alone. +## Overview -api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") -if not api_key: - raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") +COBOL modernization is a complex undertaking. Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in. -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - usage_id="stream-demo", - stream=True, -) +OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process: -agent = get_default_agent(llm=llm, cli_mode=True) +1. **Understanding**: Analyze and document existing COBOL code +2. **Translation**: Convert COBOL to modern languages like Java, Python, or C# +3. **Validation**: Ensure the modernized code behaves identically to the original +In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. 
-# Define streaming states
-StreamingState = Literal["thinking", "content", "tool_name", "tool_args"]

-# Track state across on_token calls for boundary detection
-_current_state: StreamingState | None = None

+## Understanding
+A significant challenge in modernization is understanding the business function of the code. Developers are practiced at determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle.

-def on_token(chunk: ModelResponseStream) -> None:
-    """
-    Handle all types of streaming tokens including content,
-    tool calls, and thinking blocks with dynamic boundary detection.
-    """
-    global _current_state

+Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. Your COBOL source might already have some structure or comments that make this link clear, but if not, OpenHands can help.
If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: - choices = chunk.choices - for choice in choices: - delta = choice.delta - if delta is not None: - # Handle thinking blocks (reasoning content) - reasoning_content = getattr(delta, "reasoning_content", None) - if isinstance(reasoning_content, str) and reasoning_content: - if _current_state != "thinking": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("THINKING: ") - _current_state = "thinking" - sys.stdout.write(reasoning_content) - sys.stdout.flush() +``` +For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. - # Handle regular content - content = getattr(delta, "content", None) - if isinstance(content, str) and content: - if _current_state != "content": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("CONTENT: ") - _current_state = "content" - sys.stdout.write(content) - sys.stdout.flush() +Save the results in `business_functions.json` in the following format: - # Handle tool calls - tool_calls = getattr(delta, "tool_calls", None) - if tool_calls: - for tool_call in tool_calls: - tool_name = ( - tool_call.function.name if tool_call.function.name else "" - ) - tool_args = ( - tool_call.function.arguments - if tool_call.function.arguments - else "" - ) - if tool_name: - if _current_state != "tool_name": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("TOOL NAME: ") - _current_state = "tool_name" - sys.stdout.write(tool_name) - sys.stdout.flush() - if tool_args: - if _current_state != "tool_args": - if _current_state is not None: - sys.stdout.write("\n") - sys.stdout.write("TOOL ARGS: ") - 
_current_state = "tool_args" - sys.stdout.write(tool_args) - sys.stdout.flush() +{ + ..., + "COBIL00C.cbl": { + "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", + "references": [ + "docs/billing.md#bill-payment", + "docs/transactions.md#transaction-action" + ], + }, + ... +} +``` +OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information. -conversation = Conversation( - agent=agent, - workspace=os.getcwd(), - token_callbacks=[on_token], -) +## Translation -story_prompt = ( - "Tell me a long story about LLM streaming, write it a file, " - "make sure it has multiple paragraphs. " -) -conversation.send_message(story_prompt) -print("Token Streaming:") -print("-" * 100 + "\n") -conversation.run() +With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed. -cleanup_prompt = ( - "Thank you. Please delete the streaming story file now that I've read it, " - "then confirm the deletion." -) -conversation.send_message(cleanup_prompt) -print("Token Streaming:") -print("-" * 100 + "\n") -conversation.run() +One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. 
Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases but including such _known problems_ in the translation prompt can help prevent such errors from being introduced at all. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +An example prompt is below: - +``` +Convert the COBOL files in `/src` to Java in `/src/java`. -## Next Steps +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures (see `business_functions.json`) +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types) +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices +``` -- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully -- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming -- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI +Note the rule that introduces traceability comments to the resulting Java. These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. 
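The traceability tags also lend themselves to automation. As a sketch of the coverage check described above, the following hypothetical helper (not part of the SDK) scans a Java tree for `@source` tags in the `@source FILE.cbl:START-END` form and maps each COBOL program to the source lines accounted for:

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches traceability tags such as "@source CBACT01C.cbl:73-77"
TAG = re.compile(r"@source\s+(\S+\.cbl):(\d+)-(\d+)")

def covered_lines(java_root: str) -> dict[str, set[int]]:
    """Map each COBOL file to the set of its lines referenced from Java."""
    covered: dict[str, set[int]] = defaultdict(set)
    for java_file in Path(java_root).rglob("*.java"):
        for cbl, start, end in TAG.findall(java_file.read_text()):
            covered[cbl].update(range(int(start), int(end) + 1))
    return dict(covered)
```

Comparing each set against the program's total line count gives a rough translated-coverage figure; comments and blank lines inflate the denominator, so treat it as a signal rather than an exact metric.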
-### LLM Subscriptions -Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions.md +## Validation -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. - -OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. - +Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. -> A ready-to-run example is available [here](#ready-to-run-example)! +## Scaling Up -Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. +The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. 
One way to address this is by tying translation and validation together in an iterative refinement loop. -## How It Works +The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk): - - - ### Call subscription_login() +```python +while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + # Migrating agent converts COBOL to Java + migration_conversation.send_message(migration_prompt) + migration_conversation.run() + + # Critiquing agent evaluates the conversion + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + + # Parse the score and decide whether to continue + current_score = parse_critique_score(critique_file) +``` - The `LLM.subscription_login()` class method handles the entire authentication flow: +By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. The following prompt can be easily modified to support a wide range of requirements: - ```python icon="python" - from openhands.sdk import LLM +``` +Evaluate the quality of the COBOL to Java migration in `/src`. - llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") - ``` +For each Java file, assess using the following criteria: +1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)? +2. Code Quality: Is the code clean, readable, and following Java 17 conventions? +3. Completeness: Are all COBOL features properly converted? +4. 
Best Practices: Does it use proper OOP, error handling, and documentation? - On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. - - - ### Use the LLM +For each instance of a criteria not met, deduct a point. - Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. - - +Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score. -## Supported Models +Save the results in `critique.json` in the following format: -The following models are available via ChatGPT subscription: +{ + "total_score": -12, + "files": [ + { + "cobol": "COBIL00C.cbl", + "java": "bill_payment.java", + "scores": { + "correctness": 0, + "code_quality": 0, + "completeness": -1, + "best_practices": -2 + }, + "feedback": [ + "Rename single-letter variables to meaningful names.", + "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing.", + ], + }, + ... + ] +} +``` -| Model | Description | -|-------|-------------| -| `gpt-5.2-codex` | Latest Codex model (default) | -| `gpt-5.2` | GPT-5.2 base model | -| `gpt-5.1-codex-max` | High-capacity Codex model | -| `gpt-5.1-codex-mini` | Lightweight Codex model | +In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback. -## Configuration Options +This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. 
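The refinement loop shown earlier leaves `parse_critique_score` undefined. Given the `critique.json` schema above, a minimal implementation might look like this (the `collect_feedback` helper is illustrative, not part of the SDK):

```python
import json

def parse_critique_score(critique_file: str) -> int:
    """Return the critic's overall score; 0 means no deductions."""
    with open(critique_file) as f:
        critique = json.load(f)
    return critique["total_score"]

def collect_feedback(critique_file: str) -> list[str]:
    """Flatten per-file feedback into prompts for the next migration pass."""
    with open(critique_file) as f:
        critique = json.load(f)
    return [
        f"{entry['java']}: {item}"
        for entry in critique["files"]
        for item in entry.get("feedback", [])
    ]
```

The output of `collect_feedback` can be included in the next migration prompt, and the same helpers work unchanged when each component of a decomposed system keeps its own `critique.json`.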
This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. -### Force Fresh Login +## Try It Yourself -If your cached credentials become stale or you want to switch accounts: +The full iterative refinement example is available in the OpenHands SDK: -```python icon="python" -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", - force_login=True, # Always perform fresh OAuth login -) +```bash +export LLM_API_KEY="your-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/31_iterative_refinement.py ``` -### Disable Browser Auto-Open +For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. -For headless environments or when you prefer to manually open the URL: -```python icon="python" -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", - open_browser=False, # Prints URL to console instead -) -``` +## Related Resources -### Check Subscription Mode +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -Verify that the LLM is using subscription-based authentication: +### Automated Code Review +Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review.md -```python icon="python" -llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") -print(f"Using subscription: {llm.is_subscription}") # True -``` +Automated code review 
helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs. -## Credential Storage +## Overview -Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. +The OpenHands PR Review workflow is a GitHub Actions workflow that: -## Ready-to-run Example +- **Triggers automatically** when PRs are opened or when you request a review +- **Analyzes code changes** in the context of your entire repository +- **Posts inline comments** directly on specific lines of code in the PR +- **Provides fast feedback** - typically within 2-3 minutes - -This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) - +## How It Works -```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py -"""Example: Using ChatGPT subscription for Codex models. +The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: -This example demonstrates how to use your ChatGPT Plus/Pro subscription -to access OpenAI's Codex models without consuming API credits. +1. **Trigger**: The workflow runs when: + - A new non-draft PR is opened + - A draft PR is marked as ready for review + - The `review-this` label is added to a PR + - `openhands-agent` is requested as a reviewer -The subscription_login() method handles: -- OAuth PKCE authentication flow -- Credential caching (~/.openhands/auth/) -- Automatic token refresh +2. 
**Analysis**: The agent receives the complete PR diff and uses two skills: + - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices + - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API -Supported models: -- gpt-5.2-codex -- gpt-5.2 -- gpt-5.1-codex-max -- gpt-5.1-codex-mini +3. **Output**: Review comments are posted directly on the PR with: + - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) + - Specific line references + - Actionable suggestions with code examples -Requirements: -- Active ChatGPT Plus or Pro subscription -- Browser access for initial OAuth login -""" +### Review Styles -import os +Choose between two review styles: -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +| Style | Description | Best For | +|-------|-------------|----------| +| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | +| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | + +## Quick Start + + + + Create `.github/workflows/pr-review-by-openhands.yml` in your repository: + + ```yaml + name: PR Review by OpenHands + + on: + pull_request_target: + types: [opened, ready_for_review, labeled, review_requested] + + permissions: + contents: read + pull-requests: write + issues: write 
+ jobs: + pr-review: + if: | + (github.event.action == 'opened' && github.event.pull_request.draft == false) || + github.event.action == 'ready_for_review' || + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + llm-model: anthropic/claude-sonnet-4-5-20250929 + review-style: standard + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} + ``` + -# First time: Opens browser for OAuth login -# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) -llm = LLM.subscription_login( - vendor="openai", - model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" -) + + Go to your repository's **Settings → Secrets and variables → Actions** and add: + - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) + -# Alternative: Force a fresh login (useful if credentials are stale) -# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) + + Create a `review-this` label in your repository: + 1. Go to **Issues → Labels** + 2. Click **New label** + 3. Name: `review-this` + 4. 
Description: `Trigger OpenHands PR review` + -# Alternative: Disable auto-opening browser (prints URL to console instead) -# llm = LLM.subscription_login( -# vendor="openai", model="gpt-5.2-codex", open_browser=False -# ) + + Open a PR and either: + - Add the `review-this` label, OR + - Request `openhands-agent` as a reviewer + + -# Verify subscription mode is active -print(f"Using subscription mode: {llm.is_subscription}") +## Composite Action -# Use the LLM with an agent as usual -agent = Agent( - llm=llm, - tools=[ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), - ], -) +The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: -cwd = os.getcwd() -conversation = Conversation(agent=agent, workspace=cwd) +- Checking out the SDK at the specified version +- Setting up Python and dependencies +- Running the PR review agent +- Uploading logs as artifacts -conversation.send_message("List the files in the current directory.") -conversation.run() -print("Done!") -``` +### Action Inputs - +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | +| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | -## Next Steps + +Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. 
+ -- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations -- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token -- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces +## Customization -### Model Context Protocol -Source: https://docs.openhands.dev/sdk/guides/mcp.md +### Repository-Specific Review Guidelines -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: - - ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. - Read more about MCP [here](https://modelcontextprotocol.io/). - +```markdown +--- +name: code-review +description: Custom code review guidelines for this repository +triggers: +- /codereview +--- +# Repository Code Review Guidelines +You are reviewing code for [Your Project Name]. Follow these guidelines: -## Basic MCP Usage +## Review Decisions -> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! 
+### When to APPROVE +- Configuration changes following existing patterns +- Documentation-only changes +- Test-only changes without production code changes +- Simple additions following established conventions - - - ### MCP Configuration - Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) +### When to COMMENT +- Issues that need attention (bugs, security concerns) +- Suggestions for improvement +- Questions about design decisions - ```python mcp_config icon="python" wrap focus={3-10} - mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "repomix": { - "command": "npx", - "args": ["-y", "repomix@1.4.2", "--mcp"] - }, - } - } - ``` - - - ### Tool Filtering - Use `filter_tools_regex` to control which MCP tools are available to the agent +## Core Principles - ```python filter_tools_regex focus={4-5} icon="python" - agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config, - filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", - ) - ``` - - +1. **[Your Principle 1]**: Description +2. **[Your Principle 2]**: Description -## MCP with OAuth +## What to Check -> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
+- **[Category 1]**: What to look for +- **[Category 2]**: What to look for -For MCP servers requiring OAuth authentication: -- Configure OAuth-enabled MCP servers by specifying the URL and auth type -- The SDK automatically handles the OAuth flow when first connecting -- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) -- User will be prompted to authenticate -- Access tokens are securely stored and automatically refreshed by FastMCP as needed +## Repository Conventions -```python mcp_config focus={5} icon="python" wrap -mcp_config = { - "mcpServers": { - "Notion": { - "url": "https://mcp.notion.com/mcp", - "auth": "oauth" - } - } -} +- Use [your linter] for style checking +- Follow [your style guide] +- Tests should be in [your test directory] ``` -## Ready-to-Run Basic MCP Usage Example - -This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) +The skill file must use `/codereview` as the trigger to override the default review behavior. See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. 
-Here's an example integrating MCP servers with an agent: +### Workflow Configuration -```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py -import os +Customize the workflow by modifying the action inputs: -from pydantic import SecretStr +```yaml +- name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + # Change the LLM model + llm-model: anthropic/claude-sonnet-4-5-20250929 + # Use a custom LLM endpoint + llm-base-url: https://your-llm-proxy.example.com + # Switch to "roasted" style for brutally honest reviews + review-style: roasted + # Pin to a specific SDK version for stability + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### Trigger Customization +Modify when reviews are triggered by editing the workflow conditions: -logger = get_logger(__name__) +```yaml +# Only trigger on label (disable auto-review on PR open) +if: github.event.label.name == 'review-this' -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +# Only trigger when specific reviewer is requested +if: github.event.requested_reviewer.login == 'openhands-agent' -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +# Trigger on all PRs (including drafts) +if: | + github.event.action == 'opened' || + github.event.action == 'synchronize' +``` -# Add MCP Tools -mcp_config = { - "mcpServers": { - "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, - "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, - } -} -# Agent -agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config, - # This regex filters out all repomix tools except pack_codebase - filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", -) +## Security Considerations -llm_messages = [] # collect raw LLM messages +The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. + + +**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. + +To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. 
+ +## Example Reviews -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +See real automated reviews in action on the OpenHands Software Agent SDK repository: +| PR | Description | Review Highlights | +|----|-------------|-------------------| +| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | +| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | +| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED review highlighting key strengths | +| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, -) -conversation.set_security_analyzer(LLMSecurityAnalyzer()) +## Troubleshooting -logger.info("Starting conversation with MCP integration...") -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands and write 3 facts " - "about the project into FACTS.txt." 
-) -conversation.run() + + + - Ensure the `LLM_API_KEY` secret is set correctly + - Check that the label name matches exactly (`review-this`) + - Verify the workflow file is in `.github/workflows/` + - Check the Actions tab for workflow run errors + + + + - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission + - Check the workflow logs for API errors + - Verify the PR is not from a fork with restricted permissions + + + + - Large PRs may take longer to analyze + - Consider splitting large PRs into smaller ones + - Check if the LLM API is experiencing delays + + -conversation.send_message("Great! Now delete that file.") -conversation.run() +## Related Resources -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews +- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows +- [GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud +- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +### Dependency Upgrades +Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md - +Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. 
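Identifying outdated dependencies is largely a mechanical comparison of installed versions against latest releases. A minimal sketch of that comparison — all package data below is illustrative, and in practice it would come from `npm outdated --json`, `pip list --outdated --format=json`, or similar:

```python
def major(version: str) -> int:
    """Return the major component of a semantic version string."""
    return int(version.split(".")[0])


def classify(current: str, latest: str) -> str:
    """Rough upgrade-risk bucket based on how far behind a package is."""
    gap = major(latest) - major(current)
    if gap > 2:
        return "critical: more than 2 major versions behind"
    if gap >= 1:
        return "major upgrade needed"
    return "minor/patch update"


# Illustrative data: {package: (installed, latest)}.
installed = {
    "react": ("16.8.0", "18.2.0"),
    "lodash": ("4.17.15", "4.17.21"),
    "left-pad": ("0.0.3", "1.3.0"),
}

report = {name: classify(cur, latest) for name, (cur, latest) in installed.items()}
for name, verdict in report.items():
    print(f"{name}: {verdict}")
```

An agent would feed a report like this into the prioritization step, combining it with security-advisory and deprecation data.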
-## Ready-to-Run MCP with OAuth Example +## Overview - -This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) - +OpenHands helps with dependency management by: -```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py -import os +- **Analyzing dependencies**: Identifying outdated packages and their versions +- **Planning upgrades**: Creating upgrade strategies and migration guides +- **Implementing changes**: Updating code to handle breaking changes +- **Validating results**: Running tests and verifying functionality -from pydantic import SecretStr +## Dependency Analysis Examples -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### Identifying Outdated Dependencies +Start by understanding your current dependency state: -logger = get_logger(__name__) +``` +Analyze the dependencies in this project and create a report: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +1. List all direct dependencies with current and latest versions +2. Identify dependencies more than 2 major versions behind +3. Flag any dependencies with known security vulnerabilities +4. Highlight dependencies that are deprecated or unmaintained +5. 
Prioritize which updates are most important +``` -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +**Example output:** -mcp_config = { - "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} -} -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) +| Package | Current | Latest | Risk | Priority | +|---------|---------|--------|------|----------| +| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | +| react | 16.8.0 | 18.2.0 | Outdated | Medium | +| express | 4.17.1 | 4.18.2 | Minor update | Low | +| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | -llm_messages = [] # collect raw LLM messages +### Security-Related Dependency Upgrades +Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +- Automating vulnerability detection and remediation +- Integrating with security scanners (Snyk, Dependabot, CodeQL) +- Building automated pipelines for security fixes +- Using OpenHands agents to create pull requests automatically +### Compatibility Checking -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], -) +Check for compatibility issues before upgrading: -logger.info("Starting conversation with MCP integration...") -conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") -conversation.run() +``` +Check compatibility for upgrading React from 16 to 18: -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +1. 
Review our codebase for deprecated React patterns +2. List all components using lifecycle methods +3. Identify usage of string refs or findDOMNode +4. Check third-party library compatibility with React 18 +5. Estimate the effort required for migration ``` - +**Compatibility matrix:** -## Next Steps +| Dependency | React 16 | React 17 | React 18 | Action Needed | +|------------|----------|----------|----------|---------------| +| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | +| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | +| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | -- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools -- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage -- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation +## Automated Upgrade Examples -### Metrics Tracking -Source: https://docs.openhands.dev/sdk/guides/metrics.md +### Version Updates -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Perform straightforward version updates: -## Overview + + + ``` + Update all patch and minor versions in package.json: + + 1. Review each update for changelog notes + 2. Update package.json with new versions + 3. Update package-lock.json + 4. Run the test suite + 5. List any deprecation warnings + ``` + + + ``` + Update dependencies in requirements.txt: + + 1. Check each package for updates + 2. Update requirements.txt with compatible versions + 3. Update requirements-dev.txt similarly + 4. Run tests and verify functionality + 5. Note any deprecation warnings + ``` + + + ``` + Update dependencies in pom.xml: + + 1. Check for newer versions of each dependency + 2. Update version numbers in pom.xml + 3. Run mvn dependency:tree to check conflicts + 4. Run the test suite + 5. 
Document any API changes encountered + ``` + + -The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: -- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. -- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). +### Breaking Change Handling -## Getting Metrics from Individual LLMs +When major versions introduce breaking changes: -> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! +``` +Upgrade axios from v0.x to v1.x and handle breaking changes: -Track token usage, costs, and performance metrics from LLM interactions: +1. List all breaking changes in axios 1.0 changelog +2. Find all axios usages in our codebase +3. For each breaking change: + - Show current code + - Show updated code + - Explain the change +4. Create a git commit for each logical change +5. 
Verify all tests pass +``` -### Accessing Individual LLM Metrics +**Example transformation:** -Access metrics directly from the LLM object after running the conversation: +```javascript +// Before (axios 0.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const response = await axios.get('/users', { + cancelToken: source.token +}); + +// After (axios 1.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const controller = new AbortController(); +const response = await axios.get('/users', { + signal: controller.signal +}); +``` + +### Code Adaptation + +Adapt code to new API patterns: -```python icon="python" focus={3-4} -conversation.run() +``` +Migrate our codebase from moment.js to date-fns: -assert llm.metrics is not None -print(f"Final LLM metrics: {llm.metrics.model_dump()}") +1. List all moment.js usages in our code +2. Map moment methods to date-fns equivalents +3. Update imports throughout the codebase +4. Handle any edge cases where APIs differ +5. Remove moment.js from dependencies +6. 
Verify all date handling still works correctly ``` -The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: +**Migration map:** -- `accumulated_cost` - Total accumulated cost across all API calls -- `accumulated_token_usage` - Aggregated token usage with fields like: - - `prompt_tokens` - Number of input tokens processed - - `completion_tokens` - Number of output tokens generated - - `cache_read_tokens` - Cache hits (if supported by the model) - - `cache_write_tokens` - Cache writes (if supported by the model) - - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) - - `context_window` - Context window size used -- `costs` - List of individual cost records per API call -- `token_usages` - List of detailed token usage records per API call -- `response_latencies` - List of response latency metrics per API call +| moment.js | date-fns | Notes | +|-----------|----------|-------| +| `moment()` | `new Date()` | Different return type | +| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | +| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | +| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | - - For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). 
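A migration like the moment.js one above usually starts with an inventory of call sites. A small illustrative scanner (regex-based, pure Python; the sample source is inlined and hypothetical):

```python
import re

# Matches bare moment(...) calls and chained usages like moment().format(...).
MOMENT_CALL = re.compile(r"\bmoment\s*\(")


def find_moment_usages(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line_text) for each line containing a moment() call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if MOMENT_CALL.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = """import moment from 'moment';
const today = moment().format('YYYY-MM-DD');
const nextWeek = moment().add(7, 'days');
const label = 'moment of truth';  // not a call, should not match
"""

usages = find_moment_usages(sample)
for lineno, line in usages:
    print(f"line {lineno}: {line}")
```

In a real migration the agent would walk the repository tree and scan each file, then apply the mapping table entry by entry.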
- +## Testing and Validation Examples -### Ready-to-run Example (LLM metrics) - -This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) - +### Automated Test Execution -```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py -import os +Run comprehensive tests after upgrades: -from pydantic import SecretStr +``` +After the dependency upgrades, validate the application: -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +1. Run the full test suite (unit, integration, e2e) +2. Check test coverage hasn't decreased +3. Run type checking (if applicable) +4. Run linting with new lint rule versions +5. Build the application for production +6. Report any failures with analysis +``` +### Integration Testing -logger = get_logger(__name__) +Verify integrations still work: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +``` +Test our integrations after upgrading the AWS SDK: -cwd = os.getcwd() -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +1. Test S3 operations (upload, download, list) +2. Test DynamoDB operations (CRUD) +3. Test Lambda invocations +4. Test SQS send/receive +5. Compare behavior to before the upgrade +6. 
Note any subtle differences +``` -# Add MCP Tools -mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} +### Regression Detection -# Agent -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) +Detect regressions from upgrades: -llm_messages = [] # collect raw LLM messages +``` +Check for regressions after upgrading the ORM: +1. Run database operation benchmarks +2. Compare query performance before and after +3. Verify all migrations still work +4. Check for any N+1 queries introduced +5. Validate data integrity in test database +6. Document any behavioral changes +``` -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) +## Additional Examples +### Security-Driven Upgrade -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, -) +``` +We have a critical security vulnerability in jsonwebtoken. -logger.info("Starting conversation with MCP integration...") -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands and write 3 facts " - "about the project into FACTS.txt." -) -conversation.run() +Current: jsonwebtoken@8.5.1 +Required: jsonwebtoken@9.0.0 -conversation.send_message("Great! Now delete that file.") -conversation.run() +Perform the upgrade: +1. Check for breaking changes in v9 +2. Find all usages of jsonwebtoken in our code +3. Update any deprecated methods +4. Update the package version +5. Verify all JWT operations work +6. Run security tests +``` -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +### Framework Major Upgrade -assert llm.metrics is not None -print( - f"Conversation finished. 
Final LLM metrics with details: {llm.metrics.model_dump()}" -) +``` +Upgrade our Next.js application from 12 to 14: -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") +Key areas to address: +1. App Router migration (pages -> app) +2. New metadata API +3. Server Components by default +4. New Image component +5. Route handlers replacing API routes + +For each area: +- Show current implementation +- Show new implementation +- Test the changes ``` - +### Multi-Package Coordinated Upgrade -## Using LLM Registry for Cost Tracking +``` +Upgrade our React ecosystem packages together: -> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! +Current: +- react: 17.0.2 +- react-dom: 17.0.2 +- react-router-dom: 5.3.0 +- @testing-library/react: 12.1.2 -The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. +Target: +- react: 18.2.0 +- react-dom: 18.2.0 +- react-router-dom: 6.x +- @testing-library/react: 14.x -### How the LLM Registry Works +Create an upgrade plan that handles all these together, +addressing breaking changes in the correct order. +``` -Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: +## Related Resources -1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` -2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` -3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` -4. 
**Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID +- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities +- [Security Guide](/sdk/guides/security) - Security best practices for AI agents +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. +### Incident Triage +Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md + +When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). -### Ready-to-run Example (LLM Registry) -This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) +This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). +## Overview -```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py -import os +Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. -from pydantic import SecretStr +What if AI agents could handle the initial investigation automatically? 
This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - LLMRegistry, - Message, - TextContent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.terminal import TerminalTool +OpenHands accelerates incident response by: +- **Automated error analysis**: AI agents investigate errors and provide detailed reports +- **Root cause identification**: Connect symptoms to underlying issues in your codebase +- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues +- **Integration with monitoring tools**: Work directly with platforms like Datadog -logger = get_logger(__name__) +## Automated Datadog Error Analysis -# Configure LLM using LLMRegistry -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. -# Create LLM instance -main_llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +### How It Works -# Create LLM registry and add the LLM -llm_registry = LLMRegistry() -llm_registry.add(main_llm) +[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. 
It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production.

[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports.

### Triggering Automated Debugging

The GitHub Actions workflow can be triggered in two ways:

1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors.
2. **Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from Datadog's error tracking UI using the "Actions" button.

### Automated Investigation Process

When the workflow runs, it automatically performs the following steps:

1. Get detailed info from the Datadog API
2. Create or find an existing GitHub issue to track the error
3. Clone all relevant repositories to get full code context
4. Run an OpenHands agent to analyze the error and investigate the code
5. Post the findings as a comment on the GitHub issue

The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes.
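The five steps above form a simple pipeline. The sketch below shows its shape only — every helper is a hypothetical stub standing in for real Datadog, GitHub, and agent-SDK calls, not the workflow's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    error_id: str
    issue_number: int
    analysis: str


# Hypothetical stand-ins for the real Datadog / GitHub / agent calls.
def fetch_error_details(error_id: str) -> dict:
    return {"id": error_id, "message": "JSONDecodeError in parser.py"}


def find_or_create_issue(details: dict) -> int:
    return 42  # tracking issue number


def clone_relevant_repos(details: dict) -> list[str]:
    return ["service-api", "shared-lib"]


def run_agent_analysis(details: dict, repos: list[str]) -> str:
    return f"Root cause in {repos[0]}: {details['message']}"


def post_issue_comment(issue: int, analysis: str) -> None:
    print(f"posted to issue #{issue}: {analysis}")


def investigate(error_id: str) -> ErrorReport:
    details = fetch_error_details(error_id)        # 1. Datadog API
    issue = find_or_create_issue(details)          # 2. GitHub issue
    repos = clone_relevant_repos(details)          # 3. code context
    analysis = run_agent_analysis(details, repos)  # 4. agent run
    post_issue_comment(issue, analysis)            # 5. report back
    return ErrorReport(error_id, issue, analysis)


report = investigate("err-123")
```

The real workflow wires these stages together in GitHub Actions; see the linked example for the actual implementation.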
-print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") + +The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`. + -print("=" * 100) -print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") +## Setting Up the Workflow -# Demonstrate getting the same LLM instance from registry -same_llm = llm_registry.get("agent") -print(f"Same LLM instance: {llm is same_llm}") +To set up automated Datadog debugging in your own repository: -# Demonstrate requesting a completion directly from an LLM -resp = llm.completion( - messages=[ - Message(role="user", content=[TextContent(text="Say hello in one word.")]) - ] -) -# Access the response content via OpenHands LLMResponse -msg = resp.message -texts = [c.text for c in msg.content if isinstance(c, TextContent)] -print(f"Direct completion response: {texts[0] if texts else str(msg)}") +1. Copy the workflow file to `.github/workflows/` in your repository +2. Configure the required secrets (Datadog API keys, LLM API key) +3. Customize the default queries and repository lists for your needs +4. Run the workflow manually or set up scheduled runs -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` - +The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog. 
-### Getting Aggregated Conversation Costs +Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template. - -This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) - +## Manual Incident Investigation -Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. +You can also use OpenHands directly to investigate incidents without the automated workflow. -```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py -import os +### Log Analysis -from pydantic import SecretStr -from tabulate import tabulate +OpenHands can analyze logs to identify patterns and anomalies: -from openhands.sdk import ( - LLM, - Agent, - Conversation, - LLMSummarizingCondenser, - Message, - TextContent, - get_logger, -) -from openhands.sdk.tool.spec import Tool -from openhands.tools.terminal import TerminalTool +``` +Analyze these application logs for the incident that occurred at 14:32 UTC: +1. Identify the first error or warning that appeared +2. Trace the sequence of events leading to the failure +3. Find any correlated errors across services +4. Identify the user or request that triggered the issue +5. Summarize the timeline of events +``` -logger = get_logger(__name__) +**Log analysis capabilities:** -# Configure LLM using LLMRegistry -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") +| Log Type | Analysis Capabilities | +|----------|----------------------| +| Application logs | Error patterns, exception traces, timing anomalies | +| Access logs | Traffic patterns, slow requests, error responses | +| System logs | Resource exhaustion, process crashes, system errors | +| Database logs | Slow queries, deadlocks, connection issues | -# Create LLM instance -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +### Stack Trace Analysis -llm_condenser = LLM( - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - usage_id="condenser", -) +Deep dive into stack traces: -# Tools -condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) +``` +Analyze this stack trace from our production error: -cwd = os.getcwd() -agent = Agent( - llm=llm, - tools=[ - Tool( - name=TerminalTool.name, - ), - ], - condenser=condenser, -) +[paste full stack trace] -conversation = Conversation(agent=agent, workspace=cwd) -conversation.send_message( - message=Message( - role="user", - content=[TextContent(text="Please echo 'Hello!'")], - ) -) -conversation.run() +1. Identify the exception type and message +2. Trace back to our code (not framework code) +3. Identify the likely cause +4. Check if this code path has changed recently +5. 
Suggest a fix +``` -# Demonstrate extraneous costs part of the conversation -second_llm = LLM( - usage_id="demo-secondary", - model=model, - base_url=os.getenv("LLM_BASE_URL"), - api_key=SecretStr(api_key), -) -conversation.llm_registry.add(second_llm) -completion_response = second_llm.completion( - messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] -) +**Multi-language support:** -# Access total spend -spend = conversation.conversation_stats.get_combined_metrics() -print("\n=== Total Spend for Conversation ===\n") -print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") -if spend.accumulated_token_usage: - print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") - print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") - print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") - print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + + + ``` + Analyze this Java exception: + + java.lang.OutOfMemoryError: Java heap space + at java.util.Arrays.copyOf(Arrays.java:3210) + at java.util.ArrayList.grow(ArrayList.java:265) + at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) + + Identify: + 1. What operation is consuming memory? + 2. Is there a memory leak or just too much data? + 3. What's the fix? + ``` + + + ``` + Analyze this Python traceback: + + Traceback (most recent call last): + File "app/api/orders.py", line 45, in create_order + order = OrderService.create(data) + File "app/services/order.py", line 89, in create + inventory.reserve(item_id, quantity) + AttributeError: 'NoneType' object has no attribute 'reserve' + + What's None and why? + ``` + + + ``` + Analyze this Node.js error: + + TypeError: Cannot read property 'map' of undefined + at processItems (/app/src/handlers/items.js:23:15) + at async handleRequest (/app/src/api/router.js:45:12) + + What's undefined and how should we handle it? 
+ ``` + + -spend_per_usage = conversation.conversation_stats.usage_to_metrics -print("\n=== Spend Breakdown by Usage ID ===\n") -rows = [] -for usage_id, metrics in spend_per_usage.items(): - rows.append( - [ - usage_id, - f"${metrics.accumulated_cost:.6f}", - metrics.accumulated_token_usage.prompt_tokens - if metrics.accumulated_token_usage - else 0, - metrics.accumulated_token_usage.completion_tokens - if metrics.accumulated_token_usage - else 0, - ] - ) +### Root Cause Analysis -print( - tabulate( - rows, - headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], - tablefmt="github", - ) -) +Identify the underlying cause of an incident: -# Report cost -cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost -print(f"EXAMPLE_COST: {cost}") ``` +Perform root cause analysis for this incident: - +Symptoms: +- API response times increased 5x at 14:00 +- Error rate jumped from 0.1% to 15% +- Database CPU spiked to 100% -### Understanding Conversation Stats +Available data: +- Application metrics (Grafana dashboard attached) +- Recent deployments: v2.3.1 deployed at 13:45 +- Database slow query log (attached) -The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: +Identify the root cause using the 5 Whys technique. +``` -#### Key Methods and Properties +## Common Incident Patterns -- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. - -- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. 
This gives you the total cost of the entire conversation. +OpenHands can recognize and help diagnose these common patterns: + +- **Connection pool exhaustion**: Increasing connection errors followed by complete failure +- **Memory leaks**: Gradual memory increase leading to OOM +- **Cascading failures**: One service failure triggering others +- **Thundering herd**: Simultaneous requests overwhelming a service +- **Split brain**: Inconsistent state across distributed components -- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. +## Quick Fix Generation -```python icon="python" focus={2, 6, 10} -# Get combined metrics for the entire conversation -total_metrics = conversation.conversation_stats.get_combined_metrics() -print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") +Once the root cause is identified, generate fixes: -# Get metrics for a specific LLM by usage ID -agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") -print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") +``` +We've identified the root cause: a missing null check in OrderProcessor.java line 156. -# Access all usage IDs and their metrics -for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): - print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +Generate a fix that: +1. Adds proper null checking +2. Logs when null is encountered +3. Returns an appropriate error response +4. Includes a unit test for the edge case +5. 
Is minimally invasive for a hotfix ``` -## Next Steps +## Best Practices -- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs -- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models +### Investigation Checklist -### Observability & Tracing -Source: https://docs.openhands.dev/sdk/guides/observability.md +Use this checklist when investigating: -> A full setup example is available [here](#example:-full-setup)! +1. **Scope the impact** + - How many users affected? + - What functionality is broken? + - What's the business impact? -## Overview +2. **Establish timeline** + - When did it start? + - What changed around that time? + - Is it getting worse or stable? -The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: +3. **Gather data** + - Application logs + - Infrastructure metrics + - Recent deployments + - Configuration changes -- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support -- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing -- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more +4. **Form hypotheses** + - List possible causes + - Rank by likelihood + - Test systematically -The SDK automatically traces: -- Agent execution steps -- Tool calls and executions -- LLM API calls (via LiteLLM integration) -- Browser automation sessions (when using browser-use) -- Conversation lifecycle events +5. **Implement fix** + - Choose safest fix + - Test before deploying + - Monitor after deployment -## Quick Start +### Common Pitfalls -Tracing is automatically enabled when you set the appropriate environment variables. 
The SDK detects the configuration on startup and initializes tracing without requiring code changes. + +Avoid these common incident response mistakes: -### Using Laminar +- **Jumping to conclusions**: Gather data before assuming the cause +- **Changing multiple things**: Make one change at a time to isolate effects +- **Not documenting**: Record all actions for the post-mortem +- **Ignoring rollback**: Always have a rollback plan before deploying fixes + -[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: + +For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. + -```bash icon="terminal" wrap -# Set your Laminar project API key -export LMNR_PROJECT_API_KEY="your-laminar-api-key" -``` +## Related Resources -That's it! Run your agent code normally and traces will be sent to Laminar automatically. +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -### Using Honeycomb or Other OTLP Backends +### Spark Migrations +Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md -For Honeycomb, Jaeger, or any other OTLP-compatible backend: +Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. OpenHands can help you analyze, migrate, and validate Spark applications. 
-```bash icon="terminal" wrap -# Required: Set the OTLP endpoint -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +## Overview -# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" +Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar. -# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it -``` +Version upgrades are also made difficult due to the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions. -### Alternative Configuration Methods +Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. The migration needs to be driven by an experienced data engineering team, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in. 
-You can also use these alternative environment variable formats: +Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly or without regression. This is where OpenHands comes in. OpenHands assists in migrating Spark applications along every step of the process: -```bash icon="terminal" wrap -# Short form for endpoint -export OTEL_ENDPOINT="http://localhost:4317" +1. **Understanding**: Analyze the existing codebase to identify what needs to change and why +2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences +3. **Validation**: Verify that migrated pipelines produce identical results to the originals -# Alternative header format -export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" +In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions. -# Alternative protocol specification -export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" -``` +## Understanding -## How It Works +Before changin any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually. -The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: +Apache releases detailed lists of changes between each major and minor version of Spark. 
OpenHands can utilize this list of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress. -1. **Detects Configuration**: Checks for OTEL environment variables on startup -2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter -3. **Instruments Code**: Automatically wraps key functions with tracing decorators -4. **Captures Context**: Associates traces with conversation IDs for session grouping -5. **Exports Spans**: Sends trace data to your configured backend +If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory: -### What Gets Traced +``` +Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0. -The SDK automatically instruments these components: +Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html. -- **`agent.step`** - Each iteration of the agent's execution loop -- **Tool Executions** - Individual tool calls with input/output capture -- **LLM Calls** - API requests to language models via LiteLLM -- **Conversation Lifecycle** - Message sending, conversation runs, and title generation -- **Browser Sessions** - When using browser-use, captures session replays (Laminar only) +Then, for each source file, identify -### Trace Hierarchy +1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`) +2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics) +3. Configuration properties that have changed defaults or been renamed +4. 
Dependencies that need version updates -Traces are organized hierarchically: +Save the results in `migration_inventory.json` in the following format: - - - - - - - - - - - - - +{ + ..., + "src/main/scala/etl/TransformJob.scala": { + "deprecated_apis": [ + {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} + ], + "behavioral_changes": [ + {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} + ], + "config_changes": [], + "risk": "medium" + }, + ... +} +``` -Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single -conversation together in your observability platform. +Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. -Note that in `tool.execute` the tool calls are traced, e.g., `bash`, `file_editor`. +## Migration -## Configuration Reference +With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. -### Environment Variables +To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. 
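To make this concrete, here is a minimal sketch of what such a re-runnable rename script could look like, shown in Python for brevity. The rename table and the assumption that Scala sources live under a single root are illustrative; a real script would be derived from your own inventory:

```python
from pathlib import Path

# Illustrative table of mechanical Spark 2.x -> 3.x renames.
# In practice this would be generated from migration_inventory.json.
RENAMES = {
    ".registerTempTable(": ".createOrReplaceTempView(",
    ".unionAll(": ".union(",
}

def migrate_tree(root: str) -> list[str]:
    """Apply the renames to every Scala file under root; return changed paths."""
    changed = []
    for path in Path(root).rglob("*.scala"):
        original = path.read_text()
        text = original
        for old, new in RENAMES.items():
            text = text.replace(old, new)
        if text != original:
            path.write_text(text)
            changed.append(str(path))
    return sorted(changed)
```

Because the script is idempotent, the agent (or a human) can safely re-run it on new branches, which is what makes encoding the renames in a script rather than making one-off edits worthwhile.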
-The SDK checks for these environment variables (in order of precedence): +The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. -| Variable | Description | Example | -|----------|-------------|---------| -| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` | -| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` | -| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` | -| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` | -| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` | -| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` | -| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` | +An example prompt is below: -### Header Format +``` +Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. -Headers should be comma-separated `key=value` pairs with URL encoding for special characters: +Use `migration_inventory.json` to guide the changes. -```bash icon="terminal" wrap -# Single header -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123" +For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. 
-# Multiple headers -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value" +Requirements: +1. Replace all deprecated APIs with their Spark 3.0 equivalents +2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) +3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions +4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical +5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists +6. Update import statements for any relocated classes +7. Preserve all existing business logic and output schemas ``` -### Protocol Options +Note the inclusion of the _known problems_ in requirement 2. We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. -The SDK supports both HTTP and gRPC protocols: +## Validation -- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) -- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) +Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. -## Platform-Specific Configuration +The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. 
-### Laminar Setup +An example prompt for data-level comparison: -1. Sign up at [laminar.sh](https://laminar.sh/) -2. Create a project and copy your API key -3. Set the environment variable: +``` +Validate the migrated Spark application in `/src` against the original. -```bash icon="terminal" wrap -export LMNR_PROJECT_API_KEY="your-laminar-api-key" +1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` +2. Compare outputs: + - Row counts must match exactly + - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns + - Flag any NULL handling differences +3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments +4. Generate a performance comparison: job duration, shuffle bytes, and peak executor memory + +Save the results in `validation_report.json` in the following format: + +{ + "jobs": [ + { + "name": "daily_etl", + "data_match": true, + "row_count": {"v2": 1000000, "v3": 1000000}, + "column_diffs": [], + "performance": { + "duration_seconds": {"v2": 340, "v3": 285}, + "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} + } + }, + ... + ] +} ``` -**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did. - -### Honeycomb Setup +Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. -1. Sign up at [honeycomb.io](https://www.honeycomb.io/) -2. Get your API key from the account settings -3. Configure the environment: +Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. 
Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. -```bash icon="terminal" wrap -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" -``` +## Beyond Version Upgrades -### Jaeger Setup +While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: -For local development with Jaeger: +- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. +- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. -```bash icon="terminal" wrap -# Start Jaeger all-in-one container -docker run -d --name jaeger \ - -p 4317:4317 \ - -p 16686:16686 \ - jaegertracing/all-in-one:latest +In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. 
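As a small illustration of the "structured inventory" principle, the per-file inventory produced in the Understanding step can directly drive prioritization. The field names below follow the example `migration_inventory.json` schema shown earlier and are assumptions, not a fixed format:

```python
# Risk ranking is an illustrative convention, not part of any schema.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}

def prioritized_worklist(inventory: dict) -> list[tuple[str, str, int]]:
    """Order files so high-risk, issue-dense files are migrated first."""
    work = []
    for path, entry in inventory.items():
        issues = (
            len(entry.get("deprecated_apis", []))
            + len(entry.get("behavioral_changes", []))
            + len(entry.get("config_changes", []))
        )
        work.append((path, entry.get("risk", "low"), issues))
    return sorted(work, key=lambda item: (RISK_ORDER.get(item[1], 3), -item[2]))
```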
-# Configure SDK -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" -``` +## Related Resources -Access the Jaeger UI at http://localhost:16686 +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -### Generic OTLP Collector +### Vulnerability Remediation +Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md -For other backends, use their OTLP endpoint: +Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. -```bash icon="terminal" wrap -export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" -export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" -export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" -``` +## The Challenge -## Advanced Usage +The traditional approach to vulnerability remediation is manual and time-consuming: -### Disabling Observability +1. Scan repositories for vulnerabilities +2. Review each vulnerability and its impact +3. Research the fix (usually a version upgrade) +4. Update dependency files +5. Test the changes +6. Create pull requests +7. Get reviews and merge -To disable tracing, simply unset all OTEL environment variables: +This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. 
Security debt accumulates faster than teams can address it. -```bash icon="terminal" wrap -unset LMNR_PROJECT_API_KEY -unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT -unset OTEL_EXPORTER_OTLP_ENDPOINT -unset OTEL_ENDPOINT -``` +**What if we could automate this entire process using AI agents?** -The SDK will automatically skip all tracing instrumentation with minimal overhead. +## Automated Vulnerability Remediation with OpenHands -### Custom Span Attributes +The [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**. -The SDK automatically adds these attributes to spans: +OpenHands assists with vulnerability remediation by: -- **`conversation_id`** - UUID of the conversation -- **`tool_name`** - Name of the tool being executed -- **`action.kind`** - Type of action being performed -- **`session_id`** - Groups all traces from one conversation +- **Identifying vulnerabilities**: Analyzing code for common security issues +- **Understanding impact**: Explaining the risk and exploitation potential +- **Implementing fixes**: Generating secure code to address vulnerabilities +- **Validating remediation**: Verifying fixes are effective and complete -### Debugging Tracing Issues +## Two Approaches to Vulnerability Fixing -If traces aren't appearing in your observability platform: +### 1. Point to a GitHub Repository -1. **Verify Environment Variables**: - ```python icon="python" wrap - import os +Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents automatically create pull requests with fixes—all with minimal human intervention. - otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') - otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') +### 2. 
Upload Security Scanner Reports - print(f"OTEL Endpoint: {otel_endpoint}") - print(f"OTEL Headers: {otel_headers}") - ``` +Enable users to upload reports from security scanners such as Snyk (as well as other third-party security scanners) where OpenHands agents automatically detect the report format, identify the issues, and apply fixes. -2. **Check SDK Logs**: The SDK logs observability initialization at debug level: - ```python icon="python" wrap - import logging +This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. - logging.basicConfig(level=logging.DEBUG) - ``` +## Architecture Overview -3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: - ```bash icon="terminal" wrap - curl -v https://api.honeycomb.io:443/v1/traces - ``` +A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agents SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes. -4. **Validate Headers**: Check that authentication headers are properly URL-encoded +The key architectural components include: -## Troubleshooting +- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client) +- **WebSocket interface**: Enables real-time status updates on agent actions and operations +- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider +- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud -### Traces Not Appearing +This architecture allows the frontend to remain lightweight while heavy lifting happens in the agent's execution environment. 
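As a rough, language-agnostic sketch of the real-time update flow (shown in Python for brevity), the client side reduces to routing incoming WebSocket frames to UI handlers. The message shape here is hypothetical and purely for illustration; consult the TypeScript client for the actual Agent Server protocol:

```python
import json
from typing import Callable

class AgentEventDispatcher:
    """Route incoming WebSocket frames to per-event-type UI handlers.

    Assumes a hypothetical frame shape: {"type": ..., "data": {...}}.
    """

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict], None]] = {}

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_frame: str) -> bool:
        """Parse one frame; return True if a handler consumed it."""
        event = json.loads(raw_frame)
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            return False
        handler(event.get("data", {}))
        return True
```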
-**Problem**: No traces showing up in observability platform +## Example: Vulnerability Fixer Application -**Solutions**: -- Verify environment variables are set correctly -- Check network connectivity to OTLP endpoint -- Ensure authentication headers are valid -- Look for SDK initialization logs at debug level +An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow: -### High Trace Volume +1. User points to a repository or uploads a security scan report +2. Agent analyzes the vulnerabilities +3. Agent creates fixes and pull requests automatically +4. User reviews and merges the changes -**Problem**: Too many spans being generated +## Security Scanning Integration -**Solutions**: -- Configure sampling at the collector level -- For Laminar with non-browser tools, browser instrumentation is automatically disabled -- Use backend-specific filtering rules +Use OpenHands to analyze security scanner output: -### Performance Impact +``` +We ran a security scan and found these issues. Analyze each one: -**Problem**: Concerned about tracing overhead +1. SQL Injection in src/api/users.py:45 +2. XSS in src/templates/profile.html:23 +3. Hardcoded credential in src/config/database.py:12 +4. 
Path traversal in src/handlers/files.py:67 -**Solutions**: -- Tracing has minimal overhead when properly configured -- Disable tracing in development by unsetting environment variables -- Use asynchronous exporters (default in most OTLP configurations) +For each vulnerability: +- Explain what the vulnerability is +- Show how it could be exploited +- Rate the severity (Critical/High/Medium/Low) +- Suggest a fix +``` -## Example: Full Setup +## Common Vulnerability Patterns - -This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) - +OpenHands can detect these common vulnerability patterns: -```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py -""" -Observability & Laminar example +| Vulnerability | Pattern | Example | +|--------------|---------|---------| +| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | +| XSS | Unescaped user input in HTML | `
<div>${user_comment}</div>
` | +| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | +| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | +| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | -This example demonstrates enabling OpenTelemetry tracing with Laminar in the -OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. -""" +## Automated Remediation -import os +### Applying Security Patches -from pydantic import SecretStr +Fix identified vulnerabilities: -from openhands.sdk import LLM, Agent, Conversation, Tool -from openhands.tools.terminal import TerminalTool + + + ``` + Fix the SQL injection vulnerability in src/api/users.py: + + Current code: + query = f"SELECT * FROM users WHERE id = {user_id}" + cursor.execute(query) + + Requirements: + 1. Use parameterized queries + 2. Add input validation + 3. Maintain the same functionality + 4. Add a test case for the fix + ``` + + **Fixed code:** + ```python + # Using parameterized query + query = "SELECT * FROM users WHERE id = %s" + cursor.execute(query, (user_id,)) + ``` + + + ``` + Fix the XSS vulnerability in src/templates/profile.html: + + Current code: +
<div>${user.bio}</div>
+ + Requirements: + 1. Properly escape user content + 2. Consider Content Security Policy + 3. Handle rich text if needed + 4. Test with malicious input + ``` + + **Fixed code:** + ```html + +
<div>{{ user.bio | escape }}</div>
+ ``` +
+ + ``` + Fix the command injection in src/utils/network.py: + + Current code: + def ping_host(hostname): + os.system(f"ping -c 1 {hostname}") + + Requirements: + 1. Use safe subprocess calls + 2. Validate input format + 3. Avoid shell=True + 4. Handle errors properly + ``` + + **Fixed code:** + ```python + import subprocess + import re + + def ping_host(hostname): + # Validate hostname format + if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): + raise ValueError("Invalid hostname") + + # Use subprocess without shell + result = subprocess.run( + ["ping", "-c", "1", hostname], + capture_output=True, + text=True + ) + return result.returncode == 0 + ``` + +
+### Code-Level Vulnerability Fixes -# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: -# export LMNR_PROJECT_API_KEY="your-laminar-api-key" -# For non-Laminar OTLP backends, set OTEL_* variables instead. +Fix application-level security issues: -# Configure LLM and Agent -api_key = os.getenv("LLM_API_KEY") -model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - model=model, - api_key=SecretStr(api_key) if api_key else None, - base_url=base_url, - usage_id="agent", -) +``` +Fix the broken access control in our API: -agent = Agent( - llm=llm, - tools=[Tool(name=TerminalTool.name)], -) +Issue: Users can access other users' data by changing the ID in the URL. -# Create conversation and run a simple task -conversation = Conversation(agent=agent, workspace=".") -conversation.send_message("List the files in the current directory and print them.") -conversation.run() -print( - "All done! Check your Laminar dashboard for traces " - "(session is the conversation UUID)." -) -``` +Current code: +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int): + return db.get_documents(user_id) -```bash Running the Example -export LMNR_PROJECT_API_KEY="your-laminar-api-key" -cd software-agent-sdk -uv run python examples/01_standalone_sdk/27_observability_laminar.py +Requirements: +1. Add authorization check +2. Verify requesting user matches or is admin +3. Return 403 for unauthorized access +4. Log access attempts +5. 
Add tests for authorization ``` -## Next Steps +**Fixed code:** -- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces -- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application -- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions +```python +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int, current_user: User = Depends(get_current_user)): + # Check authorization + if current_user.id != user_id and not current_user.is_admin: + logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") + raise HTTPException(status_code=403, detail="Not authorized") + + return db.get_documents(user_id) +``` -### Plugins -Source: https://docs.openhands.dev/sdk/guides/plugins.md +## Security Testing -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Test your fixes thoroughly: -Plugins provide a way to package and distribute multiple agent components together. A single plugin can include: +``` +Create security tests for the SQL injection fix: -- **Skills**: Specialized knowledge and workflows -- **Hooks**: Event handlers for tool lifecycle -- **MCP Config**: External tool server configurations -- **Agents**: Specialized agent definitions -- **Commands**: Slash commands +1. Test with normal input +2. Test with SQL injection payloads: + - ' OR '1'='1 + - '; DROP TABLE users; -- + - UNION SELECT * FROM passwords +3. Test with special characters +4. Test with null/empty input +5. Verify error handling doesn't leak information +``` -The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins). 
+## Automated Remediation Pipeline -## Plugin Structure +Create an end-to-end automated pipeline: - -See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure. - +``` +Create an automated vulnerability remediation pipeline: -A plugin follows this directory structure: +1. Parse Snyk/Dependabot/CodeQL alerts +2. Categorize by severity and type +3. For each vulnerability: + - Create a branch + - Apply the fix + - Run tests + - Create a PR with: + - Description of vulnerability + - Fix applied + - Test results +4. Request review from security team +5. Auto-merge low-risk fixes after tests pass +``` - - - - - - - - - - - - - - - - - - - - - - - +## Building Your Own Vulnerability Fixer -Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required. +The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. -### Plugin Manifest +To build your own vulnerability remediation agent: -The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata: +1. Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent +2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) +3. Configure the agent to create pull requests automatically +4. 
Set up human review workflows for critical fixes -```json icon="file-code" wrap -{ - "name": "code-quality", - "version": "1.0.0", - "description": "Code quality tools and workflows", - "author": "openhands", - "license": "MIT", - "repository": "https://github.com/example/code-quality-plugin" -} -``` +As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. -### Skills +## Related Resources -Skills are defined in markdown files with YAML frontmatter: +- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example +- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents +- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts -```markdown icon="file-code" ---- -name: python-linting -description: Instructions for linting Python code -trigger: - type: keyword - keywords: - - lint - - linting - - code quality ---- +### Windows Without WSL +Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl.md -# Python Linting Skill + + This way of running OpenHands is not officially supported. It is maintained by the community and may not work. + -Run ruff to check for issues: +# Running OpenHands GUI on Windows Without WSL -\`\`\`bash -ruff check . -\`\`\` -``` +This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker. -### Hooks +## Prerequisites -Hooks are defined in `hooks/hooks.json`: +1. **Windows 10/11** - A modern Windows operating system +2. 
**PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors) +3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet +4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) +5. **Git** - For cloning the repository and version control +6. **Node.js and npm** - For running the frontend -```json icon="file-code" wrap -{ - "hooks": { - "PostToolUse": [ - { - "matcher": "file_editor", - "hooks": [ - { - "type": "command", - "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", - "timeout": 5 - } - ] - } - ] - } -} -``` +## Step 1: Install Required Software -### MCP Configuration +1. **Install Python 3.12 or 3.13** + - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) + - During installation, check "Add Python to PATH" + - Verify installation by opening PowerShell and running: + ```powershell + python --version + ``` -MCP servers are configured in `.mcp.json`: +2. **Install PowerShell 7** + - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) + - Choose the MSI installer appropriate for your system (x64 for most modern computers) + - Run the installer with default options + - Verify installation by opening a new terminal and running: + ```powershell + pwsh --version + ``` + - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors -```json wrap icon="file-code" -{ - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} -``` +3. 
**Install .NET Core Runtime** + - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose the latest .NET Core Runtime (not SDK) + - Verify installation by opening PowerShell and running: + ```powershell + dotnet --info + ``` + - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation. -## Using Plugin Components +4. **Install Git** + - Download Git from [git-scm.com](https://git-scm.com/download/win) + - Use default installation options + - Verify installation: + ```powershell + git --version + ``` -> The ready-to-run example is available [here](#ready-to-run-example)! +5. **Install Node.js and npm** + - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) + - During installation, accept the default options which will install npm as well + - Verify installation: + ```powershell + node --version + npm --version + ``` -Brief explanation on how to use a plugin with an agent. +6. **Install Poetry** + - Open PowerShell as Administrator and run: + ```powershell + (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - + ``` + - Add Poetry to your PATH: + ```powershell + $env:Path += ";$env:APPDATA\Python\Scripts" + ``` + - Verify installation: + ```powershell + poetry --version + ``` - - - ### Loading a Plugin - First, load the desired plugins. +## Step 2: Clone and Set Up OpenHands - ```python icon="python" - from openhands.sdk.plugin import Plugin +1. **Clone the Repository** + ```powershell + git clone https://github.com/OpenHands/OpenHands.git + cd OpenHands + ``` - # Load a single plugin - plugin = Plugin.load("/path/to/plugin") +2. 
**Install Dependencies** + ```powershell + poetry install + ``` - # Load all plugins from a directory - plugins = Plugin.load_all("/path/to/plugins") - ``` - - - ### Accessing Components - You can access the different plugin components to see which ones are available. + This will install all required dependencies, including: + - pythonnet - Required for Windows PowerShell integration + - All other OpenHands dependencies - ```python icon="python" - # Skills - for skill in plugin.skills: - print(f"Skill: {skill.name}") +## Step 3: Run OpenHands - # Hooks configuration - if plugin.hooks: - print(f"Hooks configured: {plugin.hooks}") +1. **Build the Frontend** + ```powershell + cd frontend + npm install + npm run build + cd .. + ``` - # MCP servers - if plugin.mcp_config: - servers = plugin.mcp_config.get("mcpServers", {}) - print(f"MCP servers: {list(servers.keys())}") - ``` - - - ### Using with an Agent - You can now feed your agent with your preferred plugin. + This will build the frontend files that the backend will serve. - ```python focus={3,10,17} icon="python" - # Create agent context with plugin skills - agent_context = AgentContext( - skills=plugin.skills, - ) +2. **Start the Backend** + ```powershell + # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell + pwsh + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` - # Create agent with plugin MCP config - agent = Agent( - llm=llm, - tools=tools, - mcp_config=plugin.mcp_config or {}, - agent_context=agent_context, - ) + This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. 
- # Create conversation with plugin hooks - conversation = Conversation( - agent=agent, - hook_config=plugin.hooks, - ) - ``` - - + > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. -## Ready-to-run Example + > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. - -This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) - +3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** + ```powershell + cd frontend + npm run dev + ``` -```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py -"""Example: Loading Plugins via Conversation +4. **Access the OpenHands GUI** -Demonstrates the recommended way to load plugins using the `plugins` parameter -on Conversation. Plugins bundle skills, hooks, and MCP config together. + Open your browser and navigate to: + ``` + http://localhost:3000 + ``` -For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins -""" + > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` -import os -import sys -import tempfile -from pathlib import Path +## Installing and Running the CLI -from pydantic import SecretStr +To install and run the OpenHands CLI on Windows without WSL, follow these steps: -from openhands.sdk import LLM, Agent, Conversation -from openhands.sdk.plugin import PluginSource -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### 1. 
Install uv (Python Package Manager) +Open PowerShell as Administrator and run: -# Locate example plugin directory -script_dir = Path(__file__).parent -plugin_path = script_dir / "example_plugins" / "code-quality" +```powershell +powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" +``` -# Define plugins to load -# Supported sources: local path, "github:owner/repo", or git URL -# Optional: ref (branch/tag/commit), repo_path (for monorepos) -plugins = [ - PluginSource(source=str(plugin_path)), - # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), - # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), -] +### 2. Install .NET SDK (Required) -# Check for API key -api_key = os.getenv("LLM_API_KEY") -if not api_key: - print("Set LLM_API_KEY to run this example") - print("EXAMPLE_COST: 0") - sys.exit(0) +The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: -# Configure LLM and Agent -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -llm = LLM( - usage_id="plugin-demo", - model=model, - api_key=SecretStr(api_key), - base_url=os.getenv("LLM_BASE_URL"), -) -agent = Agent( - llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] -) +```powershell +winget install Microsoft.DotNet.SDK.8 +``` -# Create conversation with plugins - skills, MCP config, and hooks are merged -# Note: Plugins are loaded lazily on first send_message() or run() call -with tempfile.TemporaryDirectory() as tmpdir: - conversation = Conversation( - agent=agent, - workspace=tmpdir, - plugins=plugins, - ) +Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). 
- # Test: The "lint" keyword triggers the python-linting skill - # This first send_message() call triggers lazy plugin loading - conversation.send_message("How do I lint Python code? Brief answer please.") +After installation, restart your PowerShell session to ensure the environment variables are updated. - # Verify skills were loaded from the plugin (after lazy loading) - skills = ( - conversation.agent.agent_context.skills - if conversation.agent.agent_context - else [] - ) - print(f"Loaded {len(skills)} skill(s) from plugins") +### 3. Install and Run OpenHands - conversation.run() +After installing the prerequisites, install OpenHands with: - print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +```powershell +uv tool install openhands --python 3.12 ``` - - +Then run OpenHands: -## Next Steps +```powershell +openhands +``` -- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers -- **[Hooks](/sdk/guides/hooks)** - Understand hook event types -- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers +To upgrade OpenHands in the future: -### Secret Registry -Source: https://docs.openhands.dev/sdk/guides/secrets.md +```powershell +uv tool upgrade openhands --python 3.12 +``` -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +### Troubleshooting CLI Issues -> A ready-to-run example is available [here](#ready-to-run-example)! +#### CoreCLR Error -The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. -It automatically detects secret references in bash commands, injects them as environment variables when needed, -and masks secret values in command outputs to prevent accidental exposure. +If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this: -### Injecting Secrets +1. 
Install the .NET SDK as described in step 2 above +2. Verify that your system PATH includes the .NET SDK directories +3. Restart your PowerShell session completely after installing the .NET SDK +4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell -Use the `update_secrets()` method to add secrets to your conversation. +To verify your .NET installation, run: +```powershell +dotnet --info +``` -Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: +This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. -```python focus={4,11} icon="python" wrap -from openhands.sdk.conversation.secret_source import SecretSource +If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). -# Static secret -conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) +## Limitations on Windows -# Dynamic secret using SecretSource -class MySecretSource(SecretSource): - def get_value(self) -> str: - return "callable-based-secret" +When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: -conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) -``` +1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. -## Ready-to-run Example +2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. 
- -This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) - +3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. -```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py -import os +4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. -from pydantic import SecretStr +## Troubleshooting -from openhands.sdk import ( - LLM, - Agent, - Conversation, -) -from openhands.sdk.secret import SecretSource -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### "System.Management.Automation" Not Found Error +If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. 
-# Tools -tools = [ - Tool(name=TerminalTool.name), - Tool(name=FileEditorTool.name), -] +To resolve this issue: -# Agent -agent = Agent(llm=llm, tools=tools) -conversation = Conversation(agent) +1. **Install the latest version of PowerShell 7** from the official Microsoft repository: + - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) + - Download and install the latest MSI package for your system architecture (x64 for most systems) + - During installation, ensure you select the following options: + - "Add PowerShell to PATH environment variable" + - "Register Windows PowerShell 7 as the default shell" + - "Enable PowerShell remoting" + - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default +2. **Restart your terminal or command prompt** to ensure the new PowerShell is available -class MySecretSource(SecretSource): - def get_value(self) -> str: - return "callable-based-secret" +3. **Verify the installation** by running: + ```powershell + pwsh --version + ``` + You should see output indicating PowerShell 7.x.x -conversation.update_secrets( - {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()} -) +4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: + ```powershell + pwsh + cd path\to\openhands + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` -conversation.send_message("just echo $SECRET_TOKEN") + > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell". -conversation.run() +5. 
**If the issue persists**, ensure that you have the .NET Runtime installed: + - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose ".NET Runtime" (not SDK) version 6.0 or later + - After installation, verify it's properly installed by running: + ```powershell + dotnet --info + ``` + - Restart your computer after installation + - Try running OpenHands again -conversation.send_message("just echo $SECRET_FUNCTION_TOKEN") +6. **Ensure that the .NET Framework is properly installed** on your system: + - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off + - Make sure ".NET Framework 4.8 Advanced Services" is enabled + - Click OK and restart if prompted -conversation.run() +This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration. -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +## OpenHands Cloud - +### Bitbucket Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md -## Next Steps +## Prerequisites -- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP -- **[Security Analyzer](/sdk/guides/security)** - Add security validation +- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud). -### Security & Action Confirmation -Source: https://docs.openhands.dev/sdk/guides/security.md +## Adding Bitbucket Repository Access -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories. 
-Agent actions can be controlled through two complementary mechanisms: **confirmation policy** that determine when user -approval is required, and **security analyzer** that evaluates action risk levels. Together, they provide flexible control over agent behavior while maintaining safety. +## Working With Bitbucket Repos in Openhands Cloud -## Confirmation Policy -> A ready-to-run example is available [here](#ready-to-run-example-confirmation)! +After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and +branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! -Confirmation policy controls whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. +![Connect Repo](/openhands/static/img/connect-repo.png) -### Setting Confirmation Policy +## IP Whitelisting -Set the confirmation policy on your conversation: +If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow +OpenHands to access your repositories: -```python icon="python" focus={4} -from openhands.sdk.security.confirmation_policy import AlwaysConfirm +### Core App IP +``` +34.68.58.200 +``` -conversation = Conversation(agent=agent, workspace=".") -conversation.set_confirmation_policy(AlwaysConfirm()) +### Runtime IPs +``` +34.10.175.217 +34.136.162.246 +34.45.0.142 +34.28.69.126 +35.224.240.213 +34.70.174.52 +34.42.4.87 +35.222.133.153 +34.29.175.97 +34.60.55.59 ``` -Available policies: -- **`AlwaysConfirm()`** - Require approval for all actions -- **`NeverConfirm()`** - Execute all actions without approval -- **`ConfirmRisky()`** - Only require approval for risky actions (requires security analyzer) +## Next Steps -### Custom Confirmation Handler +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). 
+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. -Implement your approval logic by checking conversation status: +### Cloud API +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md -```python icon="python" focus={2-3,5} -while conversation.state.agent_status != AgentExecutionStatus.FINISHED: - if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not confirm_in_console(pending): - conversation.reject_pending_actions("User rejected") - continue - conversation.run() -``` +For the available API endpoints, refer to the +[OpenHands API Reference](https://docs.openhands.dev/api-reference). -### Rejecting Actions +## Obtaining an API Key -Provide feedback when rejecting to help the agent try a different approach: +To use the OpenHands Cloud API, you'll need to generate an API key: -```python icon="python" focus={2-5} -if not user_approved: - conversation.reject_pending_actions( - "User rejected because actions seem too risky." - "Please try a safer approach." - ) -``` +1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. +2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. +3. Click `Create API Key`. +4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. +5. Copy the generated API key and store it securely. It will only be shown once. 
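A common pattern is to keep the key out of source code and read it from an environment variable when building request headers. A small sketch (the `OPENHANDS_API_KEY` variable name is illustrative, not an official setting):

```python
import os

def auth_headers(env_var: str = "OPENHANDS_API_KEY") -> dict:
    # Read the key at call time so rotating it doesn't require a redeploy.
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the Cloud API")
    return {"Authorization": f"Bearer {api_key}"}
```

Requests can then reuse `auth_headers()` instead of embedding the key in each call.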
-### Ready-to-run Example Confirmation +## API Usage Example (V1) - -Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) - +### Starting a New Conversation -Require user approval before executing agent actions: +To start a new conversation with OpenHands to perform a task, +make a POST request to the V1 app-conversations endpoint. -```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py -"""OpenHands Agent SDK — Confirmation Mode Example""" + + + ```bash + curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests -import os -import signal -from collections.abc import Callable + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/v1/app-conversations" -from pydantic import SecretStr + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } -from openhands.sdk import LLM, BaseConversation, Conversation -from openhands.sdk.conversation.state import ( - ConversationExecutionStatus, - ConversationState, -) -from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.tools.preset.default import get_default_agent + data = { + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + } + response = 
requests.post(url, headers=headers, json=data) + result = response.json() -# Make ^C a clean exit instead of a stack trace -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + # The response contains a start task with the conversation ID + conversation_id = result.get("app_conversation_id") or result.get("id") + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") + print(f"Status: {result['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/v1/app-conversations"; + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; -def _print_action_preview(pending_actions) -> None: - print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") - for i, action in enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " ") - print(f" {i}. {action.tool_name}: {snippet}...") + const data = { + initial_message: { + content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." }] + }, + selected_repository: "yourusername/your-repo" + }; + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); -def confirm_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). - """ - _print_action_preview(pending_actions) - while True: - try: - ans = ( - input("\nDo you want to execute these actions? 
(yes/no): ") - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\n❌ No input received; rejecting by default.") - return False + const result = await response.json(); - if ans in ("yes", "y"): - print("✅ Approved — executing actions…") - return True - if ans in ("no", "n"): - print("❌ Rejected — skipping actions…") - return False - print("Please enter 'yes' or 'no'.") + // The response contains a start task with the conversation ID + const conversationId = result.app_conversation_id || result.id; + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); + console.log(`Status: ${result.status}`); + return result; + } catch (error) { + console.error("Error starting conversation:", error); + } + } -def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: - """ - Drive the conversation until FINISHED. - If WAITING_FOR_CONFIRMATION, ask the confirmer; - on reject, call reject_pending_actions(). - Preserves original error if agent waits but no actions exist. - """ - while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: - if ( - conversation.state.execution_status - == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "⚠️ Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." - ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected the actions") - # Let the agent produce a new step or finish - continue + startConversation(); + ``` + + - print("▶️ Running conversation.run()…") - conversation.run() +#### Response +The API will return a JSON object with details about the conversation start task: -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "status": "WORKING", + "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", + "sandbox_id": "sandbox-abc123", + "created_at": "2025-01-15T10:30:00Z" +} +``` -agent = get_default_agent(llm=llm) -conversation = Conversation(agent=agent, workspace=os.getcwd()) +The `status` field indicates the current state of the conversation startup process: +- `WORKING` - Initial processing +- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready +- `PREPARING_REPOSITORY` - Cloning and setting up the repository +- `READY` - Conversation is ready to use +- `ERROR` - An error occurred during startup -# Conditionally add security analyzer based on environment variable -add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) -if add_security_analyzer: - print("Agent security analyzer added.") - conversation.set_security_analyzer(LLMSecurityAnalyzer()) +You may receive an authentication error if: -# 1) Confirmation mode ON -conversation.set_confirmation_policy(AlwaysConfirm()) -print("\n1) Command that will likely create actions…") -conversation.send_message("Please list the files in the current directory using ls -la") -run_until_finished(conversation, confirm_in_console) +- You provided an invalid API key. +- You provided the wrong repository name. +- You don't have access to the repository. 
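Client code typically keys off the documented `status` values, treating `READY` and `ERROR` as terminal states. A minimal sketch of that check (polling and transport are left out; the sample updates below just mirror the documented progression):

```python
TERMINAL_STATUSES = {"READY", "ERROR"}

def startup_finished(update: dict) -> bool:
    """Return True once a status update reports a terminal startup state."""
    return update.get("status") in TERMINAL_STATUSES

# Simulated sequence of status updates, mirroring the documented progression.
updates = [
    {"status": "WORKING"},
    {"status": "WAITING_FOR_SANDBOX"},
    {"status": "PREPARING_REPOSITORY"},
    {"status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001"},
]
final = next(u for u in updates if startup_finished(u))
print(final["status"])  # READY
```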
-# 2) A command the user may choose to reject -print("\n2) Command the user may choose to reject…") -conversation.send_message("Please create a file called 'dangerous_file.txt'") -run_until_finished(conversation, confirm_in_console) +### Streaming Conversation Start (Optional) -# 3) Simple greeting (no actions expected) -print("\n3) Simple greeting (no actions expected)…") -conversation.send_message("Just say hello to me") -run_until_finished(conversation, confirm_in_console) +For real-time updates during conversation startup, you can use the streaming endpoint: -# 4) Disable confirmation mode and run commands directly -print("\n4) Disable confirmation mode and run a command…") -conversation.set_confirmation_policy(NeverConfirm()) -conversation.send_message("Please echo 'Hello from confirmation mode example!'") -conversation.run() +```bash +curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Your task description here"}] + }, + "selected_repository": "yourusername/your-repo" + }' +``` -conversation.send_message( - "Please delete any file that was created during this conversation." -) -conversation.run() +#### Streaming Response -print("\n=== Example Complete ===") -print("Key points:") -print( - "- conversation.run() creates actions; confirmation mode " - "sets execution_status=WAITING_FOR_CONFIRMATION" -) -print("- User confirmation is handled via a single reusable function") -print("- Rejection uses conversation.reject_pending_actions() and the loop continues") -print("- Simple responses work normally without actions") -print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") -``` +The endpoint streams a JSON array incrementally. 
Each element represents a status update: - +```json +[ + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} +] +``` ---- +Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. -## Security Analyzer +## Rate Limits -Security analyzer evaluates the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level: +If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. +If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). -- **LOW** - Safe operations with minimal security impact -- **MEDIUM** - Moderate security impact, review recommended -- **HIGH** - Significant security impact, requires confirmation -- **UNKNOWN** - Risk level could not be determined +--- -Security analyzer work in conjunction with confirmation policy (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations. +## Migrating from V0 to V1 API -### LLM Security Analyzer + + The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. + Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. 
+ -> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)! +### Key Differences -The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions. +| Feature | V0 API | V1 API | +|---------|--------|--------| +| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | +| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | +| Repository field | `repository` | `selected_repository` | +| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | -#### Security Analyzer Configuration +### Migration Steps -Create an LLM-based security analyzer to review actions before execution: +1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` -```python icon="python" focus={9} -from openhands.sdk import LLM -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -llm = LLM( - usage_id="security-analyzer", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) -security_analyzer = LLMSecurityAnalyzer(llm=security_llm) -agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) -``` +2. 
**Update the request body**: + - Change `repository` to `selected_repository` + - Change `initial_user_msg` (string) to `initial_message` (object with content array): + ```json + // V0 format + { "initial_user_msg": "Your message here" } -The security analyzer: -- Reviews each action before execution -- Flags potentially dangerous operations -- Can be configured with custom security policy -- Uses a separate LLM to avoid conflicts with the main agent + // V1 format + { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } + ``` -#### Ready-to-run Example Security Analyzer +3. **Update response handling**: The V1 API returns a start task object. The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. - -Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) - +--- -Automatically analyze agent actions for security risks before execution: +## Legacy API (V0) - Deprecated -```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py -"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) + + The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. + New integrations should use the V1 API documented above. + -This example shows how to use the LLMSecurityAnalyzer to automatically -evaluate security risks of actions before execution. 
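
The request-body changes described in the migration steps above can be captured in a small conversion helper. This is an illustrative sketch only: the field names are taken from the examples in this guide, and `v0_to_v1_request` is a hypothetical utility, not part of an official client.

```python
def v0_to_v1_request(v0_body: dict) -> dict:
    """Convert a V0 /api/conversations request body into the
    V1 /api/v1/app-conversations shape (illustrative sketch)."""
    v1_body: dict = {}
    if "initial_user_msg" in v0_body:
        # V0 took a plain string; V1 expects an array of content objects.
        v1_body["initial_message"] = {
            "content": [{"type": "text", "text": v0_body["initial_user_msg"]}]
        }
    if "repository" in v0_body:
        # The repository field was renamed in V1.
        v1_body["selected_repository"] = v0_body["repository"]
    return v1_body


print(v0_to_v1_request({"initial_user_msg": "Fix the README", "repository": "me/repo"}))
```

Remember that the response shape also changes: rather than reading `conversation_id` directly, wait for the start task to report `READY` and read `app_conversation_id`.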
-""" +### Starting a New Conversation (V0) -import os -import signal -from collections.abc import Callable + + + ```bash + curl -X POST "https://app.all-hands.dev/api/conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests -from pydantic import SecretStr + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/conversations" -from openhands.sdk import LLM, Agent, BaseConversation, Conversation -from openhands.sdk.conversation.state import ( - ConversationExecutionStatus, - ConversationState, -) -from openhands.sdk.security.confirmation_policy import ConfirmRisky -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + data = { + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + } -# Clean ^C exit: no stack trace noise -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + response = requests.post(url, headers=headers, json=data) + conversation = response.json() + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") + print(f"Status: {conversation['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/conversations"; -def _print_blocked_actions(pending_actions) -> None: - print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") - for i, action in 
enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " ") - print(f" {i}. {action.tool_name}: {snippet}...") + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + const data = { + initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + repository: "yourusername/your-repo" + }; -def confirm_high_risk_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. - """ - _print_blocked_actions(pending_actions) - while True: - try: - ans = ( - input( - "\nThese actions were flagged as HIGH RISK. " - "Do you want to execute them anyway? (yes/no): " - ) - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\n❌ No input received; rejecting by default.") - return False + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); - if ans in ("yes", "y"): - print("✅ Approved — executing high-risk actions...") - return True - if ans in ("no", "n"): - print("❌ Rejected — skipping high-risk actions...") - return False - print("Please enter 'yes' or 'no'.") + const conversation = await response.json(); + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); + console.log(`Status: ${conversation.status}`); -def run_until_finished_with_security( - conversation: BaseConversation, confirmer: Callable[[list], bool] -) -> None: - """ - Drive the conversation until FINISHED. - - If WAITING_FOR_CONFIRMATION: ask the confirmer. - * On approve: set execution_status = IDLE (keeps original example’s behavior). - * On reject: conversation.reject_pending_actions(...). - - If WAITING but no pending actions: print warning and set IDLE (matches original). 
- """ - while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: - if ( - conversation.state.execution_status - == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "⚠️ Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." - ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected high-risk actions") - continue + return conversation; + } catch (error) { + console.error("Error starting conversation:", error); + } + } - print("▶️ Running conversation.run()...") - conversation.run() + startConversation(); + ``` + + +#### Response (V0) -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="security-analyzer", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +```json +{ + "status": "ok", + "conversation_id": "abc1234" +} +``` -# Tools -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +### Cloud UI +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md -# Agent -agent = Agent(llm=llm, tools=tools) +## Landing Page -# Conversation with persisted filestore -conversation = Conversation( - agent=agent, persistence_dir="./.conversations", workspace="." 
-) -conversation.set_security_analyzer(LLMSecurityAnalyzer()) -conversation.set_confirmation_policy(ConfirmRisky()) +The landing page is where you can: -print("\n1) Safe command (LOW risk - should execute automatically)...") -conversation.send_message("List files in the current directory") -conversation.run() +- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), + [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or + [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. +- Launch an empty conversation using `New Conversation`. +- See `Suggested Tasks` for repositories that OpenHands has access to. +- See your `Recent Conversations`. -print("\n2) Potentially risky command (may require confirmation)...") -conversation.send_message( - "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" -) -run_until_finished_with_security(conversation, confirm_high_risk_in_console) -``` +## Settings - +Settings are divided across tabs, with each tab focusing on a specific area of configuration. -### Custom Security Analyzer Implementation +- `User` + - Change your email address. +- `Integrations` + - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). +- `Application` + - Set your preferred language, notifications and other preferences. + - Toggle task suggestions on GitHub. + - Toggle Solvability Analysis. + - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation). + - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings). 
+- `LLM` + - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings). +- `Billing` + - Add credits for using the OpenHands provider. +- `Secrets` + - [Manage secrets](/openhands/usage/settings/secrets-settings). +- `API Keys` + - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api). +- `MCP` + - [Setup an MCP server](/openhands/usage/settings/mcp-settings) -You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. +## Key Features -#### Creating a Custom Analyzer +For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features) +section of the documentation. -To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: +## Next Steps -```python icon="python" focus={5, 8} -from openhands.sdk.security.analyzer import SecurityAnalyzerBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.event.llm_convertible import ActionEvent +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. -class CustomSecurityAnalyzer(SecurityAnalyzerBase): - """Custom security analyzer with domain-specific rules.""" - - def security_risk(self, action: ActionEvent) -> SecurityRisk: - """Evaluate security risk based on custom rules. 
- - Args: - action: The ActionEvent to analyze - - Returns: - SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) - """ - # Example: Check for specific dangerous patterns - action_str = str(action.action.model_dump()).lower() if action.action else "" +### GitHub Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation.md - # High-risk patterns - if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): - return SecurityRisk.HIGH - - # Medium-risk patterns - if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): - return SecurityRisk.MEDIUM - - # Default to low risk - return SecurityRisk.LOW +## Prerequisites -# Use your custom analyzer -security_analyzer = CustomSecurityAnalyzer() -agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) -``` +- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud). - - For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). - +## Adding GitHub Repository Access +You can grant OpenHands access to specific GitHub repositories: ---- +1. Click on `+ Add GitHub Repos` in the repository selection dropdown. +2. Select your organization and choose the specific repositories to grant OpenHands access to. + + - OpenHands requests short-lived tokens (8-hour expiration) with these permissions: + - Actions: Read and write + - Commit statuses: Read and write + - Contents: Read and write + - Issues: Read and write + - Metadata: Read-only + - Pull requests: Read and write + - Webhooks: Read and write + - Workflows: Read and write + - Repository access for a user is granted based on: + - Permission granted for the repository + - User's GitHub permissions (owner/collaborator) + -## Configurable Security Policy +3. Click `Install & Authorize`. 
-> A ready-to-run example is available [here](#ready-to-run-example-security-policy)!

## Modifying Repository Access

-Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines.

+You can modify GitHub repository access at any time by:
+- Selecting `+ Add GitHub Repos` in the repository selection dropdown or
+- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories`

### Using Custom Security Policies
+## Working With GitHub Repos in OpenHands Cloud

-You can provide a custom security policy template when creating an agent:
+Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the
+`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click
+on `Launch` to start the conversation!

-```python focus={9-13} icon="python"
-from openhands.sdk import Agent, LLM
+![Connect Repo](/openhands/static/img/connect-repo.png)

-llm = LLM(
-    usage_id="agent",
-    model="anthropic/claude-sonnet-4-5-20250929",
-    api_key=SecretStr(api_key),
-)
+## Working on GitHub Issues and Pull Requests Using OpenHands

+To allow OpenHands to work directly from GitHub, you must
+[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is
+given, you can use OpenHands by labeling the issue or by tagging `@openhands`. 
-# Provide a custom security policy template file -agent = Agent( - llm=llm, - tools=tools, - security_policy_filename="my_security_policy.j2", -) -``` +### Working with Issues -Custom security policies allow you to: -- Define organization-specific risk assessment guidelines -- Set custom thresholds for security risk levels -- Add domain-specific security rules -- Tailor risk evaluation to your use case +On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: +1. Comment on the issue to let you know it is working on it. + - You can click on the link to track the progress on OpenHands Cloud. +2. Open a pull request if it determines that the issue has been successfully resolved. +3. Comment on the issue with a summary of the performed tasks and a link to the PR. -The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. +### Working with Pull Requests -### Ready-to-run Example Security Policy +To get OpenHands to work on pull requests, mention `@openhands` in the comments to: +- Ask questions +- Request updates +- Get code explanations -Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) +The `@openhands` mention functionality in pull requests only works if the pull request is both +*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate +permissions to access both repositories. 
-Define custom security risk guidelines for your agent: - -```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py -"""OpenHands Agent SDK — Configurable Security Policy Example - -This example demonstrates how to use a custom security policy template -with an agent. Security policies define risk assessment guidelines that -help agents evaluate the safety of their actions. - -By default, agents use the built-in security_policy.j2 template. This -example shows how to: -1. Use the default security policy -2. Provide a custom security policy template embedded in the script -3. Apply the custom policy to guide agent behavior -""" -import os -import tempfile -from pathlib import Path +## Next Steps -from pydantic import SecretStr +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### GitLab Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md +## Prerequisites -logger = get_logger(__name__) +- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud). 
-# Define a custom security policy template inline
-CUSTOM_SECURITY_POLICY = (
-    "# 🔐 Custom Security Risk Policy\n"
-    "When using tools that support the security_risk parameter, assess the "
-    "safety risk of your actions:\n"
-    "\n"
-    "- **LOW**: Safe read-only actions.\n"
-    "  - Viewing files, calculations, documentation.\n"
-    "- **MEDIUM**: Moderate container-scoped actions.\n"
-    "  - File modifications, package installations.\n"
-    "- **HIGH**: Potentially dangerous actions.\n"
-    "  - Network access, system modifications, data exfiltration.\n"
-    "\n"
-    "**Custom Rules**\n"
-    "- Always prioritize user data safety.\n"
-    "- Escalate to **HIGH** for any external data transmission.\n"
-)

## Adding GitLab Repository Access

-# Configure LLM
-api_key = os.getenv("LLM_API_KEY")
-assert api_key is not None, "LLM_API_KEY environment variable is not set."
-model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
-base_url = os.getenv("LLM_BASE_URL")
-llm = LLM(
-    usage_id="agent",
-    model=model,
-    base_url=base_url,
-    api_key=SecretStr(api_key),
-)
+Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories.

-# Tools
-cwd = os.getcwd()
-tools = [
-    Tool(name=TerminalTool.name),
-    Tool(name=FileEditorTool.name),
-]
+## Working With GitLab Repos in OpenHands Cloud

-# Example 1: Agent with default security policy
-print("=" * 100)
-print("Example 1: Agent with default security policy")
-print("=" * 100)
-default_agent = Agent(llm=llm, tools=tools)
-print(f"Security policy filename: {default_agent.security_policy_filename}")
-print("\nDefault security policy is embedded in the agent's system message.")
+After signing in with a GitLab account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation! 
-# Example 2: Agent with custom security policy -print("\n" + "=" * 100) -print("Example 2: Agent with custom security policy") -print("=" * 100) +![Connect Repo](/openhands/static/img/connect-repo.png) -# Create a temporary file for the custom security policy -with tempfile.NamedTemporaryFile( - mode="w", suffix=".j2", delete=False, encoding="utf-8" -) as temp_file: - temp_file.write(CUSTOM_SECURITY_POLICY) - custom_policy_path = temp_file.name +## Using Tokens with Reduced Scopes -try: - # Create agent with custom security policy (using absolute path) - custom_agent = Agent( - llm=llm, - tools=tools, - security_policy_filename=custom_policy_path, - ) - print(f"Security policy filename: {custom_agent.security_policy_filename}") - print("\nCustom security policy loaded from temporary file.") +OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent. +To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`, +which will override the default token assigned to the agent. While the high-permission API token is still requested +and used for other components of the application (e.g. opening merge requests), the agent will not have access to it. 
-    # Verify the custom policy is in the system message
-    system_message = custom_agent.static_system_message
-    if "Custom Security Risk Policy" in system_message:
-        print("✓ Custom security policy successfully embedded in system message.")
-    else:
-        print("✗ Custom security policy not found in system message.")

## Working on GitLab Issues and Merge Requests Using OpenHands

-    # Run a conversation with the custom agent
-    print("\n" + "=" * 100)
-    print("Running conversation with custom security policy")
-    print("=" * 100)
+
This feature works for personal projects and is available for group projects with a
[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks).

-    llm_messages = []  # collect raw LLM messages
+A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into
+OpenHands Cloud.

-    def conversation_callback(event: Event):
-        if isinstance(event, LLMConvertibleEvent):
-            llm_messages.append(event.to_llm_message())
+ 

-    conversation = Conversation(
-        agent=custom_agent,
-        callbacks=[conversation_callback],
-        workspace=".",
-    )
+Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly.

-    conversation.send_message(
-        "Please create a simple Python script named hello.py that prints "
-        "'Hello, World!'. Make sure to follow security best practices."
-    )
-    conversation.run()
+### Working with Issues

-    print("\n" + "=" * 100)
-    print("Conversation finished.")
-    print(f"Total LLM messages: {len(llm_messages)}")
-    print("=" * 100)
+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:

-    # Report cost
-    cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost
-    print(f"EXAMPLE_COST: {cost}")
+1. Comment on the issue to let you know it is working on it. 
+ - You can click on the link to track the progress on OpenHands Cloud. +2. Open a merge request if it determines that the issue has been successfully resolved. +3. Comment on the issue with a summary of the performed tasks and a link to the PR. -finally: - # Clean up temporary file - Path(custom_policy_path).unlink(missing_ok=True) +### Working with Merge Requests -print("\n" + "=" * 100) -print("Example Summary") -print("=" * 100) -print("This example demonstrated:") -print("1. Using the default security policy (security_policy.j2)") -print("2. Creating a custom security policy template") -print("3. Applying the custom policy via security_policy_filename parameter") -print("4. Running a conversation with the custom security policy") -print( - "\nYou can customize security policies to match your organization's " - "specific requirements." -) -``` +To get OpenHands to work on merge requests, mention `@openhands` in the comments to: - +- Ask questions +- Request updates +- Get code explanations -## Next Steps +## Managing GitLab Webhooks -- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools -- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management +The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page. -### Agent Skills & Context -Source: https://docs.openhands.dev/sdk/guides/skill.md +### Accessing Webhook Management -import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; +The webhook management table is available on the Integrations page when: -This guide shows how to implement skills in the SDK. For conceptual overview, see [Skills Overview](/overview/skills). +- You are signed in to OpenHands Cloud with a GitLab account +- Your GitLab token is connected -OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers. 
+To access it: -## Context Loading Methods +1. Navigate to the `Settings > Integrations` page +2. Find the GitLab section +3. If your GitLab token is connected, you'll see the webhook management table below the connection status -| Method | When Content Loads | Use Case | -|--------|-------------------|----------| -| **Always-loaded** | At conversation start | Repository rules, coding standards | -| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | -| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | +### Viewing Webhook Status -## Always-Loaded Context +The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands. -Content that's always in the system prompt. +- **Resource**: The name and full path of the project or group +- **Type**: Whether it's a "project" or "group" +- **Status**: The current webhook installation status: + - **Installed**: The webhook is active and working + - **Not Installed**: No webhook is currently installed + - **Failed**: A previous installation attempt failed (error details are shown below the status) -### Option 1: `AGENTS.md` (Auto-loaded) +### Reinstalling Webhooks -Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). +If a webhook is not installed or has failed, you can reinstall it: -```python icon="python" focus={3, 4} -from openhands.sdk.context.skills import load_project_skills +1. Find the resource in the webhook management table +2. Click the `Reinstall` button in the Action column +3. The button will show `Reinstalling...` while the operation is in progress +4. 
Once complete, the status will update to reflect the result -# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root -skills = load_project_skills(workspace_dir="/path/to/repo") -agent_context = AgentContext(skills=skills) -``` + + To reinstall an existing webhook, you must first delete the current webhook + from the GitLab UI before using the Reinstall button in OpenHands Cloud. + -### Option 2: Inline Skill (Code-defined) +**Important behaviors:** -```python icon="python" focus={5-11} -from openhands.sdk import AgentContext -from openhands.sdk.context import Skill +- The Reinstall button is disabled if the webhook is already installed +- Only one reinstall operation can run at a time +- After a successful reinstall, the button remains disabled to prevent duplicate installations +- If a reinstall fails, the error message is displayed below the status badge +- The resources list automatically refreshes after a reinstall completes -agent_context = AgentContext( - skills=[ - Skill( - name="code-style", - content="Always use type hints in Python.", - trigger=None, # No trigger = always loaded - ), - ] -) -``` +### Constraints and Limitations -## Trigger-Loaded Context +- The webhook management table only displays resources that are accessible with your connected GitLab token +- Webhook installation requires Admin or Owner permissions on the GitLab project or group -Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). +## Next Steps -```python icon="python" focus={6} -from openhands.sdk.context import Skill, KeywordTrigger +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. 
-Skill( - name="encryption-helper", - content="Use the encrypt.sh script to encrypt messages.", - trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), -) -``` +### Getting Started +Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md -When user says "encrypt this", the content is injected into the message: +## Accessing OpenHands Cloud -```xml icon="file" - -The following information has been included based on a keyword match for "encrypt". -Skill location: /path/to/encryption-helper +OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud, +visit [app.all-hands.dev](https://app.all-hands.dev). -Use the encrypt.sh script to encrypt messages. - -``` +You'll be prompted to connect with your GitHub, GitLab or Bitbucket account: -## Progressive Disclosure (AgentSkills Standard) +1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`. +2. Review the permissions requested by OpenHands and authorize the application. + - OpenHands will require certain permissions from your account. To read more about these permissions, + you can click the `Learn more` link on the authorization page. +3. Review and accept the `terms of service` and select `Continue`. -For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. +## Next Steps -```python icon="python" -from openhands.sdk.context.skills import load_skills_from_dir +Once you've connected your account, you can: -# Load SKILL.md files from a directory -_, _, agent_skills = load_skills_from_dir("/path/to/skills") -agent_context = AgentContext(skills=list(agent_skills.values())) -``` +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). 
+- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). -Skills are listed in the system prompt: -```xml icon="file" - - - code-style - Project coding standards. - /path/to/code-style/SKILL.md - - -``` +### Jira Data Center Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md - -Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. - +# Jira Data Center Integration ---- +## Platform Configuration -## Full Example +### Step 1: Create Service Account - -Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) - +1. **Access User Management** + - Log in to Jira Data Center as administrator + - Go to **Administration** > **User Management** -```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py -import os +2. **Create User** + - Click **Create User** + - Username: `openhands-agent` + - Full Name: `OpenHands Agent` + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Password: Set a secure password + - Click **Create** -from pydantic import SecretStr +3. **Assign Permissions** + - Add user to appropriate groups + - Ensure access to relevant projects + - Grant necessary project permissions -from openhands.sdk import ( - LLM, - Agent, - AgentContext, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.context import ( - KeywordTrigger, - Skill, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +### Step 2: Generate API Token +1. 
**Personal Access Tokens** + - Log in as the service account + - Go to **Profile** > **Personal Access Tokens** + - Click **Create token** + - Name: `OpenHands Cloud Integration` + - Expiry: Set appropriate expiration (recommend 1 year) + - Click **Create** + - **Important**: Copy and store the token securely -logger = get_logger(__name__) +### Step 3: Configure Webhook -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) +1. **Create Webhook** + - Go to **Administration** > **System** > **WebHooks** + - Click **Create a WebHook** + - **Name**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` + - Set a suitable webhook secret + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) -# Tools -cwd = os.getcwd() -tools = [ - Tool( - name=TerminalTool.name, - ), - Tool(name=FileEditorTool.name), -] +--- -# AgentContext provides flexible ways to customize prompts: -# 1. Skills: Inject instructions (always-active or keyword-triggered) -# 2. system_message_suffix: Append text to the system prompt -# 3. 
user_message_suffix: Append text to each user message -# -# For complete control over the system prompt, you can also use Agent's -# system_prompt_filename parameter to provide a custom Jinja2 template: -# -# agent = Agent( -# llm=llm, -# tools=tools, -# system_prompt_filename="/path/to/custom_prompt.j2", -# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, -# ) -# -# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts -agent_context = AgentContext( - skills=[ - Skill( - name="repo.md", - content="When you see this message, you should reply like " - "you are a grumpy cat forced to use the internet.", - # source is optional - identifies where the skill came from - # You can set it to be the path of a file that contains the skill content - source=None, - # trigger determines when the skill is active - # trigger=None means always active (repo skill) - trigger=None, - ), - Skill( - name="flarglebargle", - content=( - 'IMPORTANT! The user has said the magic word "flarglebargle". ' - "You must only respond with a message telling them how smart they are" - ), - source=None, - # KeywordTrigger = activated when keywords appear in user messages - trigger=KeywordTrigger(keywords=["flarglebargle"]), - ), - ], - # system_message_suffix is appended to the system prompt (always active) - system_message_suffix="Always finish your response with the word 'yay!'", - # user_message_suffix is appended to each user message - user_message_suffix="The first character of your response should be 'I'", - # You can also enable automatic load skills from - # public registry at https://github.com/OpenHands/extensions - load_public_skills=True, -) +## Workspace Integration -# Agent -agent = Agent(llm=llm, tools=tools, agent_context=agent_context) +### Step 1: Log in to OpenHands Cloud -llm_messages = [] # collect raw LLM messages +1. 
**Navigate and Authenticate**
+   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
+   - Sign in with your Git provider (GitHub, GitLab, or Bitbucket)
+   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.
+### Step 2: Configure Jira Data Center Integration


-def conversation_callback(event: Event):
-    if isinstance(event, LLMConvertibleEvent):
-        llm_messages.append(event.to_llm_message())
+1. **Access Integration Settings**
+   - Navigate to **Settings** > **Integrations**
+   - Locate **Jira Data Center** section
+2. **Configure Workspace**
+   - Click **Configure** button
+   - Enter your workspace name and click **Connect**
+   - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration:
+     - **Webhook Secret**: The webhook secret from Step 3 above
+     - **Service Account Email**: The service account email from Step 1 above
+     - **Service Account API Key**: The personal access token from Step 2 above
+   - Ensure **Active** toggle is enabled

-conversation = Conversation(
-    agent=agent, callbacks=[conversation_callback], workspace=cwd
-)
+
+Workspace name is the host name of your Jira Data Center instance.

-print("=" * 100)
-print("Checking if the repo skill is activated.")
-conversation.send_message("Hey are you a grumpy cat?")
-conversation.run()
+Eg: http://jira.all-hands.dev/projects/OH/issues/OH-77

-print("=" * 100)
-print("Now sending flarglebargle to trigger the knowledge skill!")
-conversation.send_message("flarglebargle!")
-conversation.run()
+Here the workspace name is **jira.all-hands.dev**.
+

-print("=" * 100)
-print("Now triggering public skill 'github'")
-conversation.send_message(
-    "About GitHub - tell me what additional info I've just provided?"
-)
-conversation.run()
+3. 
**Complete OAuth Flow** + - You'll be redirected to Jira Data Center to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") +### Managing Your Integration -# Report cost -cost = llm.metrics.accumulated_cost -print(f"EXAMPLE_COST: {cost}") -``` +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view - +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. -### Creating Skills +### Screenshots -Skills are defined with a name, content (the instructions), and an optional trigger: + + +![workspace-link.png](/openhands/static/img/jira-dc-user-link.png) + -```python icon="python" focus={3-14} -agent_context = AgentContext( - skills=[ - Skill( - name="AGENTS.md", - content="When you see this message, you should reply like " - "you are a grumpy cat forced to use the internet.", - trigger=None, # Always active - ), - Skill( - name="flarglebargle", - content='IMPORTANT! The user has said the magic word "flarglebargle". 
' - "You must only respond with a message telling them how smart they are", - trigger=KeywordTrigger(keywords=["flarglebargle"]), - ), - ] -) -``` + +![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png) + -### Keyword Triggers + +![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png) + -Use `KeywordTrigger` to activate skills only when specific words appear: + +![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png) + + -```python icon="python" focus={4} -Skill( - name="magic-word", - content="Special instructions when magic word is detected", - trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), -) -``` +### Jira Cloud Integration +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md +# Jira Cloud Integration -## File-Based Skills (`SKILL.md`) +## Platform Configuration -For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. +### Step 1: Create Service Account - -Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py) - +1. **Navigate to User Management** + - Go to [Atlassian Admin](https://admin.atlassian.com/) + - Select your organization + - Go to **Directory** > **Users** -### Directory Structure +2. **Create OpenHands Service Account** + - Click **Service accounts** + - Click **Create a service account** + - Name: `OpenHands Agent` + - Click **Next** + - Select **User** role for Jira app + - Click **Create** -Each skill is a directory containing: +### Step 2: Generate API Token - - - - - - - - - - - - - - +1. 
**Access Service Account Configuration** + - Locate the created service account from above step and click on it + - Click **Create API token** + - Set the expiry to 365 days (maximum allowed value) + - Click **Next** + - In **Select token scopes** screen, filter by following values + - App: Jira + - Scope type: Classic + - Scope actions: Write, Read + - Select `read:me`, `read:jira-work`, and `write:jira-work` scopes + - Click **Next** + - Review and create API token + - **Important**: Copy and securely store the token immediately -where +### Step 3: Configure Webhook -| Component | Required | Description | -|-------|----------|-------------| -| `SKILL.md` | Yes | Skill definition with frontmatter | -| `scripts/` | No | Executable scripts | -| `references/` | No | Reference documentation | -| `assets/` | No | Static assets | +1. **Navigate to Webhook Settings** + - Go to **Jira Settings** > **System** > **WebHooks** + - Click **Create a WebHook** +2. **Configure Webhook** + - **Name**: `OpenHands Cloud Integration` + - **Status**: Enabled + - **URL**: `https://app.all-hands.dev/integration/jira/events` + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) +--- -### `SKILL.md` Format +## Workspace Integration -The `SKILL.md` file defines the skill with YAML frontmatter: +### Step 1: Log in to OpenHands Cloud -```md icon="markdown" ---- -name: my-skill # Required (standard) -description: > # Required (standard) - A brief description of what this skill does and when to use it. -license: MIT # Optional (standard) -compatibility: Requires bash # Optional (standard) -metadata: # Optional (standard) - author: your-name - version: "1.0" -triggers: # Optional (OpenHands extension) - - keyword1 - - keyword2 ---- +1. 
**Navigate and Authenticate**
+   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
+   - Sign in with your Git provider (GitHub, GitLab, or Bitbucket)
+   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.

+### Step 2: Configure Jira Integration

+1. **Access Integration Settings**
+   - Navigate to **Settings** > **Integrations**
+   - Locate **Jira Cloud** section

+2. **Configure Workspace**
+   - Click **Configure** button
+   - Enter your workspace name and click **Connect**
+   - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net**
+   - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration:
+     - **Webhook Secret**: The webhook secret from Step 3 above
+     - **Service Account Email**: The service account email from Step 1 above
+     - **Service Account API Key**: The API token from Step 2 above
+   - Ensure **Active** toggle is enabled

+
+Workspace name is the host name when accessing a resource in Jira Cloud.

+Eg: https://all-hands.atlassian.net/browse/OH-55

+Here the workspace name is **all-hands**.
+

+3. 
**Complete OAuth Flow** + - You'll be redirected to Jira Cloud to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI + +### Managing Your Integration + +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view -```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py -"""Example: Loading Skills from Disk (AgentSkills Standard) +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. -This example demonstrates how to load skills following the AgentSkills standard -from a directory on disk. +### Screenshots -Skills are modular, self-contained packages that extend an agent's capabilities -by providing specialized knowledge, workflows, and tools. 
They follow the -AgentSkills standard which includes: -- SKILL.md file with frontmatter metadata (name, description, triggers) -- Optional resource directories: scripts/, references/, assets/ + + +![workspace-link.png](/openhands/static/img/jira-user-link.png) + -The example_skills/ directory contains two skills: -- rot13-encryption: Has triggers (encrypt, decrypt) - listed in - AND content auto-injected when triggered -- code-style-guide: No triggers - listed in for on-demand access + +![workspace-link.png](/openhands/static/img/jira-admin-configure.png) + -All SKILL.md files follow the AgentSkills progressive disclosure model: -they are listed in with name, description, and location. -Skills with triggers get the best of both worlds: automatic content injection -when triggered, plus the agent can proactively read them anytime. -""" + +![workspace-link.png](/openhands/static/img/jira-user-unlink.png) + -import os -import sys -from pathlib import Path + +![workspace-link.png](/openhands/static/img/jira-admin-edit.png) + + -from pydantic import SecretStr +### Linear Integration (Coming soon...) 
+Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md -from openhands.sdk import LLM, Agent, AgentContext, Conversation -from openhands.sdk.context.skills import ( - discover_skill_resources, - load_skills_from_dir, -) -from openhands.sdk.tool import Tool -from openhands.tools.file_editor import FileEditorTool -from openhands.tools.terminal import TerminalTool +# Linear Integration +## Platform Configuration -# Get the directory containing this script -script_dir = Path(__file__).parent -example_skills_dir = script_dir / "example_skills" +### Step 1: Create Service Account -# ========================================================================= -# Part 1: Loading Skills from a Directory -# ========================================================================= -print("=" * 80) -print("Part 1: Loading Skills from a Directory") -print("=" * 80) +1. **Access Team Settings** + - Log in to Linear as a team admin + - Go to **Settings** > **Members** -print(f"Loading skills from: {example_skills_dir}") +2. **Invite Service Account** + - Click **Invite members** + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Role: **Member** (with appropriate team access) + - Send invitation -# Discover resources in the skill directory -skill_subdir = example_skills_dir / "rot13-encryption" -resources = discover_skill_resources(skill_subdir) -print("\nDiscovered resources in rot13-encryption/:") -print(f" - scripts: {resources.scripts}") -print(f" - references: {resources.references}") -print(f" - assets: {resources.assets}") +3. 
**Complete Setup** + - Accept invitation from the service account email + - Complete profile setup + - Ensure access to relevant teams/workspaces -# Load skills from the directory -repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) +### Step 2: Generate API Key -print("\nLoaded skills from directory:") -print(f" - Repo skills: {list(repo_skills.keys())}") -print(f" - Knowledge skills: {list(knowledge_skills.keys())}") -print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") +1. **Access API Settings** + - Log in as the service account + - Go to **Settings** > **Security & access** -# Access the loaded skill and show all AgentSkills standard fields -if agent_skills: - skill_name = next(iter(agent_skills)) - loaded_skill = agent_skills[skill_name] - print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") - print(f" - Name: {loaded_skill.name}") - desc = loaded_skill.description or "" - print(f" - Description: {desc[:70]}...") - print(f" - License: {loaded_skill.license}") - print(f" - Compatibility: {loaded_skill.compatibility}") - print(f" - Metadata: {loaded_skill.metadata}") - if loaded_skill.resources: - print(" - Resources:") - print(f" - Scripts: {loaded_skill.resources.scripts}") - print(f" - References: {loaded_skill.resources.references}") - print(f" - Assets: {loaded_skill.resources.assets}") - print(f" - Skill root: {loaded_skill.resources.skill_root}") +2. 
**Create Personal API Key** + - Click **Create new key** + - Name: `OpenHands Cloud Integration` + - Scopes: Select the following: + - `Read` - Read access to issues and comments + - `Create comments` - Ability to create or update comments + - Select the teams you want to provide access to, or allow access for all teams you have permissions for + - Click **Create** + - **Important**: Copy and store the API key securely -# ========================================================================= -# Part 2: Using Skills with an Agent -# ========================================================================= -print("\n" + "=" * 80) -print("Part 2: Using Skills with an Agent") -print("=" * 80) +### Step 3: Configure Webhook -# Check for API key -api_key = os.getenv("LLM_API_KEY") -if not api_key: - print("Skipping agent demo (LLM_API_KEY not set)") - print("\nTo run the full demo, set the LLM_API_KEY environment variable:") - print(" export LLM_API_KEY=your-api-key") - sys.exit(0) +1. **Access Webhook Settings** + - Go to **Settings** > **API** > **Webhooks** + - Click **New webhook** -# Configure LLM -model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") -llm = LLM( - usage_id="skills-demo", - model=model, - api_key=SecretStr(api_key), - base_url=os.getenv("LLM_BASE_URL"), -) +2. 
**Configure Webhook**
+   - **Label**: `OpenHands Cloud Integration`
+   - **URL**: `https://app.all-hands.dev/integration/linear/events`
+   - **Resource types**: Select:
+     - `Comment` - For comment events
+     - `Issue` - For issue updates (label changes)
+   - Select the teams you want to provide access to, or allow access for all public teams
+   - Click **Create webhook**
+   - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration)

+---

+## Workspace Integration

+### Step 1: Log in to OpenHands Cloud

+1. **Navigate and Authenticate**
+   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
+   - Sign in with your Git provider (GitHub, GitLab, or Bitbucket)
+   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.

+### Step 2: Configure Linear Integration

+1. **Access Integration Settings**
+   - Navigate to **Settings** > **Integrations**
+   - Locate **Linear** section
+2. 
**Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API key from Step 2 above + - Ensure **Active** toggle is enabled -### Key Functions + +Workspace name is the identifier after the host name when accessing a resource in Linear. -#### `load_skills_from_dir()` +Eg: https://linear.app/allhands/issue/OH-37 -Loads all skills from a directory, returning three dictionaries: +Here the workspace name is **allhands**. + -```python icon="python" focus={3} -from openhands.sdk.context.skills import load_skills_from_dir +3. **Complete OAuth Flow** + - You'll be redirected to Linear to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the correct one that you initially provided + - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI -repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir) -``` +### Managing Your Integration -- **repo_skills**: Skills from `repo.md` files (always active) -- **knowledge_skills**: Skills from `knowledge/` subdirectories -- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard) +**Edit Configuration:** +- Click the **Edit** button next to your configured platform +- Update any necessary credentials or settings +- Click **Update** to apply changes +- You will need to repeat the OAuth flow as before +- **Important:** Only the original user who created the integration can see the edit view -#### `discover_skill_resources()` +**Unlink Workspace:** +- In the edit view, click **Unlink** next to the workspace name +- This will deactivate your workspace link +- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user. 
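The webhook secret entered during workspace configuration is what lets OpenHands Cloud authenticate deliveries from Linear: Linear computes an HMAC-SHA256 digest of the raw request body using the webhook secret and sends the hex digest in the `Linear-Signature` header (per Linear's webhook documentation; verify the header name against your own deliveries). If you are debugging delivery failures through a proxy, a minimal sketch of the check (the payload and secret below are made-up illustration values, not a real Linear delivery):

```python
import hashlib
import hmac


def verify_linear_signature(raw_body: bytes, secret: str, signature_header: str) -> bool:
    """Return True if the HMAC-SHA256 hex digest of the raw request body,
    keyed with the webhook secret, matches the received signature header."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels when comparing secrets
    return hmac.compare_digest(expected, signature_header)


# Illustration only: fabricate a delivery signed with a known secret
body = b'{"action":"create","type":"Comment"}'
secret = "my-webhook-secret"
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_linear_signature(body, secret, sig))  # True
```

OpenHands Cloud performs this verification on its side; the sketch is only meant to show why the secret stored in the integration settings must match the one generated when the webhook was created in Linear.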
-Discovers resource files in a skill directory: +### Screenshots -```python icon="python" focus={3} -from openhands.sdk.context.skills import discover_skill_resources + + +![workspace-link.png](/openhands/static/img/linear-user-link.png) + -resources = discover_skill_resources(skill_dir) -print(resources.scripts) # List of script files -print(resources.references) # List of reference files -print(resources.assets) # List of asset files -print(resources.skill_root) # Path to skill directory -``` + +![workspace-link.png](/openhands/static/img/linear-admin-configure.png) + -### Skill Location in Prompts + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + -The `` element in `` follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path: + +![workspace-link.png](/openhands/static/img/linear-admin-edit.png) + + -``` - -The following information has been included based on a keyword match for "encrypt". +### Project Management Tool Integrations (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md -Skill location: /path/to/rot13-encryption -(Use this path to resolve relative file references in the skill content below) +# Project Management Tool Integrations -[skill content from SKILL.md] - -``` +## Overview -This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. +OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by: +- Adding `@openhands` in ticket comments +- Adding the `openhands` label to tickets -### Example Skill: ROT13 Encryption +## Prerequisites -Here's a skill with triggers (OpenHands extension): +Integration requires two levels of setup: +1. 
**Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) +2. **Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace -**SKILL.md:** -```markdown icon="markdown" ---- -name: rot13-encryption -description: > - This skill helps encrypt and decrypt messages using ROT13 cipher. -triggers: - - encrypt - - decrypt - - cipher ---- +### Platform-Specific Setup Guides: +- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) +- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) +- [Linear Integration (Coming soon...)](./linear-integration.md) -# ROT13 Encryption Skill +## Usage -Run the [encrypt.sh](scripts/encrypt.sh) script with your message: +Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: -\`\`\`bash -./scripts/encrypt.sh "your message" -\`\`\` +### Method 1: Comment Mention +Add a comment to any issue with `@openhands` followed by your task description: ``` - -**scripts/encrypt.sh:** -```bash icon="sh" -#!/bin/bash -echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +@openhands Please implement the user authentication feature described in this ticket ``` -When the user says "encrypt", the skill is triggered and the agent can use the provided script. +### Method 2: Label-based Delegation +Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. -## Loading Public Skills +### Git Repository Detection -OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. 
+The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: -### Automatic Loading via AgentContext +#### Specifying the Target Repository -Enable public skills loading in your `AgentContext`: +**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. -```python icon="python" focus={2} -agent_context = AgentContext( - load_public_skills=True, # Auto-load from public registry - skills=[ - # Your custom skills here - ] -) -``` +**Supported Repository Formats:** +- Full HTTPS URL: `https://github.com/owner/repository.git` +- GitHub URL without .git: `https://github.com/owner/repository` +- Owner/repository format: `owner/repository` -When enabled, the SDK will: -1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run -2. Load all available skills from the repository -3. Merge them with your explicitly defined skills +#### Platform-Specific Behavior -### Skill Naming and Triggers +**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. Manual specification is not required in this configuration. -**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. +**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection. -**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. 
To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. +## Troubleshooting -```python icon="python" -# Both skills will be triggered by "/codereview" -agent_context = AgentContext( - load_public_skills=True, # Loads public "code-review" skill - skills=[ - Skill( - name="custom-codereview-guide", # Different name = coexists - content="Project-specific guidelines...", - trigger=KeywordTrigger(keywords=["/codereview"]), - ), - ] -) -``` +### Platform Configuration Issues +- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated) +- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. If your current API token is expired, make sure to update it in the respective integration settings +- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions - -**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. - +### Workspace Integration Issues +- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. 
Contact the administrator of the platform you want to integrate with (e.g., Jira or Linear)
+- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first
+- **OAuth flow fails**: Make sure you're authorizing with the correct account and that it has proper workspace access
-### Programmatic Loading
+### General Issues
+- **Agent not responding**: Check webhook logs in your platform settings and verify service account status
+- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access
+- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on
+- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed
-You can also load public skills manually and have more control:
+### Getting Help
+For additional support, contact OpenHands Cloud support with:
+- Your integration platform (Linear, Jira Cloud, or Jira Data Center)
+- Workspace name
+- Error logs from webhook/integration attempts
+- Screenshots of configuration settings (without sensitive credentials)
-```python icon="python"
-from openhands.sdk.context.skills import load_public_skills
+### Slack Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md
-# Load all public skills
-public_skills = load_public_skills()
+
-# Use with AgentContext
-agent_context = AgentContext(skills=public_skills)
+
+OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete.
+While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to
+validate critical information independently.
+ -# Or combine with custom skills -my_skills = [ - Skill(name="custom", content="Custom instructions", trigger=None) -] -agent_context = AgentContext(skills=my_skills + public_skills) -``` +## Prerequisites -### Custom Skills Repository +- Access to OpenHands Cloud. -You can load skills from your own repository: +## Installation Steps -```python icon="python" focus={3-7} -from openhands.sdk.context.skills import load_public_skills + + -# Load from a custom repository -custom_skills = load_public_skills( - repo_url="https://github.com/my-org/my-skills", - branch="main" -) -``` + **This step is for Slack admins/owners** -### How It Works + 1. Make sure you have permissions to install Apps to your workspace. + 2. Click the button below to install OpenHands Slack App Add to Slack + 3. In the top right corner, select the workspace to install the OpenHands Slack app. + 4. Review permissions and click allow. -The `load_public_skills()` function uses git-based caching for efficiency: + -- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` -- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date -- **Offline mode**: Uses the cached version if network is unavailable + -This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. + **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.** - -Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. - + Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this: + 1. Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud. + 2. Click `Install OpenHands Slack App`. + 3. 
In the top right corner, select the workspace to install the OpenHands Slack app. + 4. Review permissions and click allow. -## Customizing Agent Context + Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App. -### Message Suffixes + -Append custom instructions to the system prompt or user messages via `AgentContext`: + -```python icon="python" -agent_context = AgentContext( - system_message_suffix=""" - -Repository: my-project -Branch: feature/new-api - - """.strip(), - user_message_suffix="Remember to explain your reasoning." -) -``` -- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) -- **`user_message_suffix`**: Appended to each user message +## Working With the Slack App -### Replacing the Entire System Prompt +To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel. -For complete control, provide a custom Jinja2 template via the `Agent` class: +Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands. -```python icon="python" focus={6} -from openhands.sdk import Agent +To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. +You must be the user who started the conversation. -agent = Agent( - llm=llm, - tools=tools, - system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path - system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} -) -``` +## Example conversation -**Custom template example** (`custom_system_prompt.j2`): +### Start a new conversation, and select repo -```jinja2 -You are a helpful coding assistant for {{ repo_name }}. +Conversation is started by mentioning `@openhands`. -{% if cli_mode %} -You are running in CLI mode. Keep responses concise. 
-{% endif %} +![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png) -Follow these guidelines: -- Write clean, well-documented code -- Consider edge cases and error handling -- Suggest tests when appropriate -``` +### See agent response and send follow up messages -**Key points:** -- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory -- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location -- Pass variables to the template via `system_prompt_kwargs` -- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt +Initial request is followed up by mentioning `@openhands` in a thread reply. -## Next Steps +![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png) -- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools -- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers -- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval +## Pro tip + +You can mention a repo name when starting a new conversation in the following formats + +1. "My-Repo" repo (e.g `@openhands in the openhands repo ...`) +2. "OpenHands/OpenHands" (e.g `@openhands in OpenHands/OpenHands ...`) + +The repo match is case insensitive. If a repo name match is made, it will kick off the conversation. +If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list. + +![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png) ## OpenHands Overview diff --git a/llms.txt b/llms.txt index d1a8407d..a8ea9e56 100644 --- a/llms.txt +++ b/llms.txt @@ -5,94 +5,6 @@ The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI) from the OpenHands Software Agent SDK. 
-## OpenHands Web App Server - -- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) -- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. -- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. -- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK -- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). -- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md) -- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands -- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings). -- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High level overview of configuring the OpenHands Web interface. -- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents. 
-- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users that would like to use their own custom Docker image for the runtime. -- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md) -- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands -- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively. -- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally. -- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands -- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md) -- [Good vs. Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands -- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex) -- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. 
You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq). -- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents -- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to setup and modify the various integrations in OpenHands. -- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md) -- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands. As well as some additional LLM settings. -- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers. -- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. -- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md) -- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you -- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands -- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai). 
-- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models. -- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects. -- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle -- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter). -- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work. -- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote. -- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation. -- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts. -- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment. -- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level. 
-- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app. -- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md) -- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine. -- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. -- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. -- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands -- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) -- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples -- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase -- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) -- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task -- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running OpenHands GUI on Windows without using WSL or Docker - -## OpenHands Cloud - -- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories. 
Once -- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands. -- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on -- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud. -- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories. Once -- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md) -- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup. -- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup. -- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup. 
-- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting. -- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. - -## OpenHands CLI - -- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options -- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users -- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker -- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines -- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol -- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system -- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs -- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities -- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI -- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes -- 
[Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to resume previous conversations in the OpenHands CLI -- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use OpenHands interactively in your terminal with the command-line interface -- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents -- [VS Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension -- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser -- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol - ## OpenHands Software Agent SDK - [Agent](https://docs.openhands.dev/sdk/arch/agent.md): High-level architecture of the reasoning-action loop @@ -166,6 +78,94 @@ from the OpenHands Software Agent SDK. 
- [Tool System & MCP](https://docs.openhands.dev/sdk/arch/tool-system.md): High-level architecture of the action-observation tool framework - [Workspace](https://docs.openhands.dev/sdk/arch/workspace.md): High-level architecture of the execution environment abstraction +## OpenHands CLI + +- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options +- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users +- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker +- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines +- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol +- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system +- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs +- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities +- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI +- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes +- [Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to resume previous conversations in the OpenHands CLI +- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use 
OpenHands interactively in your terminal with the command-line interface +- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents +- [VS Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension +- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser +- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol + +## OpenHands Web App Server + +- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) +- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. +- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. +- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK +- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). +- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md) +- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands +- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings). 
+- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High level overview of configuring the OpenHands Web interface. +- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents. +- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users that would like to use their own custom Docker image for the runtime. +- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md) +- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands +- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively. +- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally. +- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands +- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md) +- [Good vs. 
Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands +- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex) +- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq). +- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents +- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to setup and modify the various integrations in OpenHands. +- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md) +- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands. As well as some additional LLM settings. +- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers. +- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. 
+- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md) +- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you +- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands +- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai). +- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models. +- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects. +- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle +- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter). +- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work. +- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote. +- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation. 
+- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts. +- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment. +- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level. +- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app. +- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md) +- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine. +- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. +- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. 
+- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands +- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) +- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples +- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase +- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) +- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task +- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running OpenHands GUI on Windows without using WSL or Docker + +## OpenHands Cloud + +- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories. Once +- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands. +- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on +- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud. +- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories. 
Once +- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md) +- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup. +- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup. +- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup. +- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting. +- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. 
+ ## OpenHands Overview - [Community](https://docs.openhands.dev/overview/community.md): Learn about the OpenHands community, mission, and values diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py index 8d45a379..543456af 100755 --- a/scripts/generate-llms-files.py +++ b/scripts/generate-llms-files.py @@ -154,10 +154,10 @@ def iter_doc_pages() -> list[DocPage]: LLMS_SECTION_ORDER = [ + "OpenHands Software Agent SDK", + "OpenHands CLI", "OpenHands Web App Server", "OpenHands Cloud", - "OpenHands CLI", - "OpenHands Software Agent SDK", "OpenHands Overview", "Other", ] From fdf13d688e5ed9266eff7d4738193d393c0403f6 Mon Sep 17 00:00:00 2001 From: openhands Date: Tue, 24 Feb 2026 21:03:24 +0000 Subject: [PATCH 5/6] ci: verify llms files are in sync with generator Co-authored-by: openhands --- .github/workflows/check-llms-files.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 .github/workflows/check-llms-files.yml diff --git a/.github/workflows/check-llms-files.yml b/.github/workflows/check-llms-files.yml new file mode 100644 index 00000000..fd5d798d --- /dev/null +++ b/.github/workflows/check-llms-files.yml @@ -0,0 +1,17 @@ +name: Verify llms context files + +on: + pull_request: + workflow_dispatch: + +jobs: + verify: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Regenerate llms.txt and llms-full.txt + run: python3 scripts/generate-llms-files.py + + - name: Ensure committed llms files are up-to-date + run: git diff --exit-code llms.txt llms-full.txt From 4f39824352943f1f03c9810920f745e358e4fa5e Mon Sep 17 00:00:00 2001 From: openhands Date: Thu, 26 Feb 2026 02:14:39 +0000 Subject: [PATCH 6/6] chore: add make targets for llms regeneration Co-authored-by: openhands --- AGENTS.md | 12 ++++++++++-- Makefile | 12 ++++++++++++ 2 files changed, 22 insertions(+), 2 deletions(-) create mode 100644 Makefile diff --git a/AGENTS.md b/AGENTS.md index ae172910..b4581fa1 100644 --- a/AGENTS.md +++ 
b/AGENTS.md @@ -34,9 +34,17 @@ Mintlify auto-generates `/llms.txt` and `/llms-full.txt`, but this repo **overri We do this so LLMs get **V1-only** context while legacy V0 pages remain available for humans. - Generator script: `scripts/generate-llms-files.py` -- Regenerate: +- Regenerate (recommended): ```bash - ./scripts/generate-llms-files.py + make llms + ``` + Or directly: + ```bash + python3 scripts/generate-llms-files.py + ``` +- Verify they are up-to-date: + ```bash + make llms-check ``` - Exclusions: `openhands/usage/v0/` and any `V0*`-prefixed page files. diff --git a/Makefile b/Makefile new file mode 100644 index 00000000..76599445 --- /dev/null +++ b/Makefile @@ -0,0 +1,12 @@ +.PHONY: llms llms-check + +# Regenerate the Mintlify llms context files (V1-only override). +# +# See: scripts/generate-llms-files.py +llms: + python3 scripts/generate-llms-files.py + +# Regenerate and fail if llms files changed (useful for local verification). +llms-check: + python3 scripts/generate-llms-files.py + git diff --exit-code llms.txt llms-full.txt
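The AGENTS.md section patched above names two exclusions ("`openhands/usage/v0/` and any `V0*`-prefixed page files") without showing the logic. As a minimal sketch of that filter, under the assumption that page paths are repo-relative POSIX strings — `is_excluded` and the sample paths are hypothetical illustrations, not the actual code in `scripts/generate-llms-files.py`:

```python
from pathlib import PurePosixPath

def is_excluded(page_path: str) -> bool:
    """Hypothetical sketch of the V1-only filter described in AGENTS.md.

    The real logic lives in scripts/generate-llms-files.py and may differ;
    this only illustrates the two documented exclusions.
    """
    path = PurePosixPath(page_path)
    # Exclusion 1: any page under the legacy docs tree openhands/usage/v0/.
    if "openhands/usage/v0/" in page_path:
        return True
    # Exclusion 2: any page file whose name carries the V0 prefix.
    return path.name.startswith("V0")
```

Under these assumptions, a page like `openhands/usage/v0/runtime.md` or a `V0`-prefixed file is dropped, while ordinary V1 pages such as `openhands/usage/agents.md` pass through to the generated llms files.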